cs.CV(2024-10-24)

📊 22 papers in total | 🔗 2 with code

🎯 Navigation by Interest Area

Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 3: Perception & Semantics (6) · Pillar 4: Generative Motion (2 🔗1) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (2) · Pillar 2: RL & Architecture (1) · Pillar 7: Motion Retargeting (1)

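🔬 Pillar 9: Embodied Foundation Models (8 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Proposes MedRegA, the first bilingual region-aware medical multimodal large language model, improving medical image understanding and interaction. | large language model, multimodal | |
| 2 | Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant | Proposes VisTEL and KaLMA, significantly improving performance on the Text-KVQA task. | multimodal | |
| 3 | VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks | Proposes VideoWebArena for evaluating how well long-context multimodal agents handle video-understanding web tasks. | multimodal | |
| 4 | Unbounded: A Generative Infinite Game of Character Life Simulation | Proposes Unbounded, an infinite character life simulation game built on generative models. | large language model, instruction following | |
| 5 | Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | Ferret-UI 2: a multimodal large language model for universal user interface understanding across platforms. | large language model, multimodal | |
| 6 | SegLLM: Multi-round Reasoning Segmentation | SegLLM: a multi-round interactive reasoning segmentation model that uses conversational memory to strengthen LLM segmentation. | multimodal | |
| 7 | ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval | Proposes the ChatSearch dataset and the generative retrieval model ChatSearcher for general conversational image retrieval. | multimodal | |
| 8 | PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding | PESFormer: improves macro- and micro-expression spotting through direct timestamp encoding. | TAMP | |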

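🔬 Pillar 3: Perception & Semantics (6 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis | Proposes a binocular-guided 3D Gaussian splatting method that needs no external supervision, targeting sparse view synthesis. | 3D gaussian splatting, gaussian splatting, splatting | |
| 10 | Sort-free Gaussian Splatting via Weighted Sum Rendering | Proposes a sort-free Gaussian splatting method based on weighted sum rendering, improving rendering performance on mobile devices. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 11 | 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation | Proposes 3D-Adapter, which injects 3D geometry awareness into image diffusion models to improve 3D generation quality. | gaussian splatting, splatting, geometric consistency | |
| 12 | MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision | MoGe: unlocks accurate monocular geometry estimation for open-domain images by optimizing training supervision. | MoGe | |
| 13 | Segmentation-aware Prior Assisted Joint Global Information Aggregated 3D Building Reconstruction | Proposes a segmentation-aware, prior-assisted global information aggregation method that improves 3D reconstruction quality in weakly textured regions. | depth estimation, geometric consistency | |
| 14 | Real-time 3D-aware Portrait Video Relighting | Proposes a NeRF-based real-time 3D portrait video relighting method that adjusts viewpoint and lighting simultaneously. | NeRF, neural radiance field | |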

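🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing | Proposes MotionCLR, an attention-based motion diffusion model for interactive motion generation and training-free editing. | motion diffusion model, motion diffusion, motion generation | |
| 16 | Rectified Diffusion Guidance for Conditional Generation | Proposes rectified diffusion guidance (ReCFG) to address the distribution shift that CFG introduces in conditional generation, improving generation quality. | classifier-free guidance | |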

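🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | A Cranial-Feature-Based Registration Scheme for Robotic Micromanipulation Using a Microscopic Stereo Camera System | Proposes a cranial-feature-based registration scheme for robotic micromanipulation guided by microscopic stereo vision. | manipulation | |
| 18 | Large Spatial Model: End-to-end Unposed Images to Semantic 3D | Proposes Large Spatial Model, which reconstructs semantic 3D end to end from unposed images. | manipulation | |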

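🔬 Pillar 6: Video Extraction (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Rigid Single-Slice-in-Volume registration via rotation-equivariant 2D/3D feature matching | Proposes a rigid single-slice-in-volume registration method based on rotation-equivariant feature matching. | feature matching | |
| 20 | Classifying Bicycle Infrastructure Using On-Bike Street-Level Images | Proposes a bicycle infrastructure classification system based on temporal analysis of on-bike images. | database matching | |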

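🔬 Pillar 2: RL & Architecture (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Interpretable Representation Learning from Videos using Nonlinear Priors | Proposes nonlinear priors for interpretable representation learning from videos. | representation learning | |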

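🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | BIFRÖST: 3D-Aware Image compositing with Language Instructions | Bifröst: a 3D-aware image compositing framework driven by language instructions, addressing the modeling of complex spatial relationships. | spatial relationship | |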
