cs.CV (2024-10-24)

📊 22 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗1) | Pillar 3: Perception & Semantics (6) | Pillar 4: Generative Motion (2 🔗1) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction (2) | Pillar 2: RL & Architecture (1) | Pillar 7: Motion Retargeting (1)

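🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Proposes MedRegA, the first bilingual region-aware medical multimodal large language model, improving medical image understanding and interaction.	large language model multimodal
2	Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant	Proposes VisTEL and KaLMA, significantly improving performance on the Text-KVQA task.	multimodal
3	VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks	Proposes VideoWebArena for evaluating long-context multimodal agents on video-understanding web tasks.	multimodal
4	Unbounded: A Generative Infinite Game of Character Life Simulation	Proposes Unbounded, an infinite character life simulation game based on generative models.	large language model instruction following
5	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Ferret-UI 2: a multimodal large language model for universal user interface understanding across platforms.	large language model multimodal
6	SegLLM: Multi-round Reasoning Segmentation	SegLLM: a multi-round interactive reasoning segmentation model that uses conversational memory to enhance the segmentation capability of LLMs.	multimodal
7	ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval	Proposes the ChatSearch dataset and the generative retrieval model ChatSearcher for general conversational image retrieval.	multimodal	✅
8	PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding	PESFormer: improves macro- and micro-expression spotting performance through direct timestamp encoding.	TAMP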

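🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis	Proposes a binocular-guided 3D Gaussian splatting method that needs no external supervision, targeting sparse view synthesis.	3D gaussian splatting gaussian splatting splatting
10	Sort-free Gaussian Splatting via Weighted Sum Rendering	Proposes a sort-free Gaussian splatting method based on weighted sum rendering, improving rendering performance on mobile devices.	3D gaussian splatting 3DGS gaussian splatting
11	3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation	Proposes 3D-Adapter, which injects 3D geometry awareness into image diffusion models to improve 3D generation quality.	gaussian splatting splatting geometric consistency
12	MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision	MoGe: unlocks accurate monocular geometry estimation for open-domain images by optimizing training supervision.	MoGe
13	Segmentation-aware Prior Assisted Joint Global Information Aggregated 3D Building Reconstruction	Proposes a segmentation-aware, prior-assisted global information aggregation method that improves 3D reconstruction quality in weakly textured regions.	depth estimation geometric consistency
14	Real-time 3D-aware Portrait Video Relighting	Proposes a NeRF-based real-time 3D portrait video relighting method that adjusts viewpoint and lighting simultaneously.	NeRF neural radiance field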

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
15	Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing	Proposes MotionCLR, an attention-based motion diffusion model for interactive motion generation and training-free editing.	motion diffusion model motion diffusion motion generation
16	Rectified Diffusion Guidance for Conditional Generation	Proposes rectified diffusion guidance (ReCFG) to address the distribution shift of CFG in conditional generation and improve generation quality.	classifier-free guidance	✅

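🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
17	A Cranial-Feature-Based Registration Scheme for Robotic Micromanipulation Using a Microscopic Stereo Camera System	Proposes a cranial-feature-based registration scheme for robotic micromanipulation guided by microscopic stereo vision.	manipulation
18	Large Spatial Model: End-to-end Unposed Images to Semantic 3D	Proposes Large Spatial Model, enabling end-to-end reconstruction from unposed images to semantic 3D.	manipulation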

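🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
19	Rigid Single-Slice-in-Volume registration via rotation-equivariant 2D/3D feature matching	Proposes a rigid single-slice-in-volume registration method based on rotation-equivariant feature matching.	feature matching
20	Classifying Bicycle Infrastructure Using On-Bike Street-Level Images	Proposes a bicycle infrastructure classification system based on temporal analysis of on-bike images.	database matching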

🔬 Pillar 2: RL & Architecture (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
21	Interpretable Representation Learning from Videos using Nonlinear Priors	Proposes nonlinear priors to address interpretable representation learning from videos.	representation learning

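🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
22	BIFRÖST: 3D-Aware Image compositing with Language Instructions	Bifröst: a 3D-aware image compositing framework driven by language instructions, addressing the modeling of complex spatial relationships.	spatial relationship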
