cs.CV (2024-06-26)

📊 27 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11) · Pillar 3: Perception & Semantics (7 🔗2) · Pillar 1: Robot Control (3) · Pillar 5: Interaction & Reaction (2) · Pillar 6: Video Extraction (1 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 2: RL & Architecture (1) · Pillar 7: Motion Retargeting (1)

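🔬 Pillar 9: Embodied Foundation Models (11 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	A Refer-and-Ground Multimodal Large Language Model for Biomedicine	Proposes BiRD, the first multimodal large language model for refer-and-ground on biomedical images	large language model multimodal
2	MammothModa: Multi-Modal Large Language Model	MammothModa: a multimodal large language model that achieves SOTA performance on top of a base model	large language model multimodal
3	MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data	MUMU: bootstraps multimodal image generation from text-to-image data	multimodal
4	Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI	Proposes an evaluation benchmark for Earth observation foundation models, improving label efficiency on remote sensing tasks	foundation model
5	Improving EO Foundation Models with Confidence Assessment for enhanced Semantic segmentation	Proposes the CAS model, which uses confidence assessment to improve semantic segmentation of remote sensing imagery	foundation model
6	On the Role of Visual Grounding in VQA	Proposes a visual grounding reasoning framework that exposes shortcut learning in VQA models	visual grounding
7	Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks	Uses generative AI and multimodal retinal images with convolutional neural networks to aid diagnosis of Alzheimer's disease	multimodal
8	GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension	Proposes the GUIDE dataset for instructional video comprehension, filling the gap in task-level experiential guidance	foundation model TAMP
9	Chrono: A Simple Blueprint for Representing Time in MLLMs	Chrono: a simple, general sequence blueprint for representing time in MLLMs that improves video temporal localization	large language model multimodal
10	MatchTime: Towards Automatic Soccer Game Commentary Generation	Proposes MatchTime: a temporally aligned dataset and model for automatic soccer game commentary generation	TAMP
11	Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs	Proposes Speech2UnifiedExpressions, which synchronously synthesizes realistic co-speech affective face and body expressions	multimodal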

🔬 Pillar 3: Perception & Semantics (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	On Scaling Up 3D Gaussian Splatting Training	Grendel: a scalable distributed system for 3D Gaussian Splatting training that addresses the memory bottleneck in high-resolution, large-scale scene reconstruction	3D gaussian splatting 3DGS gaussian splatting	✅
13	GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting	Proposes GS-Octree for object-level 3D reconstruction under strong lighting	3D gaussian splatting gaussian splatting splatting
14	Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning	Proposes a gradient-informed post-hoc pruning method for 3D Gaussian splats, enabling efficient compression	3D gaussian splatting 3DGS gaussian splatting
15	DoubleTake: Geometry Guided Depth Estimation	DoubleTake: geometry-guided depth estimation for high-quality 3D reconstruction at interactive rates	depth estimation scene reconstruction
16	VDG: Vision-Only Dynamic Gaussian for Driving Simulation	Proposes VDG: a vision-only dynamic Gaussian model for driving simulation	gaussian splatting splatting scene reconstruction	✅
17	Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos	Proposes Dynamic Gaussian Marbles for novel view synthesis from monocular videos, improving geometric reconstruction quality for dynamic scenes	gaussian splatting splatting
18	Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference	Proposes a generalizable, training-free 3D relative pose estimation method that uses a single reference image	semantic map

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality	GaussianDreamerPro: a framework for generating high-quality, manipulable 3D Gaussians from text	manipulation dreamer 3D gaussian splatting
20	3D Feature Distillation with Object-Centric Priors	Proposes a 3D feature distillation method with object-centric priors, improving language-guided robotic manipulation from single-view RGB-D images	manipulation distillation open-vocabulary
21	CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection	Proposes the CTS framework for sim-to-real unsupervised domain adaptation in 3D detection	sim-to-real

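🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	Geometric Features Enhanced Human-Object Interaction Detection	Proposes GeoHOI, which uses geometric features to improve Transformer-based human-object interaction detection under occlusion	human-object interaction HOI
23	Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models	Proposes a human-aware 3D scene generation method based on spatially constrained diffusion models, addressing object overlap	human-object interaction human-scene interaction human motion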

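🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
24	EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation	Proposes EgoVideo, an egocentric vision foundation model, and applies it successfully to multiple tasks in the EgoVis challenges	egocentric Ego4D foundation model	✅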

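🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
25	Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model	Proposes Changen2: a multi-temporal remote sensing generative change foundation model that generates change data for training change detection models	spatiotemporal foundation model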

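🔬 Pillar 2: RL & Architecture (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	On Reducing Activity with Distillation and Regularization for Energy Efficient Spiking Neural Networks	Proposes an SNN training method based on knowledge distillation and regularization that reduces activity while preserving accuracy	distillation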

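🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
27	DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image	DICE: the first end-to-end method for capturing hand-face interaction deformations from a single image	spatial relationship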
