cs.CV (2025-02-17)

📊 18 papers in total | 🔗 7 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (8 🔗3)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 2: RL & Architecture (3 🔗1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 1: Robot Control (1 🔗1)

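🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	NOTA: Multimodal Music Notation Understanding for Visual Large Language Model	Proposes the NOTA dataset and the NotaGPT model to improve how visual large language models understand music notation.	large language model multimodal	✅
2	PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection	PRISM: a training-free, self-pruning method for multimodal data selection that addresses the anisotropic distribution of visual features.	large language model multimodal	✅
3	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Proposes a modular visual contrastive decoding (MVCD) framework that improves the visual perception of LLMs in multimodal reasoning.	large language model multimodal	✅
4	Token Communications: A Large Model-Driven Framework for Cross-modal Context-aware Semantic Communications	Proposes the Token Communications framework, which uses large models to drive cross-modal, context-aware semantic communication.	large language model foundation model multimodal
5	Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics	Builds a psychometric framework for evaluating the basic spatial abilities of visual language models.	embodied AI chain-of-thought
6	Intuitive physics understanding emerges from self-supervised pretraining on natural videos	Shows that intuitive physics understanding emerges in models through self-supervised pretraining on natural videos.	large language model multimodal
7	Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions	Proposes an algorithm that combines foundation models with combinatorial search to detect systematic weaknesses in vision models along predefined dimensions.	foundation model
8	Duo Streamers: A Streaming Gesture Recognition Framework	Duo Streamers: a streaming gesture recognition framework for resource-constrained settings.	multimodal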

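🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-line Summary	Tags	🔗	⭐
9	PUGS: Zero-shot Physical Understanding with Gaussian Splatting	PUGS: a zero-shot method for understanding physical properties, built on Gaussian Splatting.	gaussian splatting splatting	✅
10	From Open-Vocabulary to Vocabulary-Free Semantic Segmentation	Proposes vocabulary-free semantic segmentation, which recognizes the objects in a scene without a predefined set of categories.	open-vocabulary open vocabulary
11	3D Gaussian Inpainting with Depth-Guided Cross-View Consistency	Proposes 3DGIC, which uses depth-guided cross-view consistency to perform 3D Gaussian inpainting.	3D gaussian splatting 3DGS gaussian splatting
12	Deep Neural Networks for Accurate Depth Estimation with Latent Space Features	Proposes a deep neural network built on latent-space features that improves monocular depth estimation accuracy, particularly in indoor scenes.	depth estimation monocular depth scene reconstruction
13	HumanGif: Single-View Human Diffusion with Generative Prior	HumanGif: a single-view human diffusion model with a generative prior that produces realistic human animation.	NeRF character animation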

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-line Summary	Tags	🔗	⭐
14	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	HermesFlow: a general framework that closes the gap between multimodal understanding and generation capabilities.	DPO large language model foundation model	✅
15	High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation	Proposes a spatiotemporally coherent Gaussian representation and GauMamba for predicting highly dynamic weather radar sequences.	Mamba gaussian splatting splatting
16	video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model	Proposes video-SALMONN-o1, the first reasoning-enhanced audio-visual large language model for general video understanding.	direct preference optimization large language model multimodal

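🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
17	Diffusion Models without Classifier-free Guidance	Proposes Model-guidance, a way of training diffusion models that removes the need for classifier-free guidance and improves training and inference efficiency.	classifier-free guidance	✅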

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
18	Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening	Diffusion-Sharpening: fine-tunes diffusion models by sharpening their denoising trajectories, improving alignment on downstream tasks.	trajectory optimization DPO	✅
