cs.CV (2025-04-20)

📊 16 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗1) Pillar 9: Embodied Foundation Models (6 🔗1) Pillar 2: RL & Architecture (3) Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation	Proposes NVSMask3D, which uses camera pose interpolation and hard visual prompting for 3D open-vocabulary instance segmentation	3D gaussian splatting gaussian splatting splatting
2	VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control	VGNC: reduces overfitting in sparse-view 3DGS through validation-guided control of the number of Gaussians	3D gaussian splatting 3DGS gaussian splatting
3	Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Proposes DVBench, a comprehensive benchmark for evaluating how well vision-language models understand safety-critical driving scenarios	scene understanding large language model multimodal	✅
4	Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding	Metamon-GS: improves the representational capacity of 3D Gaussians through variance-guided densification and light encoding	3D gaussian splatting 3DGS gaussian splatting
5	Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction	BA-Track: combines bundle adjustment with 3D tracking for accurate reconstruction of dynamic scenes	scene reconstruction
6	Seurat: From Moving Points to Depth	Seurat: infers depth changes in monocular video from the trajectories of moving points	depth estimation spatial relationship

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens	Proposes a generative multimodal pretraining method based on discrete diffusion timestep tokens, improving multimodal understanding and generation	large language model multimodal	✅
8	Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark	Proposes Video-MMLU for evaluating LMMs on multi-discipline lecture understanding	large language model multimodal
9	LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection	Proposes LSP-ST, which achieves robust infrared small target detection through ladder shape-biased side-tuning	foundation model
10	LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation	LGD: uses generative descriptions to improve region-text matching for zero-shot referring image segmentation	large language model
11	ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task	Proposes ResNetVLLM, a multimodal vision-language model for zero-shot video understanding	large language model
12	ResNetVLLM-2: Addressing ResNetVLLM's Multi-Modal Hallucinations	ResNetVLLM-2: mitigates multimodal hallucinations in ResNetVLLM through faithfulness detection and RAG	large language model

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension	Proposes Relation-R1, which uses reinforcement learning guided by cognitive chain-of-thought to address relation comprehension in a unified way	reinforcement learning large language model chain-of-thought
14	FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models	FlowLoss: a dynamic optical-flow-conditioned loss strategy for video diffusion models that improves temporal consistency	flow matching optical flow
15	Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline	Proposes a closed-loop simulator and causal benchmark to address the "copycat" problem of imitation-learning planners	reinforcement learning imitation learning

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	VM-BHINet:Vision Mamba Bimanual Hand Interaction Network for 3D Interacting Hand Mesh Recovery From a Single RGB Image	Proposes VM-BHINet, which uses Vision Mamba for 3D interacting hand mesh recovery from a single RGB image	bi-manual Mamba SSM
