cs.CV (2025-04-20)

📊 16 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (6 🔗1) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 2: RL Algorithms & Architecture (3) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

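# | Title | One-sentence summary | Tags | 🔗
1 | NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation | Proposes NVSMask3D, which uses camera pose interpolation and hard visual prompting for 3D open-vocabulary instance segmentation | 3D gaussian splatting, gaussian splatting, splatting
2 | VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control | VGNC: reduces overfitting in sparse-view 3DGS via validation-guided Gaussian number control | 3D gaussian splatting, 3DGS, gaussian splatting
3 | Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Proposes DVBench, a comprehensive benchmark for evaluating how well vision-language models understand safety-critical driving scenarios | scene understanding, large language model, multimodal
4 | Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding | Metamon-GS: enhances the representational capacity of 3D Gaussians via variance-guided densification and light encoding | 3D gaussian splatting, 3DGS, gaussian splatting
5 | Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction | BA-Track: combines bundle adjustment with 3D tracking for accurate reconstruction of dynamic scenes | scene reconstruction
6 | Seurat: From Moving Points to Depth | Seurat: infers depth changes in monocular video from the trajectories of moving points | depth estimation, spatial relationship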

🔬 Pillar 9: Embodied Foundation Models (6 papers)

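# | Title | One-sentence summary | Tags | 🔗
7 | Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens | Proposes a generative multimodal pretraining method based on discrete diffusion timestep tokens, improving multimodal understanding and generation | large language model, multimodal
8 | Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark | Proposes Video-MMLU for evaluating LMMs on multi-discipline lecture understanding | large language model, multimodal
9 | LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection | Proposes LSP-ST, which achieves robust infrared small target detection via ladder shape-biased side-tuning | foundation model
10 | LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation | LGD: uses generative descriptions to strengthen region-text matching in zero-shot referring image segmentation | large language model
11 | ResNetVLLM -- Multi-modal Vision LLM for the Video Understanding Task | Proposes ResNetVLLM, a multimodal vision-language model for zero-shot video understanding | large language model
12 | ResNetVLLM-2: Addressing ResNetVLLM's Multi-Modal Hallucinations | ResNetVLLM-2: mitigates multimodal hallucinations in ResNetVLLM via faithfulness detection and RAG | large language model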

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

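# | Title | One-sentence summary | Tags | 🔗
13 | Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension | Proposes Relation-R1, which uses cognitive chain-of-thought guided reinforcement learning to address relation comprehension in a unified way | reinforcement learning, large language model, chain-of-thought
14 | FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models | FlowLoss: a dynamic optical-flow-conditioned loss strategy for video diffusion models that improves temporal consistency | flow matching, optical flow
15 | Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline | Proposes a closed-loop simulator and a causal benchmark to address the "copycat" problem of imitation-learning planners | reinforcement learning, imitation learning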

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-sentence summary | Tags | 🔗
16 | VM-BHINet: Vision Mamba Bimanual Hand Interaction Network for 3D Interacting Hand Mesh Recovery From a Single RGB Image | Proposes VM-BHINet, which uses Vision Mamba for 3D interacting hand mesh recovery from a single RGB image | bi-manual, Mamba, SSM
