cs.CV (2025-09-19)
📊 41 papers in total | 🔗 6 with code
🎯 Areas of Interest
Pillar 9: Embodied Foundation Models (14 🔗1)
Pillar 3: Perception & Semantics (13 🔗3)
Pillar 2: RL & Architecture (10 🔗1)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 6: Video Extraction (1)
🔬 Pillar 9: Embodied Foundation Models (14 papers)
🔬 Pillar 3: Perception & Semantics (13 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild | Proposes MS-GS, which uses multi-appearance 3D Gaussian Splatting to reconstruct in-the-wild scenes from sparse views | depth estimation, monocular depth, 3D gaussian splatting | | |
| 16 | Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval | Proposes GVR, which achieves zero-shot visual grounding in 3D Gaussian scenes via view retrieval | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 17 | FingerSplat: Contactless Fingerprint 3D Reconstruction and Generation based on 3D Gaussian Splatting | Proposes a contactless fingerprint 3D reconstruction and generation method based on 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting | | |
| 18 | GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading | GS-Scale: unlocks large-scale 3D Gaussian Splatting training via host offloading | 3D gaussian splatting, gaussian splatting, splatting | | |
| 19 | Sparse Multiview Open-Vocabulary 3D Detection | Proposes a sparse multi-view open-vocabulary 3D detection method that requires no 3D training and performs strongly | open-vocabulary, open vocabulary, foundation model | | |
| 20 | StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes | StereoAdapter: an adaptation framework for stereo depth estimation in underwater scenes | depth estimation, stereo depth, metric depth | ✅ | |
| 21 | RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation | RangeSAM: explores the potential of visual foundation models for range-view LiDAR segmentation | scene understanding, foundation model, multimodal | | |
| 22 | Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation | Studies explainability in monocular depth estimation, using perturbation analysis and fidelity evaluation to improve model transparency | depth estimation, monocular depth | | |
| 23 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation | Proposes mixture-distribution-based self-supervised depth estimation that markedly sharpens object boundaries | depth estimation, monocular depth, scene understanding | | |
| 24 | Camera Splatting for Continuous View Optimization | Proposes Camera Splatting, which achieves high-quality novel view synthesis through continuous view optimization | 3D gaussian splatting, gaussian splatting, splatting | | |
| 25 | 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction | Proposes a hybrid 2D/3D Gaussian flat representation that improves 3D reconstruction quality in texture-less scenes | depth estimation, scene reconstruction | | |
| 26 | RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars | RadarGaussianDet3D: an efficient Gaussian-based 3D object detector for 4D millimeter-wave radar | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 27 | Global Regulation and Excitation via Attention Tuning for Stereo Matching | Proposes the GREAT framework, which uses attention to strengthen global context in stereo matching and improve matching accuracy in ill-posed regions | scene flow | ✅ | |
🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching | Proposes DistillMatch, which uses knowledge distillation from a vision foundation model for multimodal image matching | distillation, foundation model, multimodal | | |
| 29 | BaseReward: A Strong Baseline for Multimodal Reward Model | BaseReward: a new baseline for multimodal reward models, offering an effective approach to MLLM alignment | reinforcement learning, RLHF, large language model | | |
| 30 | UNIV: Unified Foundation Model for Infrared and Visible Modalities | Proposes UNIV, which uses cross-modal contrastive learning to address modality bias in infrared-visible fusion | contrastive learning, foundation model | | |
| 31 | DC-Mamba: Bi-temporal deformable alignment and scale-sparse enhancement for remote sensing change detection | DC-Mamba: proposes bi-temporal deformable alignment and scale-sparse enhancement for remote sensing change detection | Mamba, SSM, state space model | | |
| 32 | Random Direct Preference Optimization for Radiography Report Generation | Proposes a chest X-ray report generation method based on random Direct Preference Optimization that improves clinical metrics | DPO, direct preference optimization, large language model | | |
| 33 | Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization | Proposes curriculum-guided Group Relative Policy Optimization to improve the robustness of object detection for autonomous driving | reinforcement learning, reward design, large language model | | |
| 34 | SAMPO:Scale-wise Autoregression with Motion PrOmpt for generative world models | SAMPO: scale-wise autoregression with motion prompts for generative world models, improving video prediction quality and inference efficiency | world model, scene understanding, spatiotemporal | | |
| 35 | ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding | ChronoForge-RL: chronological forging via reinforcement learning for enhanced video understanding | reinforcement learning, contrastive learning, distillation | | |
| 36 | Enhancing WSI-Based Survival Analysis with Report-Auxiliary Self-Distillation | Proposes the Rasa framework, which uses report-auxiliary self-distillation to enhance WSI-based survival analysis and improve cancer prognosis prediction | distillation, large language model | ✅ | |
| 37 | BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent | Proposes BTL-UI, which mimics human cognitive processes to improve the interaction capability of GUI agents | reinforcement learning, large language model, multimodal | | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 38 | See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model | Proposes See&Trek, which enhances the spatial understanding of multimodal large language models under vision-only input | motion reconstruction, large language model, multimodal | | |
| 39 | Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution | Proposes the SCOPE model, which combines the strengths of SAM2 and Cutie to improve the robustness of video object segmentation | motion prediction | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 40 | SGMAGNet: A Baseline Model for 3D Cloud Phase Structure Reconstruction on a New Passive Active Satellite Benchmark | SGMAGNet: a baseline model for 3D cloud phase structure reconstruction on a new passive-active satellite benchmark | spatiotemporal, multimodal | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 41 | Simulated Cortical Magnification Supports Self-Supervised Object Learning | Simulated cortical magnification improves self-supervised object learning | egocentric | | |