cs.CV (2025-09-19)
📊 40 papers in total | 🔗 5 with code
🎯 Navigation by Area of Interest
Pillar 9: Embodied Foundation Models (15 🔗1)
Pillar 3: Perception & Semantics (13 🔗3)
Pillar 2: RL & Architecture (10 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 6: Video Extraction (1)
🔬 Pillar 9: Embodied Foundation Models (15 papers)
🔬 Pillar 3: Perception & Semantics (13 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild | Proposes MS-GS, which uses multi-appearance sparse-view 3D Gaussian splatting to reconstruct in-the-wild scenes. | depth estimation, monocular depth, 3D gaussian splatting | | |
| 17 | Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval | Proposes GVR, which achieves zero-shot visual grounding in 3D Gaussian scenes via view retrieval. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 18 | FingerSplat: Contactless Fingerprint 3D Reconstruction and Generation based on 3D Gaussian Splatting | FingerSplat: contactless fingerprint 3D reconstruction and generation based on 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 19 | GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading | GS-Scale: unlocks large-scale 3D Gaussian splatting training via host offloading. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 20 | Sparse Multiview Open-Vocabulary 3D Detection | Proposes a training-free sparse multi-view open-vocabulary 3D detection method with strong performance. | open-vocabulary, open vocabulary, foundation model | | |
| 21 | StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes | Proposes StereoAdapter to address depth estimation in underwater scenes. | depth estimation, stereo depth, metric depth | ✅ | |
| 22 | RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation | RangeSAM: explores the potential of visual foundation models for range-view LiDAR segmentation. | scene understanding, foundation model, multimodal | | |
| 23 | Shedding Light on Depth: Explainability Assessment in Monocular Depth Estimation | An explainability study of monocular depth estimation: proposes Attribution Fidelity to assess the reliability of explanations. | depth estimation, monocular depth | | |
| 24 | Towards Sharper Object Boundaries in Self-Supervised Depth Estimation | Proposes self-supervised depth estimation based on mixture distributions, significantly sharpening object boundaries. | depth estimation, monocular depth, scene understanding | | |
| 25 | Camera Splatting for Continuous View Optimization | Proposes Camera Splatting, which achieves high-quality novel view synthesis through continuous view optimization. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 26 | 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction | Proposes a hybrid 2D/3D Gaussian flat representation that improves 3D reconstruction quality in texture-less scenes. | depth estimation, scene reconstruction | | |
| 27 | RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars | RadarGaussianDet3D: an efficient Gaussian-based 3D object detector for 4D millimeter-wave radar. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 28 | Global Regulation and Excitation via Attention Tuning for Stereo Matching | Proposes the GREAT framework, which uses attention mechanisms to strengthen global context and geometric information in stereo matching, improving accuracy in ill-posed regions. | scene flow | ✅ | |
🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching | Proposes DistillMatch, which uses knowledge distillation from a vision foundation model for multimodal image matching. | distillation, foundation model, multimodal | | |
| 30 | BaseReward: A Strong Baseline for Multimodal Reward Model | BaseReward: a new baseline for multimodal reward models, offering practical guidance for MLLM alignment. | reinforcement learning, RLHF, large language model | | |
| 31 | UNIV: Unified Foundation Model for Infrared and Visible Modalities | Proposes UNIV to address cross-modal alignment between infrared and visible modalities. | contrastive learning, foundation model | | |
| 32 | DC-Mamba: Bi-temporal deformable alignment and scale-sparse enhancement for remote sensing change detection | DC-Mamba: improves remote sensing change detection through deformable alignment and scale-sparse enhancement. | Mamba, SSM, state space model | | |
| 33 | Random Direct Preference Optimization for Radiography Report Generation | Proposes a chest X-ray report generation framework based on random direct preference optimization, improving clinical performance. | DPO, direct preference optimization, large language model | | |
| 34 | Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization | Proposes a curriculum-guided group relative policy optimization algorithm that improves the robustness of object detection for autonomous driving. | reinforcement learning, reward design, large language model | | |
| 35 | SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models | SAMPO: a scale-wise autoregressive generative world model with motion prompts, improving video prediction quality and efficiency. | world model, scene understanding, spatiotemporal | | |
| 36 | ChronoForge-RL: Chronological Forging through Reinforcement Learning for Enhanced Video Understanding | ChronoForge-RL: strengthens video understanding through chronological forging with reinforcement learning. | reinforcement learning, contrastive learning, distillation | | |
| 37 | Enhancing WSI-Based Survival Analysis with Report-Auxiliary Self-Distillation | Proposes the Rasa framework, which uses report-auxiliary self-distillation to enhance WSI-based survival analysis. | distillation, large language model | ✅ | |
| 38 | BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent | Proposes the BTL-UI model, which mimics human cognitive processes to improve the interaction ability of GUI agents. | reinforcement learning, large language model, multimodal | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 39 | SGMAGNet: A Baseline Model for 3D Cloud Phase Structure Reconstruction on a New Passive Active Satellite Benchmark | SGMAGNet: a baseline model for 3D cloud phase structure reconstruction on a passive-active satellite benchmark. | spatiotemporal, multimodal | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 40 | Simulated Cortical Magnification Supports Self-Supervised Object Learning | Simulated cortical magnification improves self-supervised object learning performance. | egocentric | | |