cs.CV (2025-12-08)
📊 11 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (5 🔗2)
Pillar 2: RL Algorithms and Architecture (4 🔗2)
Pillar 9: Embodied Foundation Models (1)
Pillar 1: Robot Control (1)
🔬 Pillar 3: Spatial Perception and Semantics (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Introduces CUHK-X, a multimodal dataset for human activity scene understanding and reasoning, and builds benchmarks on it | scene understanding spatiotemporal large language model | ✅ | |
| 2 | COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision | COREA: coarse-to-fine alignment of 3D representations between relightable 3D Gaussians and SDF via bidirectional 3D-to-3D supervision | 3D gaussian splatting 3DGS gaussian splatting | | |
| 3 | More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery | Evaluates SAM 3 on segmentation, 3D perception, and reconstruction in robotic surgery | depth estimation monocular depth sam 3D | | |
| 4 | MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation | MuSASplat: lightweight multi-scale adaptation for efficient sparse-view 3D Gaussian splatting | 3D gaussian splatting gaussian splatting splatting | | |
| 5 | From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images | Proposes a generative-model-based city photogrammetry method that synthesizes ground-level views from extreme off-nadir satellite images | 3DGS NeRF height map | ✅ | |
🔬 Pillar 2: RL Algorithms and Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction | UltrasODM: a dual-stream optical-flow Mamba network for 3D freehand ultrasound reconstruction | Mamba optical flow | ✅ | |
| 7 | Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models | Proposes the TRR framework, which improves the safety of large vision-language models through policy-guided self-reflection | reinforcement learning multimodal | ✅ | |
| 8 | Deterministic World Models for Verification of Closed-loop Vision-based Systems | Proposes deterministic world models for verifying closed-loop vision-based systems, improving verification precision | world model | | |
| 9 | Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes | Lang3D-XL: semantic understanding of large-scale scenes via language-embedded 3D Gaussians | distillation multimodal | | |
🔬 Pillar 9: Embodied Foundation Models (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models | Proposes a training-free self-correction framework for reducing hallucinations in vision-language models | multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery Detection | Proposes MSN, a multi-directional similarity network for detecting hand-crafted and deep-synthesized copy-move image forgeries | manipulation | | |