cs.CV (2025-12-08)

📊 11 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (5 🔗2) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 9: Embodied Foundation Models (1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

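| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning | Introduces CUHK-X, a multimodal dataset for human activity scene understanding and reasoning, and builds benchmarks on it. | scene understanding, spatiotemporal, large language model | |
| 2 | COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision | COREA: coarse-to-fine 3D representation alignment between relightable 3D Gaussians and SDF via bidirectional 3D-to-3D supervision. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery | Evaluates the segmentation, 3D perception, and reconstruction capabilities of SAM 3 in robotic surgery. | depth estimation, monocular depth, sam 3D | |
| 4 | MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation | MuSASplat: lightweight multi-scale adaptation for efficient sparse-view 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images | Proposes a generative-model-based city photogrammetry method that synthesizes ground-level views from extreme off-nadir satellite images. | 3DGS, NeRF, height map | |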

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

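| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction | UltrasODM: a dual-stream optical-flow Mamba network for 3D freehand ultrasound reconstruction. | Mamba, optical flow | |
| 7 | Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models | Proposes the TRR framework, which improves the safety of large vision-language models through policy-guided self-reflection. | reinforcement learning, multimodal | |
| 8 | Deterministic World Models for Verification of Closed-loop Vision-based Systems | Proposes deterministic world models for verifying closed-loop vision-based systems, improving verification precision. | world model | |
| 9 | Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes | Lang3D-XL: semantic understanding of large-scale scenes via language-embedded 3D Gaussians. | distillation, multimodal | |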

🔬 Pillar 9: Embodied Foundation Models (1 paper)

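| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models | Proposes a training-free self-correction framework for reducing hallucinations in vision-language models. | multimodal | |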

🔬 Pillar 1: Robot Control (1 paper)

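| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery Detection | Proposes MSN, a multi-directional similarity network for detecting hand-crafted and deep-synthesized copy-move image forgeries. | manipulation | |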
