cs.CV (2025-12-08)

📊 11 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (5 🔗2) Pillar 2: RL Algorithms & Architecture (4 🔗2) Pillar 9: Embodied Foundation Models (1) Pillar 1: Robot Control (1)

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning	Introduces CUHK-X, a multimodal dataset for human activity scene understanding and reasoning, and builds benchmarks on it.	scene understanding spatiotemporal large language model	✅
2	COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision	COREA: coarse-to-fine alignment of 3D representations between relightable 3D Gaussians and an SDF through bidirectional 3D-to-3D supervision.	3D gaussian splatting 3DGS gaussian splatting
3	More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery	Evaluates SAM 3 on segmentation, 3D perception, and reconstruction in robotic surgery.	depth estimation monocular depth sam 3D
4	MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation	MuSASplat: efficient sparse-view 3D Gaussian splatting through lightweight multi-scale adaptation.	3D gaussian splatting gaussian splatting splatting
5	From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images	Proposes a generative-model-based city photogrammetry method that synthesizes ground-level views from extreme off-nadir satellite images.	3DGS NeRF height map	✅

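🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
6	UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction	UltrasODM: a dual-stream optical-flow Mamba network for 3D freehand ultrasound reconstruction.	Mamba optical flow	✅
7	Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models	Proposes the TRR framework, which improves the safety of large vision-language models through policy-guided self-reflection.	reinforcement learning multimodal	✅
8	Deterministic World Models for Verification of Closed-loop Vision-based Systems	Proposes deterministic world models for verifying closed-loop vision-based systems, improving verification precision.	world model
9	Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes	Lang3D-XL: semantic understanding of large-scale scenes through language-embedded 3D Gaussians.	distillation multimodal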

🔬 Pillar 9: Embodied Foundation Models (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
10	Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models	Proposes a training-free self-correction framework for reducing hallucinations in vision-language models.	multimodal

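🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
11	MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery Detection	Proposes MSN, a multi-directional similarity network for detecting hand-crafted and deep-synthesized copy-move image forgeries.	manipulation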
