cs.CV(2025-03-29)

📊 14 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7) · Pillar 9: Embodied Foundation Models (2) · Pillar 6: Video Extraction & Matching (2) · Pillar 1: Robot Control (2 🔗1) · Pillar 2: RL Algorithms & Architecture (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

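| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction | FreeSplat++: generalizable 3D Gaussian splatting for efficient indoor scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations | NeuralGS: combines neural fields with 3D Gaussian splatting to obtain compact 3D representations | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Proposes an open-vocabulary semantic segmentation method based on uncertainty alignment for indoor robotic scene understanding | scene understanding, open-vocabulary, open vocabulary | |
| 4 | Evaluating Compositional Scene Understanding in Multimodal Generative Models | Evaluates multimodal generative models on compositional scene understanding and reveals their gap with humans | scene understanding, multimodal | |
| 5 | Empowering Large Language Models with 3D Situation Awareness | Proposes a situation-aware 3D scene understanding method for large language models that improves performance on viewpoint-dependent tasks | scene understanding, egocentric, large language model | |
| 6 | CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction | CityGS-X: a scalable architecture for efficient and geometrically accurate large-scale scene reconstruction | 3D gaussian splatting, gaussian splatting, splatting | |
| 7 | Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Evaluates the reasoning ability of DeepSeek models for vision-language understanding in robotic-assisted surgery | scene understanding, large language model, multimodal | |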

🔬 Pillar 9: Embodied Foundation Models (2 papers)

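| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Efficient Adaptation For Remote Sensing Visual Grounding | Proposes a PEFT-based efficient adaptation method for remote sensing visual grounding that reduces computational cost while maintaining accuracy | foundation model, visual grounding | |
| 9 | Multi-label classification for multi-temporal, multi-spatial coral reef condition monitoring using vision foundation model with adapter learning | Proposes a DINOv2-LoRA-based multi-label classification method for coral reef monitoring under multi-temporal, multi-spatial conditions | foundation model | |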

🔬 Pillar 6: Video Extraction & Matching (2 papers)

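| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video | Proposes FRAME, which uses head-mounted cameras and device pose to achieve high-quality human motion capture | egocentric, multimodal | |
| 11 | When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning? | Proposes the YesBut (V2) benchmark to evaluate the comparative reasoning ability of large vision-language models in understanding contradictory humor | HuMoR | |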

🔬 Pillar 1: Robot Control (2 papers)

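| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation | Z-SASLM: a zero-shot, style-aligned SLI blending method for latent-space manipulation | manipulation | |
| 13 | Skeletonization Quality Evaluation: Geometric Metrics for Point Cloud Analysis in Robotics | Proposes a quality evaluation framework for point cloud skeletonization to improve shape analysis in robotics applications | manipulation | |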

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

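| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Efficient Explicit Joint-level Interaction Modeling with Mamba for Text-guided HOI Generation | Proposes an efficient, explicit joint-level interaction modeling method for text-guided HOI generation | Mamba, human-object interaction, HOI | |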
