cs.CV (2025-10-29)
📊 21 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Spatial Perception & Semantics (5)
Pillar 2: RL Algorithms & Architecture (3)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 5: Interaction & Reaction (2 🔗1)
Pillar 6: Video Extraction & Matching (1 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | D$^2$GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction | Proposes D$^2$GS, a LiDAR-free method for high-accuracy urban scene reconstruction. | metric depth, gaussian splatting, splatting | | |
| 9 | LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation | Proposes LangHOPS, the first MLLM-based framework for open-vocabulary hierarchical object part segmentation. | open-vocabulary, open vocabulary, large language model | | |
| 10 | Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments | Proposes a vision-language integration framework for zero-shot scene understanding in real-world environments. | scene understanding, large language model, multimodal | | |
| 11 | SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments | SPADE: a sparsity-adaptive depth estimator for zero-shot monocular depth estimation underwater. | depth estimation, monocular depth | | |
| 12 | EA3D: Online Open-World 3D Object Extraction from Streaming Videos | EA3D: extracts open-world 3D objects online from streaming video, enabling geometric reconstruction and scene understanding. | visual odometry, scene understanding | | |
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models | RT-DETRv4: painlessly improves real-time object detection using vision foundation models. | distillation, foundation model | | |
| 14 | AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians | Proposes Atlanta-world guided implicit structured Gaussian splatting for high-accuracy reconstruction of indoor and outdoor scenes. | world model, gaussian splatting, splatting | | |
| 15 | Larger Hausdorff Dimension in Scanning Pattern Facilitates Mamba-Based Methods in Low-Light Image Enhancement | Proposes a low-light image enhancement method based on Mamba with Hilbert scanning, improving image detail and visual quality. | Mamba | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
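| 16 | StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA | Proposes the StreamingCoT dataset for temporal dynamics understanding and multimodal chain-of-thought reasoning in streaming VideoQA. | spatiotemporal, large language model, multimodal | ✅ | |
| 17 | Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples | Proposes an MDP-based informative sample selection model for skeleton-based action recognition, improving accuracy with limited training samples. | spatiotemporal | | |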
🔬 Pillar 5: Interaction & Reaction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection | Proposes the VDRP framework to address visual diversity and region awareness in zero-shot HOI detection. | human-object interaction, HOI | ✅ | |
| 19 | Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer | Proposes Brain-IT to address the reliability of image reconstruction from fMRI. | interaction transformer | | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks | Surveys large models for multimodal spatial reasoning and builds open benchmarks for evaluation. | egocentric, spatial relationship, embodied AI | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Diffusion-Driven Progressive Target Manipulation for Source-Free Domain Adaptation | Proposes a diffusion-driven progressive target manipulation method for source-free domain adaptation. | manipulation | | |