cs.CV(2025-03-11)
📊 8 papers in total | 🔗 1 with code
🎯 Topic Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 3: Spatial Perception & Semantics (2)
Pillar 2: RL Algorithms & Architecture (1)
Pillar 6: Video Extraction & Matching (1)
Pillar 1: Robot Control (1)
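🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SpurLens: Automatic Detection of Spurious Cues in Multimodal LLMs | SpurLens automatically detects spurious cues in multimodal LLMs to improve model reliability. | large language model, multimodal | | |
| 2 | Enhancing Sentiment Analysis through Multimodal Fusion: A BERT-DINOv2 Approach | Proposes a multimodal sentiment analysis framework built on BERT and DINOv2 that fuses text and image information to improve sentiment understanding. | multimodal | | |
| 3 | Open-World Skill Discovery from Unsegmented Demonstrations | Proposes a self-supervised skill-boundary detection method that discovers skills from unsegmented demonstration videos. | instruction following | ✅ | |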
🔬 Pillar 3: Spatial Perception & Semantics (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields | Proposes NeRF-VIO to address map-based visual-inertial localization. | VIO, NeRF, neural radiance field | | |
| 5 | Acoustic Neural 3D Reconstruction Under Pose Drift | Proposes an acoustic neural 3D reconstruction algorithm that jointly optimizes the scene representation and the sensor poses to handle pose drift. | implicit representation | | |
🔬 Pillar 2: RL Algorithms & Architecture (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training | Proposes the GTR framework to address the thought collapse that occurs when training VLM agents with RL. | reinforcement learning, large language model, chain-of-thought | | |
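🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Keypoint Semantic Integration for Improved Feature Matching in Outdoor Agricultural Environments | Proposes a keypoint semantic integration method that makes feature matching more robust in outdoor agricultural environments. | feature matching | | |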
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness | DexGrasp Anything proposes a physics-constraint-aware diffusion model for universal dexterous grasping. | dexterous hand | | |