cs.CV (2025-07-23)
📊 7 papers in total
🎯 Navigation by interest area
Pillar 2: RL Algorithms & Architecture (4)
Pillar 9: Embodied Foundation Models (2)
Pillar 3: Spatial Perception & Semantics (1)
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | URPO: A Unified Reward & Policy Optimization Framework for Large Language Models | URPO: a unified reward and policy optimization framework that improves large language model alignment | reinforcement learning, large language model, instruction following | | |
| 2 | From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding | Proposes a scene understanding method based on realistic scans, improving LLM scene editing and robot policy learning | policy learning, scene understanding | | |
| 3 | Eyes Will Shut: A Vision-Based Next GPS Location Prediction Model by Reinforcement Learning from Visual Map Feed Back | Proposes VLMLocPredictor, a next GPS location prediction model trained with reinforcement learning from visual map feedback | reinforcement learning | | |
| 4 | PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models | PIG-Nav: key technical insights for pretrained image-goal navigation models | representation learning, foundation model | | |
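🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Dual-branch Prompting for Multimodal Machine Translation | Proposes D2P-MMT, which uses dual-branch prompting and a diffusion model to improve the robustness of multimodal machine translation | multimodal | | |
| 6 | Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras | Proposes the Talk2Event benchmark and the EventRefer framework for language-based understanding of dynamic scenes from event cameras | multimodal | | |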
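🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Monocular Semantic Scene Completion via Masked Recurrent Networks | Proposes a monocular semantic scene completion method based on masked recurrent networks, improving completion in complex scenes | depth estimation | | |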