cs.CV (2025-10-22)
📊 9 papers in total | 🔗 5 with code
🎯 Navigation by area of interest
Pillar 2: RL Algorithms & Architecture (4 papers, 🔗 3 with code)
Pillar 9: Embodied Foundation Models (3 papers)
Pillar 3: Spatial Perception & Semantics (1 paper, 🔗 1 with code)
Pillar 7: Motion Retargeting (1 paper, 🔗 1 with code)
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
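| 1 | MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting | Proposes MoE-GS, which uses a mixture-of-experts model to improve the rendering quality and efficiency of dynamic Gaussian splatting. | distillation, 3D gaussian splatting, gaussian splatting | | |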
| 2 | X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning | Proposes X-Ego, a method based on cross-view contrastive learning for acquiring team-level tactical situational awareness. | representation learning, contrastive learning, egocentric | ✅ | |
| 3 | Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks | Proposes the Dream4Drive framework to improve synthetic data generation for autonomous-driving perception tasks. | world model, multimodal | ✅ | |
| 4 | From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction | Proposes a Policy World Model that unifies world modeling and trajectory planning to improve autonomous-driving decision-making. | world model | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning | Proposes PruneHal, which reduces hallucinations in multimodal large language models through adaptive KV cache pruning. | large language model | | |
| 6 | A Flow Model with Low-Rank Transformers for Incomplete Multimodal Survival Analysis | Proposes a flow model built on low-rank Transformers for survival analysis with incomplete multimodal data. | multimodal | | |
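| 7 | Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images | Introduces the STAR-64K dataset and a two-stage training framework that improve structured and abstractive reasoning in multimodal large language models. | large language model, chain-of-thought | | |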
🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses | Proposes a gradient-based 3DGS filtering method that addresses novel-view-synthesis artifacts at extreme viewpoints. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | FootFormer: Estimating Stability from Visual Input | FootFormer: a cross-modal method for estimating human stability from visual input. | human motion | ✅ | |