cs.CV (2026-02-07)
📊 13 papers in total | 🔗 3 with code
🎯 Areas of Interest
Pillar 9: Embodied Foundation Models (6, 🔗 3)
Pillar 2: RL & Architecture (4)
Pillar 3: Perception & Semantics (2)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention | ViCA: an efficient multimodal LLM in which visual information interacts only through cross-attention | large language model, multimodal, visual grounding | ✅ | |
| 2 | Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning | Fine-R1: improves multimodal LLM performance on fine-grained visual recognition through chain-of-thought reasoning | large language model, chain-of-thought | ✅ | |
| 3 | VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation | VISOR: visual spatial object reasoning for language-driven object navigation | vision-language-action, VLA, large language model | | |
| 4 | SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens | SIGMA: selective interleaved generation with multi-attribute tokens, enabling multi-condition compositional editing in diffusion models | multimodal | | |
| 5 | Revealing the Semantic Selection Gap in DINOv3 through Training-Free Few-Shot Segmentation | FSSDINO: a training-free few-shot segmentation method that reveals the semantic selection gap in DINOv3 | foundation model | ✅ | |
| 6 | LUCID-SAE: Learning Unified Vision-Language Sparse Codes for Interpretable Concept Discovery | Proposes LUCID-SAE, which learns unified vision-language sparse codes for interpretable concept discovery | multimodal | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
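| 7 | Cross-View World Models | Proposes cross-view world models that learn 3D representations of the environment through cross-view consistency, improving agent planning | world model, egocentric | | |
| 8 | TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation | TeleBoost: a systematic alignment framework for high-fidelity, controllable, and robust video generation | reinforcement learning, instruction following | | |
| 9 | Optimizing Few-Step Generation with Adaptive Matching Distillation | Proposes Adaptive Matching Distillation (AMD) to optimize few-step generative models, improving fidelity and robustness | distillation | | |
| 10 | SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads | Proposes SoulX-FlashHead for infinite-length, real-time streaming generation of high-definition, photorealistic talking heads | distillation, spatiotemporal | | |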
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Thermal odometry and dense mapping using learned odometry and Gaussian splatting | Proposes TOM-GS, which combines learned odometry with Gaussian splatting for robust dense mapping from thermal imagery | monocular depth, gaussian splatting, splatting | | |
| 12 | Looking and Listening Inside and Outside: Multimodal Artificial Intelligence Systems for Driver Safety Assessment and Intelligent Vehicle Decision-Making | Proposes the L-LIO framework, which fuses visual and auditory information to improve driver safety assessment and intelligent vehicle decision-making | LIO, scene understanding, multimodal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation | Proposes IM-Animation, which achieves identity-decoupled character animation through an implicit motion representation | motion representation, character animation | | |