cs.CV (2026-01-02)
📊 8 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (4)
Pillar 4: Generative Motion (1 🔗1)
Pillar 3: Perception & Semantics (1 🔗1)
Pillar 2: RL & Architecture (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Grading Handwritten Engineering Exams with Multimodal Large Language Models | Proposes using multimodal large language models to grade handwritten engineering exams | large language model, multimodal | | |
| 2 | Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection | Explores the feasibility of multimodal large language models for audio deepfake detection | large language model, multimodal | | |
| 3 | Modality Dominance-Aware Optimization for Embodied RGB-Infrared Perception | Proposes a modality dominance-aware optimization framework to address modality asymmetry in embodied RGB-IR perception | multimodal | | |
| 4 | AEGIS: Exploring the Limit of World Knowledge Capabilities for Unified Multimodal Models | AEGIS: explores the limits of world knowledge capabilities in unified multimodal models | multimodal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | SafeMo: Linguistically Grounded Unlearning for Trustworthy Text-to-Motion Generation | Proposes SafeMo to address safety issues in text-to-motion generation | text-to-motion, motion generation, VQ-VAE | ✅ | |
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction | AdaGaR: an adaptive Gabor representation for dynamic scene reconstruction that improves detail capture and temporal continuity | depth estimation, scene reconstruction | ✅ | |
🔬 Pillar 2: RL & Architecture (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | HyperPriv-EPN: Hypergraph Learning with Privileged Knowledge for Ependymoma Prognosis | HyperPriv-EPN: hypergraph learning with privileged knowledge for ependymoma prognosis | distillation, privileged information, multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | DynaDrag: Dynamic Drag-Style Image Editing by Motion Prediction | DynaDrag: a dynamic drag-style image editing method based on motion prediction | manipulation | | |