cs.CV (2025-04-12)
📊 16 papers in total | 🔗 2 with code
🎯 Interest area navigation
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 9: Embodied Foundation Models (6)
Pillar 2: RL & Architecture (2)
Pillar 1: Robot Control (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 3: Perception & Semantics (6 papers)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification | SDIGLM: identifies structural damage using large language models and multimodal chain of thought | large language model, chain-of-thought | | |
| 8 | REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis | REMEMBER: a retrieval-based, explainable, multimodal evidence-guided model for zero-shot and few-shot diagnosis of neurodegenerative disease | multimodal | | |
| 9 | DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models | Proposes weight-decomposed low-rank quantization-aware training (DL-QAT) to quantize large language models efficiently | large language model | | |
| 10 | seg2med: a bridge from artificial anatomy to multimodal medical images | Seg2Med: a bridge from artificial anatomy to multimodal medical images | multimodal | | |
| 11 | VideoAds for Fast-Paced Video Understanding | VideoAds: a benchmark dataset for multimodal large language models on fast-paced video understanding | large language model | | |
| 12 | FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment | Proposes FVQ-20K, a large-scale dataset for face video quality assessment, and FVQ-Rater, an LMM-based assessment method | multimodal | | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks | PathVLM-R1: a reinforcement-learning-based pathology vision-language reasoning model that improves diagnostic accuracy and generalization | reinforcement learning, multimodal | | |
| 14 | UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance | Proposes UniFlowRestore, a general video restoration framework built on flow matching and prompt guidance | flow matching | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | BIGS: category-agnostic reconstruction of bimanual interactions from monocular video using 3D Gaussian splatting | bi-manual, distillation, 3D gaussian splatting | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
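| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Using Vision Language Models for Safety Hazard Identification in Construction | Proposes a vision-language-model-based framework for identifying safety hazards on construction sites, improving contextual awareness | spatial relationship | | |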