cs.CV (2026-05-27)
📊 5 papers total | 🔗 1 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (2)
Pillar 2: RL & Architecture (2 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | OphIn-500K: Curating Web-Scale Visual Instructions for Scaling Ophthalmic Multimodal Large Language Models | Proposes OphIn-Engine and OphIn-500K for scaling ophthalmic multimodal large language models. | large language model, multimodal, chain-of-thought | | |
| 2 | Personal Visual Memory from Explicit and Implicit Evidence | Proposes VisualMem, which uses explicit and implicit visual evidence to strengthen the long-term visual memory of personalized AI agents. | multimodal | | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 3 | Bayesian Gated Non-Negative Contrastive Learning | Proposes BayesNCL, which addresses representation entanglement through Bayesian gated non-negative contrastive learning. | representation learning, contrastive learning | ✅ | |
| 4 | VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning | Proposes VCap, which uses hypergeometric rewards for weak-to-strong visual captioning. | distillation, reward design | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness? | Investigates which factors in image-tool interaction affect the jailbreak robustness of multimodal large models. | manipulation, multimodal | | |