cs.CV (2026-05-27)

📊 5 papers in total | 🔗 1 with code

🎯 Browse by topic of interest

Pillar 9: Embodied Foundation Models (2) | Pillar 2: RL & Architecture (2, 🔗 1) | Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (2 papers)

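| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | OphIn-500K: Curating Web-Scale Visual Instructions for Scaling Ophthalmic Multimodal Large Language Models | Proposes OphIn-Engine and the OphIn-500K dataset for scaling ophthalmic multimodal large language models. | large language model, multimodal, chain-of-thought | |
| 2 | Personal Visual Memory from Explicit and Implicit Evidence | Proposes VisualMem, which uses explicit and implicit visual evidence to strengthen the long-term visual memory of personalized AI agents. | multimodal | |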

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

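| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 3 | Bayesian Gated Non-Negative Contrastive Learning | Proposes BayesNCL, which addresses representation entanglement through Bayesian gated non-negative contrastive learning. | representation learning, contrastive learning | |
| 4 | VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning | Proposes VCap, which uses hypergeometric rewards for weak-to-strong visual captioning. | distillation, reward design | |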

🔬 Pillar 1: Robot Control (1 paper)

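| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 5 | When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness? | Studies which factors in image-tool interaction determine how robust multimodal large models are to jailbreak attacks. | manipulation, multimodal | |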
