cs.CV (2024-11-22)

📊 9 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 2: RL & Architecture (2)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

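| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | MME-Survey: a comprehensive survey on the evaluation of multimodal large language models, aiming to advance model evaluation and development. | large language model, multimodal | |
| 2 | VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | VideoEspresso: a large-scale CoT dataset for fine-grained video reasoning via core frame selection. | multimodal, chain-of-thought | |
| 3 | Large Multi-modal Models Can Interpret Features in Large Multi-modal Models | Proposes a framework for interpreting LMM features that uses LMMs' own capabilities to understand their internal representations, improving model interpretability. | multimodal | |
| 4 | ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos | Proposes ReVisionLLM to tackle temporal grounding in hour-long videos. | large language model | |
| 5 | VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models | VisGraphVar: a benchmark generator for assessing the variability of large vision-language models in graph analysis. | chain-of-thought | |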

🔬 Pillar 3: Perception & Semantics (2 papers)

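| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | Open-Vocabulary Online Semantic Mapping for SLAM | Proposes OVO, an open-vocabulary online semantic mapping method for SLAM. | semantic mapping, semantic map, open-vocabulary | |
| 7 | Benchmarking the Robustness of Optical Flow Estimation to Corruptions | Proposes a robustness benchmark for optical flow that evaluates model performance under common image and temporal corruptions. | optical flow | |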

🔬 Pillar 2: RL & Architecture (2 papers)

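| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Context-Aware Multimodal Pretraining | Proposes context-aware multimodal pretraining to improve the adaptability of vision-language models in few-shot learning. | representation learning, contrastive learning, multimodal | |
| 9 | Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization | Proposes a Bayesian-optimization-based adversarial learning framework to address fine-grained alignment in vision-and-language navigation. | contrastive learning, VLN | |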
