cs.CV (2024-11-22)
📊 9 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (5 🔗2)
Pillar 3: Perception & Semantics (2 🔗1)
Pillar 2: RL & Architecture (2)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | MME-Survey: a comprehensive survey of multimodal LLM evaluation, intended to advance model evaluation and development. | large language model, multimodal | | |
| 2 | VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | VideoEspresso: a large-scale CoT dataset for fine-grained video reasoning via core frame selection. | multimodal, chain-of-thought | ✅ | |
| 3 | Large Multi-modal Models Can Interpret Features in Large Multi-modal Models | Proposes an LMM feature-interpretation framework that uses LMMs' own capabilities to understand their internal representations, improving model interpretability. | multimodal | | |
| 4 | ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos | Proposes ReVisionLLM to tackle temporal grounding in hour-long videos. | large language model | ✅ | |
| 5 | VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models | VisGraphVar: a benchmark generator for assessing the variability of large vision-language models in graph analysis. | chain-of-thought | | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Open-Vocabulary Online Semantic Mapping for SLAM | Proposes OVO, an open-vocabulary online semantic mapping method for SLAM. | semantic mapping, semantic map, open-vocabulary | | |
| 7 | Benchmarking the Robustness of Optical Flow Estimation to Corruptions | Proposes a robustness benchmark for optical flow that evaluates models under common image and temporal corruptions. | optical flow | ✅ | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Context-Aware Multimodal Pretraining | Proposes context-aware multimodal pretraining to improve the adaptability of vision-language models in few-shot learning. | representation learning, contrastive learning, multimodal | | |
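| 9 | Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization | Proposes a contrastive learning framework based on Bayesian optimization to address fine-grained alignment in vision-and-language navigation. | contrastive learning, VLN | | |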