cs.CV (2024-11-22)

📊 9 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 2: RL & Architecture (2)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	MME-Survey: a comprehensive survey of multimodal LLM evaluation, intended to advance model evaluation and development.	large language model multimodal
2	VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection	VideoEspresso: a large-scale CoT dataset for fine-grained video reasoning via core frame selection.	multimodal chain-of-thought	✅
3	Large Multi-modal Models Can Interpret Features in Large Multi-modal Models	Proposes a framework for interpreting LMM features that uses LMMs themselves to understand their internal representations, improving model interpretability.	multimodal
4	ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos	Proposes ReVisionLLM to tackle temporal grounding in hour-long videos.	large language model	✅
5	VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models	VisGraphVar: a benchmark generator for assessing the variability of large vision-language models in graph analysis.	chain-of-thought

🔬 Pillar 3: Perception & Semantics (2 papers)

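#	Title	One-line summary	Tags	🔗	⭐
6	Open-Vocabulary Online Semantic Mapping for SLAM	Proposes OVO, an open-vocabulary online semantic mapping method for SLAM.	semantic mapping semantic map open-vocabulary
7	Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Proposes a robustness benchmark for optical flow that evaluates models under common image and temporal corruptions.	optical flow	✅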

🔬 Pillar 2: RL & Architecture (2 papers)

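#	Title	One-line summary	Tags	🔗	⭐
8	Context-Aware Multimodal Pretraining	Proposes context-aware multimodal pretraining to improve the few-shot adaptation of vision-language models.	representation learning contrastive learning multimodal
9	Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization	Proposes a Bayesian-optimization-based adversarial learning framework to address fine-grained alignment in vision-and-language navigation.	contrastive learning VLN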
