cs.CV (2025-02-20)
📊 21 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (7 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 2: RL Algorithms & Architecture (4 🔗1)
Pillar 1: Robot Control (2)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (7 papers)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | CoSyn: uses code-guided synthetic multimodal data generation to improve text-rich image understanding | large language model, multimodal | | |
| 9 | Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models | Proposes Multimodal RewardBench for holistic evaluation of reward models for vision-language models | multimodal | ✅ | |
| 10 | Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts | Proposes the TimeTravel benchmark for evaluating how well LMMs understand historical and cultural artifacts | multimodal | ✅ | |
| 11 | PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models | Proposes PLPHP to address the inference efficiency of large vision-language models | multimodal | | |
| 12 | Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well | Proposes the MultiCOS framework, which fuses multimodal information to improve camouflaged object segmentation | multimodal | ✅ | |
| 13 | Evaluating Precise Geolocation Inference Capabilities of Vision Language Models | Evaluates the ability of vision-language models to infer precise geolocation | foundation model | | |
| 14 | LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework | Proposes LLM-EvRep, which uses a self-supervised framework to learn an LLM-compatible event representation and improve event recognition | large language model | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining | Proposes a triplet-mining-based self-supervised monocular depth estimation method that makes depth estimation more robust on reflective surfaces | distillation, depth estimation, monocular depth | | |
| 16 | Vision Foundation Models in Medical Image Analysis: Advances and Challenges | Surveys advances and challenges of vision foundation models in medical image analysis, focusing on segmentation | distillation, foundation model | | |
| 17 | Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing | Proposes structurally disentangled feature field distillation for 3D understanding and editing | distillation | | |
| 18 | LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | LongWriter-V: enables ultra-long, high-fidelity generation in vision-language models via a long-text SFT dataset and iterative DPO | DPO, direct preference optimization | ✅ | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Uses knowledge distillation to design parameter- and compute-efficient Diffusion Transformers suited to edge devices | Apple Vision Pro, distillation | | |
| 20 | Fostering Inclusion: A Virtual Reality Experience to Raise Awareness of Dyslexia-Related Barriers in University Settings | Proposes a VR-based empathy experience intended to raise awareness of dyslexia-related barriers in university settings | locomotion | | |
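🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Daily Land Surface Temperature Reconstruction in Landsat Cross-Track Areas Using Deep Ensemble Learning With Uncertainty Quantification | Proposes DELAG, a deep ensemble learning method that reconstructs land surface temperature at high spatiotemporal resolution in Landsat cross-track areas and quantifies uncertainty | spatiotemporal | | |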