cs.CV (2025-12-26)

📊 11 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6) · Pillar 2: RL & Architecture (2) · Pillar 8: Physics-based Animation (2) · Pillar 3: Perception & Semantics (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

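| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception | iSHIFT: a lightweight slow-fast GUI agent with adaptive perception that improves interaction efficiency and accuracy | large language model, multimodal, visual grounding | |
| 2 | See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning | Proposes a bi-directional perceptual shaping method to improve multimodal reasoning | multimodal | |
| 3 | Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models | Proposes BadVSFM, a backdoor attack framework targeting prompt-driven video segmentation foundation models | foundation model | |
| 4 | Perceive and Calibrate: Analyzing and Enhancing Robustness of Medical Multi-Modal Large Language Models | Proposes the Inherent-enhanced Multi-modal Calibration (IMC) framework to improve the robustness of medical multimodal large language models under noisy conditions | large language model | |
| 5 | SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis | SLIM-Brain: a data- and training-efficient foundation model for fMRI analysis | foundation model | |
| 6 | Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models | Proposes DIOR, a training-free conditional image embedding framework that leverages large vision-language models | foundation model | |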

🔬 Pillar 2: RL & Architecture (2 papers)

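| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | Patch as Node: Human-Centric Graph Representation Learning for Multimodal Action Recognition | Proposes the PAN framework, which uses human-centric graph representation learning for more effective multimodal action recognition | representation learning, spatiotemporal, multimodal | |
| 8 | Yume-1.5: A Text-Controlled Interactive World Generation Model | Yume-1.5: a text-controlled interactive world generation model that improves real-time performance and controllability | linear attention, distillation | |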

🔬 Pillar 8: Physics-based Animation (2 papers)

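| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration | Proposes XET-V2X for end-to-end 3D spatiotemporal perception with multimodal fusion in V2X collaboration | spatiotemporal, multimodal | |
| 10 | LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration | LongFly: a spatiotemporal context integration framework for long-horizon UAV vision-and-language navigation | spatiotemporal, VLN, multimodal | |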

🔬 Pillar 3: Perception & Semantics (1 paper)

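| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer | Proposes Reloc-VGGT, which uses a geometry-grounded Transformer for robust and efficient visual re-localization | VGGT, spatial relationship | |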
