cs.CV (2026-02-17)

📊 19 papers in total | 🔗 3 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (12 🔗2) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

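# | Title | One-line summary | Tags | 🔗
1 | Emergent Morphing Attack Detection in Open Multi-modal Large Language Models | Uses open multimodal large language models for zero-shot detection of face morphing attacks | large language model, multimodal
2 | Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models | Proposes the R3 framework to resolve the optimization dilemma between generation and understanding in multimodal models | multimodal
3 | Concept-Enhanced Multimodal RAG: Towards Interpretable and Accurate Radiology Report Generation | Proposes CEMRAG, a concept-enhanced multimodal RAG framework, to improve the interpretability and accuracy of radiology report generation | multimodal
4 | CREMD: Crowd-Sourced Emotional Multimodal Dogs Dataset | Introduces the CREMD dataset for studying how different modalities and annotator characteristics affect dog emotion recognition | multimodal
5 | Effective and Robust Multimodal Medical Image Analysis | Proposes the MAIL and Robust-MAIL networks for effective and robust multimodal medical image analysis | multimodal
6 | Training-Free Zero-Shot Anomaly Detection in 3D Brain MRI with 2D Foundation Models | Proposes a training-free zero-shot anomaly detection method for 3D brain MRI based on 2D pretrained models | foundation model
7 | Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation | Proposes a retrieval-augmented framework that improves the efficiency and stability of LLMs in vision-and-language navigation | VLN, large language model
8 | Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs | Proposes PADE, which uses internal attention dynamics to enhance core visual regions and mitigate hallucination in LVLMs | multimodal, visual grounding
9 | VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation | VideoSketcher: uses pretrained video models for versatile sequential sketch generation | large language model
10 | Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting | Combines meteorological data, sky images, and deep models to improve photovoltaic power forecasting accuracy | multimodal
11 | GMAIL: Generative Modality Alignment for generated Image Learning | GMAIL: a generative modality alignment framework that improves the use of generated images in vision-language tasks | multimodal
12 | Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs | Sparrow: proposes text-anchored window attention to accelerate inference in video LLMs | large language model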

🔬 Pillar 3: Perception & Semantics (4 papers)

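# | Title | One-line summary | Tags | 🔗
13 | Semantic-Guided 3D Gaussian Splatting for Transient Object Removal | Proposes a semantic-guided 3D Gaussian splatting method for removing transient objects in multi-view reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
14 | DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles | DAV-GSWT: uses diffusion priors and active view sampling to efficiently generate high-fidelity Gaussian splatting Wang Tiles | 3D gaussian splatting, gaussian splatting, splatting
15 | NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy | NeRFscopy: proposes a neural radiance field method for 3D reconstruction of in-vivo time-varying tissues from endoscopy | NeRF, neural radiance field
16 | Criteria-first, semantics-later: reproducible structure discovery in image-based sciences | Proposes a “criteria-first, semantics-later” framework to address reproducible structure discovery in image-based sciences | semantic mapping, semantic map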

🔬 Pillar 2: RL & Architecture (2 papers)

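# | Title | One-line summary | Tags | 🔗
17 | Language and Geometry Grounded Sparse Voxel Representations for Holistic Scene Understanding | Proposes sparse voxel representations that combine language and geometry to improve scene understanding | distillation, scene understanding, open-vocabulary
18 | EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use | Proposes EventMemAgent, which uses hierarchical event-centric memory and adaptive tool use for online video understanding | reinforcement learning, large language model, multimodal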

🔬 Pillar 6: Video Extraction (1 paper)

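# | Title | One-line summary | Tags | 🔗
19 | Automatic Funny Scene Extraction from Long-form Cinematic Videos | Proposes an end-to-end system that automatically extracts humorous scenes from long-form cinematic videos to improve user engagement | HuMoR, multimodal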
