cs.CV (2026-02-17)

📊 20 papers in total | 🔗 3 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (12 🔗2) | Pillar 3: Perception & Semantics (5) | Pillar 2: RL & Architecture (2 🔗1) | Pillar 6: Video Extraction (1)

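🔬 Pillar 9: Embodied Foundation Models (12 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Emergent Morphing Attack Detection in Open Multi-modal Large Language Models	Uses open multimodal large language models for zero-shot detection of face morphing attacks	large language model multimodal
2	Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models	Proposes the R3 framework to resolve the optimization dilemma between generation and understanding in multimodal models	multimodal	✅
3	Concept-Enhanced Multimodal RAG: Towards Interpretable and Accurate Radiology Report Generation	Proposes CEMRAG, a concept-enhanced multimodal RAG framework that improves the interpretability and accuracy of radiology report generation	multimodal
4	CREMD: Crowd-Sourced Emotional Multimodal Dogs Dataset	Introduces the CREMD dataset for studying how different modalities and annotator characteristics affect dog emotion recognition	multimodal
5	Effective and Robust Multimodal Medical Image Analysis	Proposes the MAIL and Robust-MAIL networks for effective and robust multimodal medical image analysis	multimodal	✅
6	Training-Free Zero-Shot Anomaly Detection in 3D Brain MRI with 2D Foundation Models	Proposes a training-free, zero-shot anomaly detection method for 3D brain MRI built on 2D pretrained models	foundation model
7	Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation	Proposes a retrieval-augmented framework that improves the efficiency and stability of LLMs in vision-and-language navigation	VLN large language model
8	Revealing and Enhancing Core Visual Regions: Harnessing Internal Attention Dynamics for Hallucination Mitigation in LVLMs	Proposes PADE, which uses internal attention dynamics to enhance core visual regions and mitigate hallucination in LVLMs	multimodal visual grounding
9	VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation	VideoSketcher: uses pretrained video models for versatile sequential sketch generation	large language model
10	Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting	Combines meteorological data, sky images, and deep models to improve photovoltaic power forecasting accuracy	multimodal
11	GMAIL: Generative Modality Alignment for generated Image Learning	GMAIL: a generative modality alignment framework that makes better use of generated images in vision-language tasks	multimodal
12	Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs	Sparrow: proposes text-anchored window attention to accelerate inference in video LLMs	large language model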

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	Key Point	Tags	🔗	⭐
13	SAM 3D Body: Robust Full-Body Human Mesh Recovery	Proposes SAM 3D Body for full-body 3D human mesh recovery from a single image	sam 3D SAM 3D human mesh recovery
14	Semantic-Guided 3D Gaussian Splatting for Transient Object Removal	Proposes a semantic-guided 3D Gaussian Splatting method for removing transient objects in multi-view reconstruction	3D gaussian splatting 3DGS gaussian splatting
15	DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles	DAV-GSWT: uses diffusion priors and active view sampling to efficiently generate high-fidelity Gaussian Splatting Wang Tiles	3D gaussian splatting gaussian splatting splatting
16	NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy	NeRFscopy: a neural radiance field method for 3D reconstruction of time-varying in-vivo tissue from endoscopy	NeRF neural radiance field
17	Criteria-first, semantics-later: reproducible structure discovery in image-based sciences	Proposes a "criteria-first, semantics-later" framework for reproducible structure discovery in image-based sciences	semantic mapping semantic map

🔬 Pillar 2: RL & Architecture (2 papers)

#	Title	Key Point	Tags	🔗	⭐
18	Language and Geometry Grounded Sparse Voxel Representations for Holistic Scene Understanding	Proposes a sparse voxel representation grounded in both language and geometry to improve holistic scene understanding	distillation scene understanding open-vocabulary
19	EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use	Proposes EventMemAgent, which combines hierarchical event-centric memory with adaptive tool use for online video understanding	reinforcement learning large language model multimodal	✅

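🔬 Pillar 6: Video Extraction (1 paper)

#	Title	Key Point	Tags	🔗	⭐
20	Automatic Funny Scene Extraction from Long-form Cinematic Videos	Proposes an end-to-end system that automatically extracts humorous scenes from long-form cinematic videos to boost user engagement	HuMoR multimodal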
