cs.CV (2026-01-26)

📊 18 papers in total | 🔗 5 with code

🎯 Areas of interest

Pillar 2: RL Algorithms & Architecture (7 🔗4) · Pillar 9: Embodied Foundation Models (7) · Pillar 6: Video Extraction & Matching (2) · Pillar 5: Interaction & Reaction (1) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

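# | Title | One-line summary | Tags | 🔗
1 | A multimodal vision foundation model for generalizable knee pathology | OrthoFoundation: a multimodal vision foundation model that generalizes across knee pathologies | contrastive learning, foundation model, multimodal
2 | Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting | Splat-Portrait: a generalizable talking-head generation method based on Gaussian splatting | distillation, gaussian splatting, splatting
3 | GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning | Proposes GenAgent to address the high cost of multimodal generation and understanding | reinforcement learning, multimodal
4 | HomoFM: Deep Homography Estimation with Flow Matching | HomoFM: deep homography estimation with flow matching, improving accuracy and robustness | flow matching, multimodal
5 | QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding | Proposes QualiRAG, a training-free retrieval-augmented generation framework for visual quality understanding | reinforcement learning, spatiotemporal, multimodal
6 | Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning | Proposes MinkUNeXt-VINE, which uses Matryoshka representation learning for efficient localization with low-cost LiDAR in vineyards | representation learning
7 | NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation | NaVIDA: a vision-language navigation framework augmented with inverse dynamics that improves navigation stability and generalization | policy learning, VLN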

🔬 Pillar 9: Embodied Foundation Models (7 papers)

# | Title | One-line summary | Tags | 🔗
8 | Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception | Proposes Q-Bench-Portrait for evaluating how well multimodal large language models perceive portrait image quality | large language model, multimodal
9 | AGSP-DSA: An Adaptive Graph Signal Processing Framework for Robust Multimodal Fusion with Dynamic Semantic Alignment | Proposes the AGSP-DSA framework, which achieves robust multimodal data fusion through dynamic semantic alignment | multimodal
10 | DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment | DisasterInsight: a multimodal benchmark for function-aware, grounded disaster assessment | multimodal
11 | Fair-Eye Net: A Fair, Trustworthy, Multimodal Integrated Glaucoma Full Chain AI System | Fair-Eye Net: a fair, trustworthy, multimodal integrated full-chain AI system for glaucoma | multimodal
12 | MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models | MindCine: multimodal EEG-to-video reconstruction using large-scale pretrained models | multimodal
13 | V-Loop: Visual Logical Loop Verification for Hallucination Detection in Medical Visual Question Answering | Proposes V-Loop to address hallucination detection in medical visual question answering | large language model, multimodal
14 | ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks | ARMOR: uses agentic reasoning to orchestrate and reparameterize attack methods for robust adversarial attacks | large language model

🔬 Pillar 6: Video Extraction & Matching (2 papers)

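# | Title | One-line summary | Tags | 🔗
15 | Spatial-Conditioned Reasoning in Long-Egocentric Videos | Improves visual navigation in long egocentric videos through spatially conditioned reasoning | egocentric
16 | Agentic Very Long Video Understanding | EGAgent: an agentic framework for very long video understanding built on entity scene graphs | egocentric, large language model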

🔬 Pillar 5: Interaction & Reaction (1 paper)

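# | Title | One-line summary | Tags | 🔗
17 | Multimodal Privacy-Preserving Entity Resolution with Fully Homomorphic Encryption | Proposes a multimodal privacy-preserving entity resolution framework based on fully homomorphic encryption, addressing data heterogeneity in highly regulated industries | OMOMO, multimodal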

🔬 Pillar 4: Generative Motion (1 paper)

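# | Title | One-line summary | Tags | 🔗
18 | PPISP: Physically-Plausible Compensation and Control of Photometric Variations in Radiance Field Reconstruction | Proposes PPISP, which handles photometric variations in radiance field reconstruction through physically plausible ISP compensation and control | physically plausible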
