cs.CV (2026-04-28)

📊 22 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (9 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 4: Generative Motion (2 🔗1) · Pillar 1: Robot Control (2) · Pillar 3: Perception & Semantics (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (9 papers)

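# | Title | One-line summary | Tags | 🔗
1 | Combating Visual Neglect and Semantic Drift in Large Multimodal Models for Enhanced Cross-Modal Retrieval | Proposes the SSA-ME framework, which uses saliency-aware modeling to address visual neglect and semantic drift in LMMs for cross-modal retrieval. | representation learning, contrastive learning, multimodal
2 | OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding | Proposes the OmniVTG dataset and a self-correcting CoT training paradigm to improve open-world video temporal grounding. | reinforcement learning, large language model, multimodal
3 | TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual Media | TopoMamba: a topology-aware scanning and fusion framework for segmenting heterogeneous medical visual media. | Mamba, SSM
4 | A Systematic Post-Train Framework for Video Generation | Proposes a post-training framework for video generation that improves generation quality, temporal consistency, and instruction following. | reinforcement learning, RLHF, instruction following
5 | Improving Diversity in Black-box Few-shot Knowledge Distillation | Proposes an adaptive-diversity method for black-box few-shot knowledge distillation that improves student model accuracy. | distillation
6 | Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision Models | Proposes Vision SmolMamba, which uses spike-guided token pruning to build efficient spiking state-space vision models. | Mamba
7 | DualGeo: A Dual-View Framework for Worldwide Image Geo-localization | DualGeo: a dual-view framework for worldwide image geo-localization that improves localization accuracy. | contrastive learning, multimodal
8 | The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation | Canonical knowledge distillation proves surprisingly effective for semantic segmentation. | distillation
9 | DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing | DDA-Thinker: decoupled dual-atomic reinforcement learning for reasoning-driven image editing. | reinforcement learning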

🔬 Pillar 9: Embodied Foundation Models (7 papers)

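# | Title | One-line summary | Tags | 🔗
10 | Toward Multimodal Conversational AI for Age-Related Macular Degeneration | OcularChat: a multimodal conversational AI for age-related macular degeneration. | large language model, multimodal
11 | M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering | Proposes the M$^3$-VQA benchmark for evaluating multimodal large language models on fine-grained, multi-entity, multi-hop reasoning. | large language model, multimodal
12 | Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models | Proposes a regeneration-based image refinement method that enlarges the modification space and improves the performance of unified multimodal models. | multimodal
13 | The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents | Proposes a mixture-of-experts diffusion model based on recursive sparse reasoning that improves multimodal image generation. | multimodal
14 | SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring | Proposes SIEVES, which improves the generalization of selective prediction through visual evidence scoring. | large language model, multimodal
15 | QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding | QCalEval: the first benchmark for evaluating vision-language models on quantum calibration plot understanding. | multimodal
16 | GeoSearch: Augmenting Worldwide Geolocalization with Web-Scale Reverse Image Search and Image Matching | GeoSearch: augments worldwide geolocalization with web-scale reverse image search. | multimodal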

🔬 Pillar 4: Generative Motion (2 papers)

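# | Title | One-line summary | Tags | 🔗
17 | IAM: Identity-Aware Human Motion and Shape Joint Generation | Proposes IAM, an identity-aware framework for joint generation of human motion and body shape that improves motion realism. | motion generation, human motion, human motion generation
18 | HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation | Proposes HuM-Eval to address the challenges of evaluating human motion videos. | motion generation, human motion, human motion generation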

🔬 Pillar 1: Robot Control (2 papers)

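# | Title | One-line summary | Tags | 🔗
19 | Rapid tracking through strongly scattering media with physics-informed neuromorphic speckle analysis | Proposes physics-informed neuromorphic speckle analysis for rapid target tracking through strongly scattering media. | motion tracking, motion estimation, spatiotemporal
20 | Personalized Cross-Modal Emotional Correlation Learning for Speech-Preserving Facial Expression Manipulation | Proposes a personalized cross-modal emotional correlation learning algorithm for speech-preserving facial expression manipulation. | manipulation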

🔬 Pillar 3: Perception & Semantics (1 paper)

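# | Title | One-line summary | Tags | 🔗
21 | Generalizable Human Gaussian Splatting via Multi-view Semantic Consistency | Proposes a generalizable human Gaussian splatting method based on multi-view semantic consistency that improves rendering quality under sparse views. | gaussian splatting, splatting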

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
22 | Exploring Remote Photoplethysmography for Neonatal Pain Detection from Facial Videos | Proposes an rPPG-based method for contactless neonatal pain detection from facial videos. | PULSE
