cs.CV (2026-06-05)

📊 26 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 6: Video Extraction (2 🔗1) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (8 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization | Proposes AnchorWorld to address controllability in interactive world modeling | world model, world models, egocentric | |
| 2 | EgoPressDiff: Multimodal Video Diffusion for Egocentric UV-Domain Hand-Pressure Estimation | Proposes EgoPressDiff to address hand contact pressure estimation | MAE, egocentric, multimodal | |
| 3 | Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis | Proposes the Multi-FRuGaL framework to address missing multimodal data in cancer diagnosis | representation learning, multimodal | |
| 4 | MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism | Proposes MemDreamer to address perception and reasoning in long video understanding | dreamer, spatiotemporal, multimodal | |
| 5 | STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation | Proposes the STREAM framework to address condition collapse in digital pathology image generation | flow matching, foundation model | |
| 6 | VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation | Proposes the VideoSEG-O3 framework to address reasoning in video object segmentation | reinforcement learning, chain-of-thought | |
| 7 | Lighting-Aware Representation Learning under Controllable Lighting Variation | Proposes a lighting-aware representation learning framework to address lighting variation | representation learning, contrastive learning | |
| 8 | Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment | Proposes Native3D to address the 2D adaptation problem in conventional 3D scene generation | contrastive learning, spatial relationship | |

🔬 Pillar 9: Embodied Foundation Models (8 papers)

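| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | GuideCAD: A Lightweight Multimodal Framework for 3D CAD Model Generation via Prefix Embedding | Proposes GuideCAD to address the computational cost of 3D CAD model generation | large language model, multimodal | |
| 10 | From Vision to Text: A Compact Multimodal Approach for Robust, Cross-Domain Presentation Attack Detection on ID Cards | Proposes a compact multimodal model to address cross-domain presentation attack detection on ID cards | multimodal | |
| 11 | Unified Safe In-context Image Generation in Multimodal Diffusion Transformers via Restricting Unsafe Information Flows | Proposes a unified visual safety regulator to address safe generation in multimodal diffusion transformers | multimodal | |
| 12 | Seeing Without Exposing: Adaptive Privacy Control for Open-World, Context-Hungry MLLMs | Proposes Anchored Privacy Drifting to address privacy in multimodal large language models | large language model, multimodal | |
| 13 | Textual Supervision Enhances Geospatial Representations in Vision-Language Models | Improves geospatial representations in vision-language models through textual supervision | foundation model, multimodal | |
| 14 | When Recovery Matters: The Blind Spot of Surrogate Privacy in MLLM Editing | Proposes SPPE to address privacy protection in multimodal large language model editing | large language model, multimodal | |
| 15 | SVHighlights: Towards Extremely Long Sport Video Highlight Detection | Proposes SVHighlights to address highlight detection in long videos | large language model, multimodal | |
| 16 | Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding | Proposes Streaming Video-Language Synchrony to address synchronization in real-time video understanding | large language model | |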

🔬 Pillar 3: Perception & Semantics (3 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | EvoGS: Constructing Continuous-Layered Gaussian Splatting with Evolution Tree for Scalable 3D Streaming | Proposes EvoGS to address redundancy and quality transitions in 3D streaming | 3D gaussian splatting, gaussian splatting, splatting | |
| 18 | Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors | Proposes Stream3D-VLM to address online 3D spatial understanding | scene understanding, multimodal | |
| 19 | CL-CLIP: CLIP-Based Continual Learning Framework with Cost-Volume Category Decoupling for Object Detection | Proposes the CL-CLIP framework to address catastrophic forgetting in continual object detection | open-vocabulary, open vocabulary | |

🔬 Pillar 1: Robot Control (2 papers)

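| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | LARA: Latent Action Representation Alignment for Vision-Language-Action Models | Proposes the LARA framework to address data scarcity in VLA model training | manipulation, vision-language-action, VLA | |
| 21 | Detecting Temporally Localized Manipulations in Authentic Video Streams | Proposes a method for detecting localized manipulations in authentic video streams | manipulation | |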

🔬 Pillar 6: Video Extraction (2 papers)

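| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Watch, Remember, Reason: Human-View Video Understanding with MLLMs | Proposes a human-view multimodal large language model to address video understanding | egocentric, large language model, multimodal | |
| 23 | OpenGlass: Open-Source Smart Glasses for On-Device Event-Based Gesture Recognition | Proposes OpenGlass to address gesture recognition on smart glasses | egocentric, multimodal | |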

🔬 Pillar 4: Generative Motion (2 papers)

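| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | DRIFT: From Robustness Gaps to Invariance Manifolds for AI-Generated Image Detection | Proposes DRIFT to address robustness gaps in AI-generated image detection | physically plausible, foundation model | |
| 25 | LUCID: Learning Unified Control for Image Deflaring and Exposure Mastery in Nighttime Photography | Proposes the LUCID framework to address flare and exposure problems in nighttime photography | classifier-free guidance | |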

🔬 Pillar 8: Physics-based Animation (1 paper)

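| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | Spatial-Temporal Decoupled Adapter for Micro-gesture Online Recognition | Proposes a spatial-temporal decoupled adapter to address online micro-gesture recognition | spatiotemporal | |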
