cs.CV (2026-06-05)

📊 26 papers total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 6: Video Extraction (2 🔗1) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization	Proposes AnchorWorld to address controllability in interactive world modeling	world model world models egocentric
2	EgoPressDiff: Multimodal Video Diffusion for Egocentric UV-Domain Hand-Pressure Estimation	Proposes EgoPressDiff to address hand contact pressure estimation	MAE egocentric multimodal	✅
3	Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis	Proposes the Multi-FRuGaL framework to address missing multimodal data in cancer diagnosis	representation learning multimodal
4	MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism	Proposes MemDreamer to address perception and reasoning in long video understanding	dreamer spatiotemporal multimodal
5	STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation	Proposes the STREAM framework to address condition collapse in digital histopathology image generation	flow matching foundation model
6	VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation	Proposes the VideoSEG-O3 framework to address reasoning in video object segmentation	reinforcement learning chain-of-thought	✅
7	Lighting-Aware Representation Learning under Controllable Lighting Variation	Proposes a lighting-aware representation learning framework to address lighting variation	representation learning contrastive learning
8	Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment	Proposes Native3D to address the 2D adaptation problem in conventional 3D scene generation	contrastive learning spatial relationship

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	GuideCAD: A Lightweight Multimodal Framework for 3D CAD Model Generation via Prefix Embedding	Proposes GuideCAD to address the computational cost of 3D CAD model generation	large language model multimodal	✅
10	From Vision to Text: A Compact Multimodal Approach for Robust, Cross-Domain Presentation Attack Detection on ID Cards	Proposes a compact multimodal model to address cross-domain presentation attack detection on ID cards	multimodal
11	Unified Safe In-context Image Generation in Multimodal Diffusion Transformers via Restricting Unsafe Information Flows	Proposes a unified visual safety regulator to address safe generation in multimodal diffusion transformers	multimodal	✅
12	Seeing Without Exposing: Adaptive Privacy Control for Open-World, Context-Hungry MLLMs	Proposes Anchored Privacy Drifting to address privacy in multimodal large language models	large language model multimodal
13	Textual Supervision Enhances Geospatial Representations in Vision-Language Models	Improves geospatial representations in vision-language models through textual supervision	foundation model multimodal
14	When Recovery Matters: The Blind Spot of Surrogate Privacy in MLLM Editing	Proposes SPPE to address privacy protection in multimodal large language model editing	large language model multimodal
15	SVHighlights: Towards Extremely Long Sport Video Highlight Detection	Proposes SVHighlights to address highlight detection in long videos	large language model multimodal
16	Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding	Proposes Streaming Video-Language Synchrony to address synchronization in real-time video understanding	large language model

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
17	EvoGS: Constructing Continuous-Layered Gaussian Splatting with Evolution Tree for Scalable 3D Streaming	Proposes EvoGS to address redundancy and quality transitions in 3D streaming	3D gaussian splatting gaussian splatting splatting	✅
18	Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors	Proposes Stream3D-VLM to address online 3D spatial understanding	scene understanding multimodal
19	CL-CLIP: CLIP-Based Continual Learning Framework with Cost-Volume Category Decoupling for Object Detection	Proposes the CL-CLIP framework to address catastrophic forgetting in continual object detection	open-vocabulary open vocabulary

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
20	LARA: Latent Action Representation Alignment for Vision-Language-Action Models	Proposes the LARA framework to address data scarcity in VLA model training	manipulation vision-language-action VLA
21	Detecting Temporally Localized Manipulations in Authentic Video Streams	Proposes a method for detecting localized manipulations in authentic video streams	manipulation	✅

🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
22	Watch, Remember, Reason: Human-View Video Understanding with MLLMs	Proposes a human-view multimodal large language model to address video understanding	egocentric large language model multimodal	✅
23	OpenGlass: Open-Source Smart Glasses for On-Device Event-Based Gesture Recognition	Proposes OpenGlass to address gesture recognition on smart glasses	egocentric multimodal

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
24	DRIFT: From Robustness Gaps to Invariance Manifolds for AI-Generated Image Detection	Proposes DRIFT to address robustness gaps in AI-generated image detection	physically plausible foundation model
25	LUCID: Learning Unified Control for Image Deflaring and Exposure Mastery in Nighttime Photography	Proposes the LUCID framework to address flare and exposure problems in nighttime photography	classifier-free guidance

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
26	Spatial-Temporal Decoupled Adapter for Micro-gesture Online Recognition	Proposes a spatial-temporal decoupled adapter to address online micro-gesture recognition	spatiotemporal
