cs.CV (2026-05-01)

📊 24 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗3) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 4: Generative Motion (2) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (2 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

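| # | Title | One-sentence summary | Tags | 🔗 |
| 1 | LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations | Proposes the LIMSSR framework for sequence-to-score reasoning when multimodal data is incomplete at training time. | large language model, multimodal | |
| 2 | UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors | UniVidX: a unified multimodal video generation framework built on diffusion priors. | multimodal | |
| 3 | Let ViT Speak: Generative Language-Image Pre-training | GenLIP: a generative language-image pre-training framework for multimodal large language models. | large language model, multimodal | |
| 4 | High-Speed Vision Improves Zero-Shot Semantic Understanding of Human Actions | High-frame-rate video improves zero-shot semantic understanding when recognizing fast human actions. | large language model | |
| 5 | Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs | Proposes a Persistent Visual Memory (PVM) module to counter visual-signal dilution during deep generation in LVLMs. | multimodal | |
| 6 | Make Your LVLM KV Cache More Lightweight | LightKV: reduces the KV cache memory footprint in LVLMs through prompt-guided cross-modal compression. | large language model | |
| 7 | BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis | BlenderRAG: high-fidelity 3D object generation through retrieval-augmented code synthesis. | multimodal | |
| 8 | MMAudio-LABEL: Audio Event Labeling via Audio Generation for Silent Video | Proposes the MMAudio-LABEL framework, which jointly generates audio and event labels to improve audio event labeling for silent video. | multimodal | |
| 9 | Scaling Video Understanding via Compact Latent Multi-Agent Collaboration | Proposes MACF: scalable video understanding through compact latent multi-agent collaboration. | large language model | |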

🔬 Pillar 3: Perception & Semantics (5 papers)

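| # | Title | One-sentence summary | Tags | 🔗 |
| 10 | 2D-SuGaR: Surface-Aware Gaussian Splatting for Geometrically Accurate Mesh Reconstruction | Proposes 2D-SuGaR, which uses monocular depth and normal priors to improve the geometric reconstruction accuracy of 2D Gaussian splatting. | monocular depth, 3D gaussian splatting, 3DGS | |
| 11 | GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space | Proposes GOR-IS, which performs object removal and lighting-consistent restoration for 3D Gaussian models in the intrinsic space. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 12 | Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data | Proposes ViTCG, which uses a Transformer with channel grouping to estimate aerosol optical depth and significantly reduces error. | depth estimation, foundation model | |
| 13 | Modeling Subjective Urban Perception with Human Gaze | Proposes the Place Pulse-Gaze dataset and a gaze-guided framework that models subjective urban perception using human gaze. | scene understanding, PULSE, multimodal | |
| 14 | Pose-Aware Diffusion for 3D Generation | Proposes PAD, a pose-aware diffusion model that generates pose-aligned 3D objects and addresses spatial misalignment and transformation ambiguity. | monocular depth, scene reconstruction | |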

🔬 Pillar 2: RL & Architecture (4 papers)

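| # | Title | One-sentence summary | Tags | 🔗 |
| 15 | Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting | LeGS: learnable density control based on reinforcement learning, improving the rendering quality of 3D Gaussian splatting. | reinforcement learning, 3D gaussian splatting, 3DGS | |
| 16 | Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis | Proposes FAST and SFP for resource-efficient medical image analysis on compressed CT images. | contrastive learning, distillation, spatiotemporal | |
| 17 | Posterior Augmented Flow Matching | Proposes Posterior Augmented Flow Matching to address flow collapse in high-dimensional image generation. | flow matching | |
| 18 | Online Self-Calibration Against Hallucination in Vision-Language Models | Proposes the OSCAR framework for online self-calibration against hallucination in vision-language models. | direct preference optimization, multimodal | |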

🔬 Pillar 4: Generative Motion (2 papers)

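| # | Title | One-sentence summary | Tags | 🔗 |
| 19 | PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation | PhysiGen: integrates collision-aware physical constraints for high-fidelity human-human interaction generation. | motion synthesis, penetration, multi-person interaction | |
| 20 | Robust Fusion of Object-Level V2X for Learned 3D Object Detection | Proposes a noise-aware training strategy that improves the robustness of V2X-fusion 3D object detection under noisy conditions. | penetration | |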

🔬 Pillar 1: Robot Control (2 papers)

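| # | Title | One-sentence summary | Tags | 🔗 |
| 21 | Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation | Proposes Colorful-Noise, which achieves color-conditioned image generation through training-free low-frequency noise manipulation. | manipulation | |
| 22 | From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models | Proposes an iERF-centered unified framework covering local, global, and mechanistic interpretability of vision models. | manipulation | |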

🔬 Pillar 8: Physics-based Animation (2 papers)

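| # | Title | One-sentence summary | Tags | 🔗 |
| 23 | MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation | Proposes MMAudioReverbs, which uses video-guided acoustic modeling for dereverberation and room impulse response estimation. | PULSE | |
| 24 | CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection | CMTA: uses cross-modal temporal artifacts for generalizable AI-generated video detection. | spatiotemporal | |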
