cs.CV (2026-05-06)

📊 37 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (14 🔗7) · Pillar 2: RL Algorithms & Architecture (12 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

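# | Title | One-line summary | Tags | 🔗
1 | Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting | Ilov3Splat: an instance-level open-vocabulary 3D scene understanding framework based on Gaussian splatting | 3D gaussian splatting, gaussian splatting, splatting
2 | ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection | ScriptHOI: open-vocabulary human-object interaction detection by learning scripted state transitions | open-vocabulary, open vocabulary, affordance
3 | ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting | ULF-Loc: proposes unbiased landmark features to make visual localization based on 3D Gaussian splatting more robust | 3D gaussian splatting, 3DGS, gaussian splatting
4 | Aes3D: Aesthetic Assessment in 3D Gaussian Splatting | Proposes the Aes3D framework for aesthetic assessment of 3D Gaussian splatting scenes | 3D gaussian splatting, 3DGS, gaussian splatting
5 | QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes | QuadBox: accelerates 3D Gaussian splatting rendering with geometry-aware bounding boxes | 3D gaussian splatting, 3DGS, gaussian splatting
6 | Anny-Fit: All-Age Human Mesh Recovery | Anny-Fit: proposes an optimization framework for multi-person 3D human mesh recovery across all ages | metric depth, human mesh recovery, HMR
7 | Syn4D: A Multiview Synthetic 4D Dataset | Syn4D: a multiview synthetic dataset for 4D reconstruction of dynamic scenes | 3D reconstruction, scene reconstruction, scene understanding
8 | CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography | CARD: a multi-modal automotive dataset for dense 3D reconstruction in complex terrain | depth estimation, 3D reconstruction
9 | Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes | Ground4D: spatially constrained feedforward 4D reconstruction for unstructured off-road scenes | gaussian splatting, splatting, TAMP
10 | Open-Source Image Editing Models Are Zero-Shot Vision Learners | Evaluates the zero-shot visual learning ability of open-source image editing models | depth estimation, monocular depth, scene understanding
11 | Example-Based Object Detection | Proposes EBOD, which uses error examples to suppress repeated false detections in open-vocabulary object detection without retraining | open-vocabulary, open vocabulary, feature matching
12 | Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding | InfoCoordiBridge: a neuro-symbolic architecture for reliable autonomous driving scene understanding | scene understanding
13 | Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection | Proposes Reward-Guided Semantic Evolution (RGSE) to address semantic misalignment in test-time adaptive object detection | open-vocabulary, open vocabulary
14 | Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy | HiPR: a height-guided projection reparameterization method that improves occupancy prediction with camera-LiDAR fusion | height map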

🔬 Pillar 2: RL Algorithms & Architecture (12 papers)

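# | Title | One-line summary | Tags | 🔗
15 | LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore) | LoViF 2026 PhyScore challenge: holistic quality assessment for 4D world models | world model, world models, physically plausible
16 | OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents | OpenSearch-VL: an open-source training recipe for multimodal search agents that improves complex problem solving | reinforcement learning, multimodal, visual grounding
17 | DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring | DART: a vision-language foundation model for comprehensive rope condition monitoring | JEPA, Joint-Embedding Predictive Architecture, joint-embedding predictive architecture
18 | Deep Reprogramming Distillation for Medical Foundation Models | Proposes the Deep Reprogramming Distillation (DRD) framework for efficient transfer and lightweight deployment of pretrained medical models on downstream tasks | distillation, foundation model
19 | D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models | Proposes D-OPSD for continuously tuning step-distilled diffusion models while preserving their few-step inference ability | policy learning, distillation, multimodal
20 | Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation | Proposes Direct Product Flow Matching (DP-FM), which decouples the radial and angular dynamics of cross-modal alignment to improve few-shot adaptation | flow matching, classifier-free guidance
21 | Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation | Proposes BatMIL, a geometry-aware state space model, to improve classification accuracy on whole-slide pathology images | state space model, representation learning
22 | FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching | Proposes FlowDIS to address fine-grained image segmentation | flow matching, MAE
23 | SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression | Proposes SAMIC, a lightweight semantic-aware Mamba image compression method that improves perceptual quality and compression efficiency | Mamba, state space model
24 | Chaotic Contrastive Learning for Robust Texture Classification | Proposes a chaotic contrastive learning framework that makes texture classification more robust in complex environments | contrastive learning
25 | Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding | Proposes the MB2L framework, which improves EEG-to-image visual decoding through multi-level bidirectional biomimetic learning | representation learning, contrastive learning
26 | Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring | Proposes an adaptive exposure control framework for in-vehicle non-contact heart-rate monitoring that improves performance under dynamic lighting | predictive model, MAE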

🔬 Pillar 9: Embodied Foundation Models (7 papers)

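# | Title | One-line summary | Tags | 🔗
27 | MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education | Proposes MIRAGE to address image retrieval and generation in medical education | large language model, multimodal
28 | Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data | Applies LoRA to efficiently adapt geospatial foundation models for wildfire extent mapping with Sentinel-2 data | foundation model
29 | DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning | Proposes DiffCap-Bench for comprehensive, robust evaluation of multimodal large language models on image difference captioning | large language model, multimodal
30 | PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World | PhysForge: generates 3D assets with physical properties for interactive virtual worlds | embodied AI
31 | CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization | Proposes CPCANet, which achieves domain generalization by deep unfolding of common principal component analysis | zero-shot transfer
32 | When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise | Studies relation hallucination in vision-language models under rotation and noise | multimodal
33 | From Priors to Perception: Grounding Video-LLMs in Physical Reality | Proposes PACC and VARC to improve the reasoning of video large language models about physical reality | large language model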

🔬 Pillar 4: Generative Motion (1 paper)

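# | Title | One-line summary | Tags | 🔗
34 | Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling | Proposes Contact Matrix, which enhances dance motion synthesis through precise interaction modeling | motion synthesis, motion generation, contact-aware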

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line summary | Tags | 🔗
35 | InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery | InterMesh: explicit interaction-aware, end-to-end multi-person human mesh recovery | human-object interaction, human mesh recovery, HMR

🔬 Pillar 1: Robot Control (1 paper)

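# | Title | One-line summary | Tags | 🔗
36 | Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection | Angle-I2P: a cross-modality outlier rejection method based on angle-consistency hierarchical attention | manipulation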

🔬 Pillar 8: Physics-based Animation (1 paper)

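# | Title | One-line summary | Tags | 🔗
37 | Velox: Learning Representations of 4D Geometry and Appearance | Velox: proposes a framework for learning representations of 4D geometry and appearance for dynamic scene understanding | spatiotemporal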
