cs.CV (2026-05-06)

📊 37 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (14 🔗7) · Pillar 2: RL Algorithms & Architecture (12 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting	Ilov3Splat: an instance-level open-vocabulary 3D scene understanding framework built on Gaussian splatting	3D gaussian splatting gaussian splatting splatting	✅
2	ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection	ScriptHOI: detects open-vocabulary human-object interactions by learning scripted state transitions	open-vocabulary open vocabulary affordance
3	ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting	ULF-Loc: proposes unbiased landmark features to make 3D Gaussian Splatting-based visual localization more robust	3D gaussian splatting 3DGS gaussian splatting
4	Aes3D: Aesthetic Assessment in 3D Gaussian Splatting	Proposes the Aes3D framework for aesthetic assessment of 3D Gaussian Splatting scenes	3D gaussian splatting 3DGS gaussian splatting
5	QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes	QuadBox: accelerates 3D Gaussian Splatting rendering with geometry-aware bounding boxes	3D gaussian splatting 3DGS gaussian splatting	✅
6	Anny-Fit: All-Age Human Mesh Recovery	Anny-Fit: proposes an optimization framework for multi-person 3D human mesh recovery across all ages	metric depth human mesh recovery HMR	✅
7	Syn4D: A Multiview Synthetic 4D Dataset	Syn4D: a multiview synthetic dataset for 4D reconstruction of dynamic scenes	3D reconstruction scene reconstruction scene understanding
8	CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography	CARD: a multi-modal automotive dataset for dense 3D reconstruction in challenging terrain	depth estimation 3D reconstruction	✅
9	Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes	Ground4D: spatially grounded feedforward 4D reconstruction for unstructured off-road scenes	gaussian splatting splatting TAMP	✅
10	Open-Source Image Editing Models Are Zero-Shot Vision Learners	Evaluates the zero-shot visual learning ability of open-source image editing models	depth estimation monocular depth scene understanding
11	Example-Based Object Detection	Proposes EBOD, which uses error examples to suppress repeated false detections in open-vocabulary object detection without retraining	open-vocabulary open vocabulary feature matching	✅
12	Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding	InfoCoordiBridge: a neuro-symbolic architecture for reliable autonomous driving scene understanding	scene understanding
13	Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection	Proposes Reward-Guided Semantic Evolution (RGSE) to address semantic misalignment in test-time adaptive object detection	open-vocabulary open vocabulary
14	Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy	HiPR: a height-guided projection reparameterization method that improves occupancy prediction with camera-LiDAR fusion	height map	✅

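🔬 Pillar 2: RL Algorithms & Architecture (12 papers)

#	Title	One-line summary	Tags	🔗	⭐
15	LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)	LoViF 2026 PhyScore challenge: holistic quality assessment for 4D world models	world model world models physically plausible
16	OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents	OpenSearch-VL: an open-source training recipe for multimodal search agents that improves complex problem solving	reinforcement learning multimodal visual grounding
17	DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring	DART: a vision-language foundation model for comprehensive rope condition monitoring	JEPA Joint-Embedding Predictive Architecture joint-embedding predictive architecture
18	Deep Reprogramming Distillation for Medical Foundation Models	Proposes the Deep Reprogramming Distillation (DRD) framework for efficient transfer and lightweight deployment of medical pretrained models on downstream tasks	distillation foundation model
19	D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models	Proposes D-OPSD for continuously tuning step-distilled diffusion models while preserving their few-step inference ability	policy learning distillation multimodal
20	Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation	Proposes Direct Product Flow Matching (DP-FM), which decouples the radial and angular dynamics of cross-modal alignment to improve few-shot adaptation	flow matching classifier-free guidance
21	Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation	Proposes BatMIL, a geometry-aware state space model, to improve classification accuracy on whole-slide pathology images	state space model representation learning
22	FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching	Proposes FlowDIS, a flow-matching approach to fine-grained, language-guided dichotomous image segmentation	flow matching MAE	✅
23	SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression	Proposes SAMIC, a lightweight semantic-aware Mamba method for image compression that improves perceptual quality and compression efficiency	Mamba state space model	✅
24	Chaotic Contrastive Learning for Robust Texture Classification	Proposes a chaotic contrastive learning framework that makes texture classification more robust in complex environments	contrastive learning
25	Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding	Proposes the MB2L framework, which improves EEG-to-image visual decoding through multi-level bidirectional biomimetic learning	representation learning contrastive learning
26	Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring	Proposes an adaptive exposure control framework for in-vehicle non-contact heart-rate monitoring that improves performance under dynamic lighting	predictive model MAE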

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
27	MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education	Proposes MIRAGE to address image retrieval and generation in medical education	large language model multimodal
28	Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data	Applies LoRA to efficiently adapt geospatial foundation models for wildfire extent mapping from Sentinel-2 data	foundation model	✅
29	DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning	Proposes DiffCap-Bench for comprehensive, robust evaluation of multimodal large language models on image difference captioning	large language model multimodal
30	PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World	PhysForge: generates 3D assets with physical properties for interactive virtual worlds	embodied AI
31	CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization	Proposes CPCANet, which achieves domain generalization by deep unfolding of common principal component analysis	zero-shot transfer	✅
32	When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise	Studies relation hallucination in vision-language models under rotation and noise	multimodal
33	From Priors to Perception: Grounding Video-LLMs in Physical Reality	Proposes PACC and VARC to improve how video large language models reason about physical reality	large language model

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
34	Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling	Proposes Contact Matrix, which improves dance motion synthesis through precise interaction modeling	motion synthesis motion generation contact-aware

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
35	InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery	InterMesh: explicit interaction-aware end-to-end multi-person human mesh recovery	human-object interaction human mesh recovery HMR

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
36	Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection	Angle-I2P: a cross-modality outlier rejection method based on angle-consistency-aware hierarchical attention	manipulation

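🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
37	Velox: Learning Representations of 4D Geometry and Appearance	Velox: proposes a framework for learning representations of 4D geometry and appearance for dynamic scene understanding	spatiotemporal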
