cs.CV (2026-04-24)

📊 30 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (11) · Pillar 9: Embodied Foundation Models (8) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 6: Video Extraction (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 3: Perception & Semantics (11 papers)

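# | Title | One-sentence summary | Tags | 🔗
1 | EvFlow-GS: Event Enhanced Motion Deblurring with Optical Flow for 3D Gaussian Splatting | EvFlow-GS: motion deblurring for 3D Gaussian Splatting, enhanced with event cameras and optical flow | 3D gaussian splatting, 3DGS, 3D reconstruction |
2 | Flow4DGS-SLAM: Optical Flow-Guided 4D Gaussian Splatting SLAM | Proposes optical-flow-guided 4D Gaussian Splatting SLAM to tackle SLAM reconstruction in dynamic environments | 3D gaussian splatting, 3DGS, gaussian splatting |
3 | PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement | PAGaS: pixel-aligned single-degree-of-freedom Gaussian Splatting for depth refinement | stereo depth, 3D reconstruction, gaussian splatting |
4 | NRGS: Neural Regularization for Robust 3D Semantic Gaussian Splatting | Proposes NRGS, a neural regularization method that improves the robustness and accuracy of 3D semantic Gaussian Splatting | gaussian splatting, splatting, foundation model |
5 | ReLIC-SGG: Relation Lattice Completion for Open-Vocabulary Scene Graph Generation | Proposes the ReLIC-SGG framework to address incomplete relations in open-vocabulary scene graph generation | open-vocabulary, open vocabulary |
6 | CAGE-SGG: Counterfactual Active Graph Evidence for Open-Vocabulary Scene Graph Generation | Proposes CAGE-SGG, a framework based on counterfactual active graph evidence, to address reliability in open-vocabulary scene graph generation | open-vocabulary, open vocabulary |
7 | Depth-Aware Rover: A Study of Edge AI and Monocular Vision for Real-World Implementation | Proposes a lunar rover navigation approach based on edge AI and monocular vision, suited to real-world deployment | depth estimation, monocular depth, metric depth |
8 | Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond | Proposes Holo360D to address trajectory discontinuity in panoramic 3D reconstruction | 3D reconstruction |
9 | Long-tail Internet photo reconstruction | Proposes the MegaDepth-X dataset and a sparse sampling strategy to improve 3D reconstruction of long-tail Internet photos | 3D reconstruction, foundation model |
10 | Railway Artificial Intelligence Learning Benchmark (RAIL-BENCH): A Benchmark Suite for Perception in the Railway Domain | RAIL-BENCH: the first perception learning benchmark suite for the railway domain, advancing autonomous trains | visual odometry |
11 | GenMatter: Perceiving Physical Objects with Generative Matter Models | GenMatter: proposes a physical-object perception method based on generative matter models, unifying motion segmentation across a variety of scenes | scene understanding |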

🔬 Pillar 9: Embodied Foundation Models (8 papers)

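# | Title | One-sentence summary | Tags | 🔗
12 | MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models | Proposes MTT-Bench, using multimodal large language models to predict social dominance rank in mice | large language model, foundation model, multimodal |
13 | Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset | Proposes UniVLT, a unified transportation foundation model built on an open-ended vision-language dataset, to improve urban traffic safety | foundation model, multimodal |
14 | Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction? | Investigates whether natural-domain pretrained models are effective for accelerated cardiac MRI reconstruction | foundation model |
15 | CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding | Proposes the CGC framework to improve MLLM performance on fine-grained multi-image understanding and address issues such as spatial hallucination | large language model, multimodal, chain-of-thought |
16 | Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data | Proposes a multimodal diffusion model for mutually enhancing polarized light and low-resolution EBSD data | multimodal |
17 | Towards Temporal Compositional Reasoning in Long-Form Sports Videos | Proposes the SportsTime benchmark and the CoTR method to tackle temporal compositional reasoning in long-form sports videos | large language model, multimodal |
18 | SS3D: End2End Self-Supervised 3D from Web Videos | Proposes SS3D, an end-to-end self-supervised pretraining framework for 3D estimation from web videos | zero-shot transfer |
19 | CharTide: Data-Centric Chart-to-Code Generation via Tri-Perspective Tuning and Inquiry-Driven Evolution | CharTide: data-centric chart-to-code generation via tri-perspective tuning and inquiry-driven evolution | multimodal |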

🔬 Pillar 2: RL & Architecture (6 papers)

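# | Title | One-sentence summary | Tags | 🔗
20 | Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings | Proposes the RIME framework, which uses rewrite-driven multimodal embeddings to improve the retrieval performance of generative multimodal embeddings | reinforcement learning, large language model, multimodal |
21 | OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space | OccDirector: proposes a language-driven framework for behavior and interaction generation in 4D occupancy space, for autonomous driving simulation | world model, world models, physically plausible |
22 | PoseFM: Relative Camera Pose Estimation Through Flow Matching | PoseFM: relative camera pose estimation via flow matching, improving the robustness of monocular visual odometry | flow matching, visual odometry |
23 | Distilling Vision Transformers for Distortion-Robust Representation Learning | Proposes a knowledge-distillation-based vision Transformer that improves robustness to image distortions | representation learning, distillation |
24 | PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views | Proposes PASR, which uses pose awareness and analysis-by-synthesis optimization to address 3D shape retrieval from occluded single views | contrastive learning, foundation model |
25 | Efficient Diffusion Distillation via Embedding Loss | Proposes Embedding Loss to accelerate diffusion model distillation, improve generation quality, and reduce compute requirements | distillation |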

🔬 Pillar 6: Video Extraction (2 papers)

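# | Title | One-sentence summary | Tags | 🔗
26 | SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments | Proposes the SpaMEM benchmark to evaluate dynamic spatial reasoning based on perception-memory integration in embodied environments | egocentric, large language model, multimodal |
27 | EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges | EV-CLIP: efficient visual prompt adaptation of CLIP for few-shot action recognition under low light and viewpoint changes | egocentric |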

🔬 Pillar 4: Generative Motion (1 paper)

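# | Title | One-sentence summary | Tags | 🔗
28 | Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models | Proposes a Transformer-based interactive motion generation model for motion prediction in human-human interaction scenarios | motion generation, human motion, human motion generation |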

🔬 Pillar 5: Interaction & Reaction (1 paper)

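# | Title | One-sentence summary | Tags | 🔗
29 | Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis | Builds Inter-Stance, a multimodal dyadic interaction dataset for conversational stance analysis | dyadic interaction, multimodal |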

🔬 Pillar 7: Motion Retargeting (1 paper)

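# | Title | One-sentence summary | Tags | 🔗
30 | FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing | Proposes FlowAnchor, which stabilizes the editing signal to enable inversion-free video editing | structure preservation |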
