cs.CV (2026-04-21)

📊 36 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (14 🔗3) · Pillar 9: Embodied Foundation Models (11 🔗3) · Pillar 2: RL Algorithms & Architecture (6) · Pillar 4: Generative Motion (3) · Pillar 6: Video Extraction & Matching (1 🔗1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | InHabit: Leveraging Image Foundation Models for Scalable 3D Human Placement | InHabit: scalable 3D human placement using image foundation models | scene reconstruction, physically plausible, human-scene interaction
2 | An Object-Centered Data Acquisition Method for 3D Gaussian Splatting using Mobile Phones | Proposes a mobile-phone-based, object-centered data acquisition method for 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting
3 | GRAFT: Geometric Refinement and Fitting Transformer for Human Scene Reconstruction | Proposes GRAFT, a geometric refinement and fitting Transformer for high-quality human-scene reconstruction | scene reconstruction, physically plausible, penetration
4 | BALTIC: A Benchmark and Cross-Domain Strategy for 3D Reconstruction Across Air and Underwater Domains Under Varying Illumination | BALTIC: a benchmark and strategy for 3D reconstruction across air and underwater domains under varying illumination | 3D gaussian splatting, 3D reconstruction, gaussian splatting
5 | AdaGScale: Viewpoint-Adaptive Gaussian Scaling in 3D Gaussian Splatting to Reduce Gaussian-Tile Pairs | AdaGScale: viewpoint-adaptive Gaussian scaling that reduces the number of Gaussian-tile pairs in 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting
6 | Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval | Diff-SBSR: learns multimodal feature-enhanced diffusion models for zero-shot sketch-based 3D shape retrieval | open-vocabulary, open vocabulary, multimodal
7 | CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation | CoCo-SAM3: exploits concept conflict to tackle open-vocabulary semantic segmentation | open-vocabulary, open vocabulary
8 | TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing | TransSplat: language-driven 3DGS editing via unbalanced semantic transport | 3D gaussian splatting, 3DGS, gaussian splatting
9 | Evaluation of Winning Solutions of 2025 Low Power Computer Vision Challenge | Evaluation of the winning solutions of the LPCVC 2025 challenge, advancing low-power computer vision | depth estimation, monocular depth, open-vocabulary
10 | RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow | RAFT-MSF++: self-supervised monocular scene flow estimation with temporal geometry-motion feature fusion | scene flow
11 | Paparazzo: Active Mapping of Moving 3D Objects | Paparazzo: active mapping of moving 3D objects for accurate reconstruction of dynamic scenes | 3D reconstruction, scene understanding
12 | Face Anything: 4D Face Reconstruction from Any Image Sequence | Proposes a 4D face reconstruction method based on canonical facial point prediction, resolving geometry and correspondence ambiguities in dynamic face reconstruction | depth estimation
13 | TESO: Online Tracking of Essential Matrix by Stochastic Optimization | TESO: online tracking of the essential matrix by stochastic optimization, for long-term stereo camera calibration | stereo depth
14 | Explore Like Humans: Autonomous Exploration with Online SG-Memo Construction for Embodied Agents | ABot-Explorer: human-like autonomous exploration via online SG-Memo construction | affordance

🔬 Pillar 9: Embodied Foundation Models (11 papers)

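# | Title | One-sentence summary | Tags | 🔗
15 | Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks | Proposes the StepSTEM benchmark for fine-grained evaluation of multimodal LLM reasoning chains on STEM tasks | large language model, multimodal, chain-of-thought
16 | Seeing Candidates at Scale: Multimodal LLMs for Visual Political Communication on Instagram | Analyzes political messaging on Instagram with multimodal LLMs, improving the analysis of visual political communication | large language model, multimodal
17 | Benchmarking Vision Foundation Models for Domain-Generalizable Face Anti-Spoofing | Proposes an efficient face anti-spoofing baseline based on self-supervised Vision Transformers | foundation model, multimodal
18 | DR-MMSearchAgent: Deepening Reasoning in Multimodal Search Agents | DR-MMSearchAgent: addresses interaction collapse in multimodal search agents by deepening reasoning | multimodal
19 | How Far Are Video Models from True Multimodal Reasoning? | Proposes the CLVG-Bench evaluation framework, revealing the limitations of video models in multimodal reasoning | multimodal
20 | A Multi-Agent Framework with Structured Reasoning and Reflective Refinement for Multimodal Empathetic Response Generation | Proposes a multi-agent framework that improves multimodal empathetic response generation through structured reasoning and reflective refinement | multimodal
21 | Bridging Foundation Models and ASTM Metallurgical Standards for Automated Grain Size Estimation from Microscopy Images | Proposes an automated grain size estimation method based on Cellpose-SAM, bridging foundation models and ASTM standards | foundation model
22 | Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval | Proposes Air-Know to address noisy triplet correspondence in Composed Image Retrieval | large language model, multimodal
23 | Deep sprite-based image models: An analysis | Proposes a deep sprite-based image decomposition model that tackles the identification of repeated patterns in images and enables interpretable unsupervised segmentation | foundation model
24 | DINO Eats CLIP: Adapting Beyond Knowns for Open-set 3D Object Retrieval | Proposes the DINO Eats CLIP framework, which improves open-set 3D object retrieval through dynamic multi-view fusion and virtual feature synthesis | foundation model
25 | The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation | Proposes a stability-diversity balancing mechanism that improves self-improving agents in vision-and-language navigation | VLN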

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

# | Title | One-sentence summary | Tags | 🔗
26 | SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model | SpanVLA: improves vision-language-action models through learning from negative-recovery samples and efficient action bridging | flow matching, vision-language-action, VLA
27 | AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model | AnyRecon: arbitrary-view 3D reconstruction using a video diffusion model | distillation, 3D reconstruction, geometric consistency
28 | PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving | PanDA: an unsupervised domain adaptation framework for multimodal 3D panoptic segmentation in autonomous driving | representation learning, multimodal
29 | Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding | Proposes the Volume Transformer (Volt) to improve the generality and scalability of 3D scene understanding | distillation, scene understanding
30 | HP-Edit: A Human-Preference Post-Training Framework for Image Editing | HP-Edit: a human-preference post-training framework for image editing that improves generation quality | reinforcement learning, RLHF, DPO
31 | PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment | PortraitDirector: a hierarchical disentanglement framework for controllable, real-time facial reenactment | distillation, motion latent

🔬 Pillar 4: Generative Motion (3 papers)

# | Title | One-sentence summary | Tags | 🔗
32 | EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation | Proposes the EgoMotion framework for egocentric human motion generation conditioned on vision and language | motion synthesis, motion generation, physically plausible
33 | CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation | CoInteract: physically consistent human-object interaction video synthesis via spatially structured co-generation | physically plausible, penetration, human-object interaction
34 | A Network-Aware Evaluation of Distributed Energy Resource Control in Smart Distribution Systems | A network-aware evaluation framework for distributed energy resource control in smart distribution systems | penetration

🔬 Pillar 6: Video Extraction & Matching (1 paper)

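# | Title | One-sentence summary | Tags | 🔗
35 | EgoSelf: From Memory to Personalized Egocentric Assistant | EgoSelf: builds a personalized egocentric assistant that uses graph memory for long-term user behavior modeling | egocentric, first-person view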

🔬 Pillar 7: Motion Retargeting (1 paper)

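# | Title | One-sentence summary | Tags | 🔗
36 | Generative Texture Filtering | Proposes a generative texture filtering method that uses pretrained generative models to improve texture removal | structure preservation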
