cs.CV(2025-03-30)

📊 18 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (5) · Pillar 2: RL Algorithms & Architecture (5 🔗2) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 4: Generative Motion (1 🔗1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

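| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning | ReasonGrounder: LVLM-guided hierarchical feature splatting for open-vocabulary 3D visual grounding and reasoning | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction | Proposes a 3D Gaussian Splatting compression method based on spatial-condition prediction that significantly reduces storage and transmission costs | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | PhysPose: Refining 6D Object Poses with Physical Constraints | PhysPose: refines 6D object pose estimates with physical constraints, improving results in real-world applications | scene reconstruction, scene understanding, penetration | |
| 4 | Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries | Proposes a deep learning method based on a blurry-edge representation for depth estimation from photon-limited images | depth estimation | |
| 5 | Multiview Image-Based Localization | Proposes a hybrid multiview image-based localization method that improves localization accuracy, efficiency, and memory footprint | scene reconstruction | |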

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

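| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model | DFI-OmniStereo: uses a pre-trained depth model to improve omnidirectional stereo matching accuracy | MAE, depth estimation, monocular depth | |
| 7 | BoundMatch: Boundary detection applied to semi-supervised segmentation | BoundMatch: a semi-supervised semantic segmentation framework that incorporates boundary detection to improve segmentation accuracy | teacher-student, foundation model | |
| 8 | Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning | Studies how image augmentations affect CLIP's representations, shedding light on the inner mechanisms of representation learning in vision-language models | representation learning | |
| 9 | Reinforcement Learning-based Token Pruning in Vision Transformers: A Markov Game Approach | Proposes a reinforcement-learning-based token pruning method for ViTs that speeds up inference | reinforcement learning | |
| 10 | ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models | ViT-Linearizer: uses knowledge distillation to convert quadratic-complexity ViT models into linear-complexity vision models | Mamba, distillation | |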

🔬 Pillar 9: Embodied Foundation Models (4 papers)

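| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | OpenDriveVLA: end-to-end autonomous driving based on a large vision-language-action model | vision-language-action, large language model, multimodal | |
| 12 | EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing | Proposes EagleVision, a multimodal large language model for object-level attribute understanding in remote sensing imagery | large language model, multimodal | |
| 13 | Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging | Uses vision-language foundation models to reveal hidden attribute relationships in medical imaging | foundation model | |
| 14 | KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters | KernelDNA: dynamic convolution kernel sharing via decoupled naive adapters, improving efficiency | large language model | |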

🔬 Pillar 8: Physics-based Animation (2 papers)

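| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | MoCha: Towards Movie-Grade Talking Character Synthesis | MoCha: movie-grade talking character synthesis, generating realistic and controllable full-body character animation | character animation | |
| 16 | OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition | OwlSight: a robust illumination adaptation framework for human action recognition in dark videos | spatiotemporal | |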

🔬 Pillar 4: Generative Motion (1 paper)

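| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior | VLIPP: physically plausible video generation using a vision- and language-informed physical prior | physically plausible, chain-of-thought | |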

🔬 Pillar 6: Video Extraction & Matching (1 paper)

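| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Learning Predictive Visuomotor Coordination | Proposes a prediction-based Visuomotor Coordination Representation (VCR) for predicting head pose, gaze, and upper-body motion | egocentric, egocentric vision, multimodal | |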
