cs.CV (2025-03-30)

📊 18 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (5) · Pillar 2: RL & Architecture (5, 🔗2) · Pillar 9: Embodied Foundation Models (4, 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 4: Generative Motion (1, 🔗1) · Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (5 papers)

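#	Title	One-line summary	Tags	🔗	⭐
1	ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning	ReasonGrounder: LVLM-guided hierarchical feature splatting for open-vocabulary 3D visual grounding and reasoning	3D gaussian splatting gaussian splatting splatting
2	Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction	Proposes a spatial condition-based prediction method for 3D Gaussian Splatting compression that significantly reduces storage and transmission costs	3D gaussian splatting 3DGS gaussian splatting
3	PhysPose: Refining 6D Object Poses with Physical Constraints	PhysPose: refines 6D object pose estimates with physical constraints, improving results in real-world scenes	scene reconstruction scene understanding penetration
4	Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries	Proposes a deep learning method based on a blurry-edge representation for depth estimation from photon-limited images	depth estimation
5	Multiview Image-Based Localization	Proposes a hybrid multiview image-based localization method that improves localization accuracy, efficiency, and memory footprint	scene reconstruction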

🔬 Pillar 2: RL & Architecture (5 papers)

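#	Title	One-line summary	Tags	🔗	⭐
6	Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model	DFI-OmniStereo: uses a pre-trained depth model to improve omnidirectional stereo matching accuracy	MAE depth estimation monocular depth
7	BoundMatch: Boundary detection applied to semi-supervised segmentation	BoundMatch: proposes a semi-supervised semantic segmentation framework that incorporates boundary detection to improve segmentation accuracy	teacher-student foundation model
8	Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning	Studies how image augmentations affect CLIP's representations, revealing the internal mechanisms of representation learning in vision-language models	representation learning	✅
9	Reinforcement Learning-based Token Pruning in Vision Transformers: A Markov Game Approach	Proposes a reinforcement learning-based token pruning method for ViTs that speeds up inference	reinforcement learning	✅
10	ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models	ViT-Linearizer: uses knowledge distillation to convert quadratic-complexity ViT models into linear-complexity vision models	Mamba distillation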

🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
11	OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model	OpenDriveVLA: end-to-end autonomous driving based on a large vision-language-action model	vision-language-action large language model multimodal
12	EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing	Proposes EagleVision, a multimodal large language model for object-level attribute understanding in remote sensing images	large language model multimodal	✅
13	Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging	Uses vision-language foundation models to reveal hidden attribute relationships in medical imaging	foundation model
14	KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters	KernelDNA: dynamic convolution kernel sharing via decoupled naive adapters, improving efficiency	large language model

🔬 Pillar 8: Physics-based Animation (2 papers)

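#	Title	One-line summary	Tags	🔗	⭐
15	MoCha: Towards Movie-Grade Talking Character Synthesis	MoCha: targets movie-grade talking character synthesis, generating realistic, controllable full-body character animation	character animation
16	OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition	OwlSight: a robust illumination adaptation framework for human action recognition in dark videos	spatiotemporal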

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
17	VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior	VLIPP: uses a vision- and language-informed physical prior to generate physically plausible video	physically plausible chain-of-thought	✅

🔬 Pillar 6: Video Extraction (1 paper)

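#	Title	One-line summary	Tags	🔗	⭐
18	Learning Predictive Visuomotor Coordination	Proposes a prediction-based visuomotor coordination representation (VCR) for predicting head pose, gaze, and upper-body motion	egocentric egocentric vision multimodal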
