cs.CV(2025-02-20)

📊 21 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (7 🔗1) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (7 papers)

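| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models | GS-Cache: a cache-accelerated inference framework for large-scale Gaussian splatting models | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | OrchardDepth: Precise Metric Depth Estimation of Orchard Scene from Monocular Camera Images | OrchardDepth: precise metric depth estimation of orchard scenes from monocular camera images | depth estimation, monocular depth, metric depth | |
| 3 | Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion | Proposes an iterative semantic-geometric fusion framework for monocular depth estimation and segmentation of transparent objects | depth estimation, monocular depth | |
| 4 | OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving | Proposes OG-Gaussian, which uses occupancy grids to reconstruct autonomous driving scenes, reducing cost and improving efficiency | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 5 | Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance | FlowScene: an optical-flow-guided method for temporal 3D semantic scene completion | optical flow | |
| 6 | CrossOver: 3D Scene Cross-Modal Alignment | CrossOver: proposes a flexible cross-modal 3D scene alignment framework that handles incomplete and unaligned multimodal data | scene understanding | |
| 7 | LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera | Proposes LXLv2 for LiDAR-excluded 3D object detection | occupancy grid | |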

🔬 Pillar 9: Embodied Foundation Models (7 papers)

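| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | CoSyn: uses code-guided synthetic multimodal data generation to improve text-rich image understanding | large language model, multimodal | |
| 9 | Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models | Proposes Multimodal RewardBench for holistic evaluation of reward models for vision-language models | multimodal | |
| 10 | Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts | Proposes the TimeTravel benchmark for evaluating how well LMMs understand historical and cultural artifacts | multimodal | |
| 11 | PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models | Proposes PLPHP to improve the inference efficiency of large vision-language models | multimodal | |
| 12 | Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well | Proposes the MultiCOS framework, which fuses multimodal information to improve camouflaged object segmentation | multimodal | |
| 13 | Evaluating Precise Geolocation Inference Capabilities of Vision Language Models | Evaluates the precise geolocation inference capabilities of vision-language models | foundation model | |
| 14 | LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework | Proposes LLM-EvRep, which learns an LLM-compatible event representation with a self-supervised framework and improves event recognition performance | large language model | |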

🔬 Pillar 2: RL & Architecture (4 papers)

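| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining | Proposes a triplet-mining-based self-supervised monocular depth estimation method that improves robustness on reflective surfaces | distillation, depth estimation, monocular depth | |
| 16 | Vision Foundation Models in Medical Image Analysis: Advances and Challenges | Surveys advances and challenges of vision foundation models in medical image analysis, focusing on segmentation tasks | distillation, foundation model | |
| 17 | Structurally Disentangled Feature Fields Distillation for 3D Understanding and Editing | Proposes structurally disentangled feature field distillation for 3D understanding and editing | distillation | |
| 18 | LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | LongWriter-V: achieves ultra-long, high-fidelity generation in vision-language models through a long-text SFT dataset and iterative DPO | DPO, direct preference optimization | |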

🔬 Pillar 1: Robot Control (2 papers)

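| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Uses knowledge distillation to design parameter- and compute-efficient Diffusion Transformers suited to edge devices | Apple Vision Pro, distillation | |
| 20 | Fostering Inclusion: A Virtual Reality Experience to Raise Awareness of Dyslexia-Related Barriers in University Settings | Proposes a VR-based empathy experience aimed at raising awareness of dyslexia-related barriers in university settings | locomotion | |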

🔬 Pillar 8: Physics-based Animation (1 paper)

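| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Daily Land Surface Temperature Reconstruction in Landsat Cross-Track Areas Using Deep Ensemble Learning With Uncertainty Quantification | Proposes DELAG, a deep ensemble learning method that reconstructs land surface temperature at high spatiotemporal resolution in Landsat cross-track areas and quantifies uncertainty | spatiotemporal | |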
