cs.CV (2025-04-25)

📊 21 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8) · Pillar 9: Embodied Foundation Models (6 🔗3) · Pillar 2: RL Algorithms & Architecture (3 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 5: Interaction & Reaction (1) · Pillar 6: Video Extraction & Matching (1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

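| # | Title | One-line summary | Tags | 🔗 |
| 1 | PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models | PerfCam: a digital-twin framework for production lines using 3D Gaussian splatting and vision models | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | Interpretable Affordance Detection on 3D Point Clouds with Probabilistic Prototypes | Proposes an interpretable affordance detection method for 3D point clouds based on probabilistic prototype learning | affordance, affordance detection | |
| 3 | RGS-DR: Deferred Reflections and Residual Shading in 2D Gaussian Splatting | RGS-DR: deferred reflections and residual shading built on 2D Gaussian splatting, improving specular highlights and material editability | gaussian splatting, splatting | |
| 4 | STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting | Proposes STP4D to address spatio-temporal consistency in text-to-4D generation | gaussian splatting, splatting | |
| 5 | Dense Geometry Supervision for Underwater Depth Estimation | Proposes dense geometry supervision for underwater depth estimation, targeting monocular depth estimation in underwater scenes | depth estimation, monocular depth | |
| 6 | A Review of 3D Object Detection with Vision-Language Models | Survey of research progress on 3D object detection with vision-language models | open-vocabulary, open vocabulary, multimodal | |
| 7 | LaRI: Layered Ray Intersections for Single-view 3D Geometric Reasoning | Proposes LaRI, a layered ray intersection method for single-view 3D geometric reasoning | depth estimation | |
| 8 | LiDAR-Guided Monocular 3D Object Detection for Long-Range Railway Monitoring | Proposes a LiDAR-guided monocular 3D object detection method for long-range railway monitoring | depth estimation | |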

🔬 Pillar 9: Embodied Foundation Models (6 papers)

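| # | Title | One-line summary | Tags | 🔗 |
| 9 | Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Proposes UV-CoT, which achieves unsupervised visual chain-of-thought reasoning via preference optimization, improving the visual understanding of multimodal large models | large language model, multimodal, chain-of-thought | |
| 10 | SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | SORT3D: a spatial object-centric reasoning toolbox for zero-shot 3D scene understanding using large language models | large language model | |
| 11 | A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection | Proposes a hybrid cascade fusion network that uses LiDAR and RGB information to improve 3D object detection performance | multimodal | |
| 12 | Revisiting Data Auditing in Large Vision-Language Models | Reveals the limitations of membership inference for data auditing in large vision-language models and explores scenarios where it remains feasible | generalist agent, large language model, visual grounding | |
| 13 | SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology | Proposes SSL4Eco, a global seasonal dataset that improves the downstream performance of geospatial foundation models for ecology | foundation model | |
| 14 | From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | Proposes a two-stage framework for zero-shot composed image retrieval | multimodal | |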

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

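| # | Title | One-line summary | Tags | 🔗 |
| 15 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | S3MOT: monocular 3D object tracking based on a selective state space model | state space model, contrastive learning, spatiotemporal | |
| 16 | Co-Training with Active Contrastive Learning and Meta-Pseudo-Labeling on 2D Projections for Deep Semi-Supervised Learning | Proposes active-DeepFA, combining active contrastive learning with meta-pseudo-labeling to improve semi-supervised image classification on small-sample biological images | contrastive learning, teacher-student | |
| 17 | TSCL: Multi-party loss Balancing scheme for deep learning Image steganography based on Curriculum learning | Proposes TSCL, a curriculum-learning-based multi-party loss balancing scheme for deep learning image steganography | curriculum learning | |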

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
| 18 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Eval3D: an interpretable, fine-grained evaluation tool for 3D generation | geometric consistency, large language model, foundation model | |

🔬 Pillar 5: Interaction & Reaction (1 paper)

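| # | Title | One-line summary | Tags | 🔗 |
| 19 | ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding | ActionArt: proposes a multimodal large model approach for fine-grained human-centric video understanding | human-object interaction, multimodal | |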

🔬 Pillar 6: Video Extraction & Matching (1 paper)

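| # | Title | One-line summary | Tags | 🔗 |
| 20 | RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation | Proposes RSRNav, which reasons about spatial relationships to address missing directional information and viewpoint inconsistency in image-goal navigation | egocentric, spatial relationship | |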

🔬 Pillar 1: Robot Control (1 paper)

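| # | Title | One-line summary | Tags | 🔗 |
| 21 | COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization | Proposes the COCO-Inpaint benchmark for research on detecting and localizing inpainting-based image manipulation | manipulation | |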
