cs.CV (2025-06-18)

📊 21 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

- Pillar 9: Embodied Foundation Models (7 🔗1)
- Pillar 3: Perception & Semantics (5 🔗2)
- Pillar 2: RL & Architecture (4 🔗2)
- Pillar 5: Interaction & Reaction (2)
- Pillar 4: Generative Motion (1)
- Pillar 6: Video Extraction (1)
- Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

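| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Demystifying the Visual Quality Paradox in Multimodal Large Language Models | Proposes VQ-TTT to address the visual quality paradox in multimodal large language models | large language model, multimodal | |
| 2 | Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning | Proposes MRG-LLM to address medical imaging report generation | large language model, multimodal | |
| 3 | DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder | Proposes DM-FNet to address insufficient quality in multimodal medical image fusion | multimodal | |
| 4 | MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Introduces MEGC2025 to address micro-expression recognition and understanding | large language model, multimodal | |
| 5 | ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections | Introduces ReSeDis to address description-based object search across large-scale image collections | multimodal, visual grounding | |
| 6 | Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning | Proposes ViMaR to address low-confidence generation in vision-language models | visual grounding | |
| 7 | A Strong View-Free Baseline Approach for Single-View Image Guided Point Cloud Completion | Proposes a view-free baseline method for single-view image-guided point cloud completion | multimodal | |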

🔬 Pillar 3: Perception & Semantics (5 papers)

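| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Proposes RA-NeRF to address camera pose estimation under complex trajectories | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion | Proposes a reconstruction-free online framework for real-time 3D object detection | open-vocabulary, open vocabulary, embodied AI | |
| 10 | RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation | Proposes RaCalNet to address depth estimation under sparse supervision | depth estimation, monocular depth, metric depth | |
| 11 | MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning | Proposes MapFM to address HD map generation | semantic map, foundation model | |
| 12 | Implicit 3D scene reconstruction using deep learning towards efficient collision understanding in autonomous driving | Proposes a deep learning-based implicit 3D scene reconstruction method to improve collision understanding in autonomous driving | scene reconstruction | |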

🔬 Pillar 2: RL & Architecture (4 papers)

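| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | Show-o2: Improved Native Unified Multimodal Models | Proposes Show-o2 to improve multimodal understanding and generation | flow matching, multimodal | |
| 14 | Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation | Proposes weakly-supervised partial contrastive learning to address dynamic viewpoints in vision-language navigation | contrastive learning, embodied AI, VLN | |
| 15 | video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models | Proposes video-SALMONN 2 to address video captioning and question answering | DPO, large language model | |
| 16 | FedWSIDD: Federated Whole Slide Image Classification via Dataset Distillation | Proposes FedWSIDD to address privacy and resource heterogeneity in WSI classification | predictive model, distillation | |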

🔬 Pillar 5: Interaction & Reaction (2 papers)

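| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 17 | HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization | Proposes HOIDiNi to address realism and physical accuracy in human-object interaction generation | human-object interaction, HOI | |
| 18 | Privacy-Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference | Proposes a homomorphically encrypted neural inference framework to protect the privacy of chest X-ray images | OMOMO | |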

🔬 Pillar 4: Generative Motion (1 paper)

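| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 19 | GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects | Proposes GenHOI to address object generalization in 4D human-object interaction synthesis | motion synthesis, contact-aware, human-object interaction | |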

🔬 Pillar 6: Video Extraction (1 paper)

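| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 20 | Unsupervised Pelage Pattern Unwrapping for Animal Re-identification | Proposes geometry-aware texture mapping to address pelage pattern distortion in animal re-identification | feature matching, geometric consistency | |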

🔬 Pillar 1: Robot Control (1 paper)

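| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 21 | FindingDory: A Benchmark to Evaluate Memory in Embodied Agents | Introduces the FindingDory benchmark to evaluate memory in embodied agents | manipulation | |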
