cs.CV (2025-04-02)

📊 19 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction & Matching (1) · Pillar 7: Motion Retargeting (1)

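🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis | Proposes GS-Diff, which uses a diffusion model to guide Gaussian splatting for large-scale unconstrained 3D reconstruction and novel view synthesis. | monocular depth, 3D gaussian splatting, 3DGS
2 | BOGausS: Better Optimized Gaussian Splatting | BOGausS: significantly reduces the size of 3D Gaussian models without quality loss by optimizing the training pipeline. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting | UAVTwin: creates neural digital twins for UAVs with Gaussian splatting, enabling data augmentation. | 3D gaussian splatting, 3DGS, gaussian splatting
4 | Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting | GaussianLSS: estimates depth uncertainty with Gaussian splatting to improve BEV perception performance. | depth estimation, gaussian splatting, splatting
5 | FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | FIORD: a fisheye indoor-outdoor dataset with LiDAR ground truth for 3D scene reconstruction and benchmarking. | gaussian splatting, splatting, NeRF
6 | CoMatcher: Multi-View Collaborative Feature Matching | Proposes CoMatcher for multi-view collaborative feature matching in complex scenes. | scene understanding, feature matching
7 | Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | Ross3D: reconstructive visual instruction tuning with 3D awareness, improving 3D scene understanding. | scene understanding, multimodal
8 | Scene-Centric Unsupervised Panoptic Segmentation | Proposes a scene-centric unsupervised panoptic segmentation method that needs no object-centric training data and improves understanding of complex scenes. | scene understanding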

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

# | Title | One-sentence summary | Tags | 🔗
9 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | GMAI-VL-R1: uses reinforcement learning to improve multimodal medical reasoning. | reinforcement learning, multimodal
10 | UniViTAR: Unified Vision Transformer with Native Resolution | UniViTAR: a vision Transformer foundation model aimed at multimodal unification and native resolution. | curriculum learning, distillation, foundation model
11 | SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | SpaceR: uses reinforcement learning to improve MLLM video spatial reasoning. | reinforcement learning, large language model, multimodal
12 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Proposes a dual-stream Transformer-GCN model for monocular 3D human pose estimation, improving generalization. | representation learning, distillation

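🔬 Pillar 9: Embodied Foundation Models (4 papers)

# | Title | One-sentence summary | Tags | 🔗
13 | Aligned Better, Listen Better for Audio-Visual Large Language Models | Dolphin: an audio-visual large language model with better alignment and better listening. | large language model, multimodal
14 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Proposes UniRES++, which handles multi-granularity referring expression segmentation in a unified way, and builds the large-scale dataset MRES-32M. | large language model, multimodal, visual grounding
15 | AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization | Proposes AdPO to enhance the adversarial robustness of large vision-language models. | large language model
16 | On Data Synthesis and Post-training for Visual Abstract Reasoning | Proposes data synthesis and post-training methods that significantly improve large-model performance on abstract visual reasoning tasks. | multimodal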

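🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-sentence summary | Tags | 🔗
17 | FreSca: Scaling in Frequency Space Enhances Diffusion Models | FreSca: scaling in frequency space enhances diffusion models, enabling fine-grained, decoupled control. | manipulation, depth estimation, classifier-free guidance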

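🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-sentence summary | Tags | 🔗
18 | LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering | LSC-ADL: an ADL-annotated lifelog dataset generated via semi-automatic clustering, improving retrieval interpretability. | egocentric, egocentric vision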

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-sentence summary | Tags | 🔗
19 | A Diffusion-Based Framework for Occluded Object Movement | DiffOOM: a diffusion-based framework for moving occluded objects in images. | latent optimization
