cs.CV (2025-04-02)
📊 19 papers in total | 🔗 3 with code
🎯 Areas of Interest
Pillar 3: Spatial Perception & Semantics (8)
Pillar 2: RL Algorithms & Architecture (4 🔗2)
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction & Matching (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 3: Spatial Perception & Semantics (8 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis | Proposes GS-Diff, which uses a diffusion model to guide Gaussian Splatting for large-scale unconstrained 3D reconstruction and novel view synthesis. | monocular depth, 3D gaussian splatting, 3DGS | | |
| 2 | BOGausS: Better Optimized Gaussian Splatting | BOGausS: optimizes the training pipeline to substantially reduce 3D Gaussian model size without loss of quality. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 3 | UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting | UAVTwin: uses Gaussian Splatting to build neural digital twins for UAVs, enabling data augmentation. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 4 | Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting | GaussianLSS: Gaussian Splatting-based depth uncertainty estimation that improves BEV perception. | depth estimation, gaussian splatting, splatting | | |
| 5 | FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | FIORD: a fisheye indoor-outdoor dataset with LiDAR ground truth for 3D scene reconstruction and benchmarking. | gaussian splatting, splatting, NeRF | | |
| 6 | CoMatcher: Multi-View Collaborative Feature Matching | Proposes CoMatcher for multi-view collaborative feature matching in complex scenes. | scene understanding, feature matching | | |
| 7 | Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness | Ross3D: 3D-aware reconstructive visual instruction tuning that improves 3D scene understanding. | scene understanding, multimodal | | |
| 8 | Scene-Centric Unsupervised Panoptic Segmentation | Proposes a scene-centric unsupervised panoptic segmentation method that needs no object-centric training data and improves understanding of complex scenes. | scene understanding | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | GMAI-VL-R1: uses reinforcement learning to improve multimodal medical reasoning. | reinforcement learning, multimodal | ✅ | |
| 10 | UniViTAR: Unified Vision Transformer with Native Resolution | UniViTAR: a vision Transformer foundation model designed for multimodal unification and native-resolution inputs. | curriculum learning, distillation, foundation model | | |
| 11 | SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | SpaceR: uses reinforcement learning to improve the video spatial reasoning of MLLMs. | reinforcement learning, large language model, multimodal | ✅ | |
| 12 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Proposes a dual-stream Transformer-GCN model for monocular 3D human pose estimation with improved generalization. | representation learning, distillation | | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Aligned Better, Listen Better for Audio-Visual Large Language Models | Dolphin: an audio-visual large language model that aligns better and listens better. | large language model, multimodal | | |
| 14 | Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Proposes UniRES++, which unifies referring expression segmentation across target granularities, and builds the large-scale MRES-32M dataset. | large language model, multimodal, visual grounding | ✅ | |
| 15 | AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization | Proposes AdPO, which uses preference optimization to enhance the adversarial robustness of large vision-language models. | large language model | | |
| 16 | On Data Synthesis and Post-training for Visual Abstract Reasoning | Proposes data synthesis and post-training methods that significantly improve large models on visual abstract reasoning tasks. | multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | FreSca: Scaling in Frequency Space Enhances Diffusion Models | FreSca: scaling in frequency space enhances diffusion models, enabling fine-grained, decoupled control. | manipulation, depth estimation, classifier-free guidance | | |
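🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering | LSC-ADL: a lifelog dataset annotated with Activities of Daily Living (ADL) and generated via semi-automatic clustering, improving the interpretability of retrieval. | egocentric, egocentric vision | | |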
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | A Diffusion-Based Framework for Occluded Object Movement | DiffOOM: a diffusion-based framework for moving occluded objects in images. | latent optimization | | |