cs.CV (2025-12-31)
📊 19 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 1: Robot Control (1 🔗1)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression | Splatwizard: a comprehensive benchmark toolkit for 3D Gaussian Splatting compression | 3D Gaussian splatting, 3DGS, Gaussian splatting | ✅ | |
| 9 | FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM | FoundationSLAM: leverages depth foundation models for end-to-end dense visual SLAM | visual SLAM, geometric consistency, foundation model | | |
| 10 | Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark | Proposes Spatial4D-Bench to comprehensively evaluate the 4D spatial intelligence of multimodal large language models | scene understanding, spatial relationship, spatiotemporal | | |
| 11 | Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation | Proposes a projection-based adversarial attack using physics-in-the-loop optimization for monocular depth estimation | depth estimation, monocular depth | | |
| 12 | HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films | Proposes HaineiFRDM, which uses a diffusion model to restore defects in fast-movement films | optical flow | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning | UniC-Lift: unified 3D instance segmentation via contrastive learning | contrastive learning, 3D Gaussian splatting, 3DGS | | |
| 14 | TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model | TeleWorld: a framework for real-time dynamic multimodal synthesis based on a 4D world model | world model, distillation, scene reconstruction | | |
| 15 | VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition | VideoCuRL: proposes video curriculum reinforcement learning with orthogonal difficulty decomposition to improve video understanding | reinforcement learning, optical flow, spatiotemporal | | |
| 16 | PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation | Proposes the PhyGDPO framework, which achieves physically consistent text-to-video generation through physics-aware groupwise preference optimization | direct preference optimization, chain-of-thought | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands | ShowUI-$π$: proposes flow-based generative models for dexterous manipulation of GUI interfaces | manipulation, dexterous hand, dexterous manipulation | ✅ | |
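🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression | Proposes a perceptual low-resolution video compression method based on hierarchical vector-quantized latents, suited to bandwidth-constrained scenarios | VQ-VAE, spatiotemporal | | |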
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction | GaMO: geometry-aware multi-view diffusion outpainting for sparse-view 3D reconstruction | geometric consistency | ✅ | |