cs.CV (2026-02-14)
📊 18 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (6 🔗3)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 4: Generative Motion (2)
Pillar 3: Perception & Semantics (2)
Pillar 8: Physics-based Animation (2)
Pillar 1: Robot Control (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Skeleton2Stage: Reward-Guided Fine-Tuning for Physically Plausible Dance Generation | Proposes Skeleton2Stage, which fine-tunes a diffusion model with reinforcement learning to improve the physical plausibility of generated dance | reinforcement learning, reward design, motion synthesis | ✅ | |
| 8 | Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings | Proposes the Embed-RL framework for reasoning-driven multimodal embeddings | reinforcement learning, large language model, multimodal | | |
| 9 | A generalizable foundation model for intraoperative understanding across surgical procedures | ZEN: a general-purpose foundation model for intraoperative understanding that generalizes across multiple surgical procedures | representation learning, distillation, scene understanding | | |
| 10 | Prior-guided Hierarchical Instance-pixel Contrastive Learning for Ultrasound Speckle Noise Suppression | Proposes a prior-guided hierarchical instance-pixel contrastive learning method for ultrasound speckle noise suppression | contrastive learning | | |
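🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation | Proposes the T2MBench benchmark for evaluating how well text-to-motion generation models generalize to out-of-distribution scenarios | text-to-motion, motion generation | | |
| 12 | VAR-3D: View-aware Auto-Regressive Model for Text-to-3D Generation via a 3D Tokenizer | Proposes VAR-3D, which uses a view-aware autoregressive approach to improve the quality and consistency of text-to-3D generation | VQ-VAE | | |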
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Nighttime Autonomous Driving Scene Reconstruction with Physically-Based Gaussian Splatting | Proposes a physically based Gaussian splatting reconstruction method for nighttime autonomous driving scenes | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 14 | Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields | Proposes DiWR, which achieves robust watertight surface reconstruction via Dirichlet-regularized winding fields | 3D gaussian splatting, gaussian splatting, splatting | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation | EchoTorrent: a new framework for fast, stable, and streaming multimodal video generation | spatiotemporal, multimodal | | |
| 16 | Low-Pass Filtering Improves Behavioral Alignment of Vision Models | Low-pass filtering significantly improves the alignment of vision models with human visual behavior | spatiotemporal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos | Proposes a 4D reconstruction method for monocular videos based on Gaussian sequences with multi-scale dynamics | manipulation, physically plausible, foundation model | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | RPGD: RANSAC-P3P Gradient Descent for Extrinsic Calibration in 3D Human Pose Estimation | Proposes the RPGD framework for robust extrinsic calibration in 3D human pose estimation | human motion | | |