cs.CV (2026-05-13)
📊 38 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (14 🔗2)
Pillar 3: Perception & Semantics (7)
Pillar 4: Generative Motion (5 🔗1)
Pillar 2: RL & Architecture (5 🔗2)
Pillar 1: Robot Control (3)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 6: Video Extraction (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (14 papers)
🔬 Pillar 3: Perception & Semantics (7 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | GuardMarkGS: Unified Ownership Tracing and Edit Deterrence for 3D Gaussian Splatting | GuardMarkGS: a unified ownership-tracing and edit-deterrence framework for 3D Gaussian Splatting | 3D gaussian splatting 3DGS gaussian splatting | ||
| 16 | HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization | Proposes HarmoGS, which achieves robust 3D Gaussian Splatting in complex scenes through conflict-aware gradient harmonization | 3D gaussian splatting 3DGS gaussian splatting | ||
| 17 | OCH3R: Object-Centric Holistic 3D Reconstruction | OCH3R: an object-centric holistic 3D reconstruction framework for monocular RGB images | depth estimation monocular depth metric depth | ||
| 18 | Z-Order Transformer for Feed-Forward Gaussian Splatting | Proposes a feed-forward Gaussian Splatting method based on a Z-Order Transformer to accelerate high-quality novel view synthesis | 3D gaussian splatting 3DGS gaussian splatting | ||
| 19 | Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs | Proposes SurgMLLM, which unifies reasoning and segmentation in surgical scene understanding via a multimodal large language model | scene understanding large language model multimodal | ||
| 20 | RoSplat: Robust Feed-Forward Pixel-wise Gaussian Splatting for Varying Input Views and High-Resolution Rendering | RoSplat: robust feed-forward pixel-wise Gaussian Splatting that handles varying input views and high-resolution rendering | 3D gaussian splatting gaussian splatting splatting | ||
| 21 | PanoWorld: Towards Spatial Supersensing in 360$^\circ$ Panorama World | Proposes PanoWorld, which improves the spatial understanding of MLLMs on 360° panoramic images through spherical spatial cross-attention | scene understanding multimodal |
🔬 Pillar 4: Generative Motion (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
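| 22 | Coordinating Multiple Conditions for Trajectory-Controlled Human Motion Generation | Proposes the CMC framework to resolve multi-condition conflicts and inconsistent representations in trajectory-controlled human motion generation | text-to-motion motion generation human motion | ||
| 23 | Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation | Proposes a stylized text-to-motion generation method based on hypernetwork-driven low-rank adaptation | motion diffusion model motion diffusion text-to-motion | ||
| 24 | ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin | Proposes ArcVQ-VAE, which improves the quality of discrete representations in image modeling through a spherical vector quantization framework | VQ-VAE | ✅ | |
| 25 | HetScene: Heterogeneity-Aware Diffusion for Dense Indoor Scene Generation | HetScene: a heterogeneity-aware diffusion model for dense indoor scene generation | physically plausible embodied AI | ||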
| 26 | AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects | AssemblyBench: a synthetic dataset, together with the AssemblyDyno model, for physics-aware assembly of complex industrial objects | physically plausible multimodal |
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
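| 27 | Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting | Proposes SCOUP, which decouples language representation learning from 3D Gaussian optimization for efficient 3D language Gaussian Splatting | representation learning 3D gaussian splatting gaussian splatting | ||
| 28 | STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition | Proposes the STAR framework, which addresses semantic-temporal misalignment in few-shot action recognition through semantic-temporal adaptive representation learning | Mamba representation learning large language model | ✅ | |
| 29 | BrainAnytime: Anatomy-Aware Cross-Modal Pretraining for Brain Image Analysis with Arbitrary Modality Availability | BrainAnytime: anatomy-aware cross-modal pretraining for brain image analysis with arbitrary modality availability | masked autoencoder distillation foundation model | ✅ | |
| 30 | AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation | AnyFlow: an any-step video diffusion model based on flow map distillation, addressing the performance degradation of consistency-distilled models under multi-step sampling | distillation | ||
| 31 | GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models | Proposes GRIP-VLM, which performs group-relative importance pruning via reinforcement learning to improve the efficiency of vision-language models | reinforcement learning multimodal |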
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 32 | Real2Sim: A Physics-driven and Editable Gaussian Splatting Framework for Autonomous Driving Scenes | Proposes Real2Sim to address the reality gap in autonomous driving scene generation | real2sim policy learning gaussian splatting | ||
| 33 | Flow Augmentation and Knowledge Distillation for Lightweight Face Presentation Attack Detection | Proposes a lightweight face liveness detection method based on optical flow augmentation and knowledge distillation | manipulation distillation optical flow | ||
| 34 | CoGE: Sim-to-Real Online Geometric Estimation for Monocular Colonoscopy | CoGE: a sim-to-real online geometric estimation framework for monocular colonoscopy | sim-to-real depth estimation scene reconstruction |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
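| 35 | Weakly-Supervised Spatiotemporal Anomaly Detection | Proposes a weakly supervised spatiotemporal anomaly detection method trained with video-level labels only | spatiotemporal | ||
| 36 | DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution | DiffST: a spatiotemporal-aware diffusion model for real-world space-time video super-resolution | spatiotemporal | ✅ |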
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 37 | EgoForce: Robust Online Egocentric Motion Reconstruction via Diffusion Forcing | EgoForce: robust online egocentric motion reconstruction via diffusion forcing | egocentric motion reconstruction |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 38 | Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation | Proposes Seg-Agent, which enables training-free language-guided segmentation through test-time multimodal reasoning | spatial relationship large language model multimodal |