cs.CV (2026-01-28)
📊 22 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8 🔗3)
Pillar 2: RL & Architecture (6)
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 1: Robot Control (2)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models | Proposes MARE, which achieves explainable deepfake detection through multimodal alignment and reinforcement learning. | reinforcement learning RLHF multimodal | | |
| 10 | Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification | Unifies visual coding and visual token technology to provide efficient compression for multimodal large models and embodied AI. | representation learning embodied AI large language model | | |
| 11 | MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis | MMSF: a multitask, multimodal supervised framework for WSI classification and survival analysis. | Mamba multimodal | | |
| 12 | Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework | Proposes the LEAF framework, which decouples perception from calibration for label-efficient image quality assessment. | distillation large language model multimodal | | |
| 13 | Advancing Open-source World Models | LingBot-World: an open-source world model with high fidelity, long-term memory, and real-time interaction. | world model | | |
| 14 | RAW-Flow: Advancing RGB-to-RAW Image Reconstruction with Deterministic Latent Flow Matching | Proposes RAW-Flow, which achieves high-quality RGB-to-RAW image reconstruction via deterministic latent flow matching. | flow matching | | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Open-Vocabulary Functional 3D Human-Scene Interaction Generation | Proposes the FunHSI framework for open-vocabulary functional 3D human-scene interaction generation. | open-vocabulary open vocabulary physically plausible | | |
| 16 | FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models | FreeFix: improves 3D Gaussian Splatting rendering quality using fine-tuning-free diffusion models. | 3D gaussian splatting gaussian splatting splatting | | |
| 17 | GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction | GVGS: Gaussian visibility-aware multi-view geometry for accurate surface reconstruction. | monocular depth 3D gaussian splatting gaussian splatting | ✅ | |
| 18 | Physically Guided Visual Mass Estimation from a Single RGB Image | Proposes a physically guided framework for estimating object mass from a single RGB image, improving mass prediction accuracy. | depth estimation monocular depth | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization | Proposes the CURVE framework, which learns causally invariant representations through uncertainty-guided regularization to make scene understanding more robust. | sim-to-real scene understanding zero-shot transfer | | |
| 20 | Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance | Proposes Quartet of Diffusions, which achieves structure-aware point cloud generation through part and symmetry guidance. | manipulation | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | HINT: Hierarchical Interaction Modeling for Autoregressive Multi-Human Motion Generation | HINT: a hierarchical interaction modeling framework for autoregressive multi-human motion generation. | motion generation human motion | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models | Proposes the SpatialGenEval benchmark and the SpatialT2I dataset to improve the spatial intelligence of text-to-image models. | spatial relationship foundation model | | |