cs.CV(2026-01-09)
📊 21 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8 🔗3)
Pillar 2: RL & Architecture (7 🔗1)
Pillar 3: Perception & Semantics (3)
Pillar 1: Robot Control (3)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 2: RL & Architecture (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes | SceneAlign: aligns multimodal reasoning to scene graphs, improving reasoning faithfulness in complex visual scenes. | direct preference optimization, large language model, multimodal | | |
| 10 | LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting | Proposes LayerGS, which decomposes and inpaints layered 3D human avatars via 2D Gaussian splatting for high-quality virtual try-on. | distillation, gaussian splatting, splatting | ✅ | |
| 11 | LatentVLA: Efficient Vision-Language Models for Autonomous Driving via Latent Action Prediction | LatentVLA: an efficient vision-language model for autonomous driving based on self-supervised latent action prediction. | distillation, vision-language-action, VLA | | |
| 12 | SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More | Proposes SketchVL, which optimizes the policy via fine-grained credit assignment to improve chart understanding. | reinforcement learning, large language model, multimodal | | |
| 13 | Boosting Latent Diffusion Models via Disentangled Representation Alignment | Proposes Send-VAE, which improves the generation quality and training efficiency of latent diffusion models via disentangled representation alignment. | representation learning, classifier-free guidance, foundation model | | |
| 14 | Adaptive Disentangled Representation Learning for Incomplete Multi-View Multi-Label Classification | Proposes adaptive disentangled representation learning (ADRL) for incomplete multi-view multi-label classification. | representation learning | | |
| 15 | Compressing image encoders via latent distillation | Proposes a latent-space distillation method for compressing image encoders, suited to resource-constrained settings. | distillation | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time | FeatureSLAM: a real-time, feature-enriched 3D Gaussian splatting SLAM system. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 17 | GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting | GS-DMSR: dynamic-sensitive multi-scale manifold enhancement for accelerated, high-quality 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 18 | GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras | GeoSurDepth: spatial geometry-consistent self-supervised depth estimation for surround-view cameras. | depth estimation, scene understanding, foundation model | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting | GaussianSwap: an animatable video face-swapping framework based on 3D Gaussian splatting. | manipulation, 3D gaussian splatting, gaussian splatting | | |
| 20 | SceneFoundry: Generating Interactive Infinite 3D Worlds | SceneFoundry: proposes a language-guided diffusion framework for generating interactive, infinite 3D scenes. | manipulation, embodied AI | | |
| 21 | Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals | Goal Force: proposes a force-vector-based video generation model for goal-directed control under physical conditions. | manipulation, world model | | |