cs.CV(2026-05-01)
📊 24 papers in total | 🔗 7 with code
🎯 Areas of Interest
Pillar 9: Embodied Foundation Models (9 🔗3)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 2: RL & Architecture (4 🔗2)
Pillar 4: Generative Motion (2)
Pillar 1: Robot Control (2)
Pillar 8: Physics-based Animation (2 🔗1)
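🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | 2D-SuGaR: Surface-Aware Gaussian Splatting for Geometrically Accurate Mesh Reconstruction | Proposes 2D-SuGaR, which uses monocular depth and surface-normal priors to improve the geometric accuracy of 2D Gaussian Splatting reconstruction. | monocular depth 3D gaussian splatting 3DGS | | |
| 11 | GOR-IS: 3D Gaussian Object Removal in the Intrinsic Space | Proposes GOR-IS, which removes objects from 3D Gaussian models and performs lighting-consistent inpainting in the intrinsic space. | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 12 | Foundation AI Models for Aerosol Optical Depth Estimation from PACE Satellite Data | Proposes ViTCG, which uses a Transformer with channel grouping to estimate aerosol optical depth, significantly reducing estimation error. | depth estimation foundation model | | |
| 13 | Modeling Subjective Urban Perception with Human Gaze | Proposes the Place Pulse-Gaze dataset and a Gaze-Guided framework that use human gaze to model subjective urban perception. | scene understanding PULSE multimodal | | |
| 14 | Pose-Aware Diffusion for 3D Generation | Proposes PAD, a pose-aware diffusion model that generates pose-aligned 3D objects, addressing spatial misalignment and transformation ambiguity. | monocular depth scene reconstruction | | |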
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting | LeGS: learnable, reinforcement-learning-based density control that improves 3D Gaussian Splatting rendering quality. | reinforcement learning 3D gaussian splatting 3DGS | ✅ | |
| 16 | Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis | Proposes FAST and SFP for resource-efficient medical image analysis on compressed CT images. | contrastive learning distillation spatiotemporal | | |
| 17 | Posterior Augmented Flow Matching | Proposes Posterior Augmented Flow Matching to address flow collapse in high-dimensional image generation. | flow matching | ✅ | |
| 18 | Online Self-Calibration Against Hallucination in Vision-Language Models | Proposes OSCAR, a framework for online self-calibration against hallucination in vision-language models. | direct preference optimization multimodal | | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation | PhysiGen: integrates collision-aware physical constraints for high-fidelity human-human interaction generation. | motion synthesis penetration multi-person interaction | | |
| 20 | Robust Fusion of Object-Level V2X for Learned 3D Object Detection | Proposes a noise-aware training strategy that makes V2X-fusion 3D object detection more robust in noisy conditions. | penetration | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation | Proposes Colorful-Noise, which achieves color-conditioned image generation through training-free manipulation of low-frequency noise. | manipulation | | |
| 22 | From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models | Proposes an iERF-centered unified framework that provides local, global, and mechanistic interpretability for vision models. | manipulation | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation | Proposes MMAudioReverbs, which uses video-guided acoustic modeling for dereverberation and room impulse response estimation. | PULSE | | |
| 24 | CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection | CMTA: leverages cross-modal temporal artifacts for generalizable detection of AI-generated video. | spatiotemporal | ✅ | |