cs.CV(2025-08-18)
📊 26 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (8 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗2)
Pillar 1: Robot Control (5 🔗1)
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1 🔗1)
🔬 Pillar 2: RL & Architecture (8 papers)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models | Proposes a multimodal chain of continuous thought to address multimodal reasoning | multimodal chain-of-thought | ✅ | |
| 10 | Holistic Evaluation of Multimodal LLMs on Spatial Intelligence | Proposes EASI to holistically evaluate multimodal LLMs on spatial intelligence | multimodal | | |
| 11 | Omni Survey for Multimodality Analysis in Visual Object Tracking | Presents an omni survey of multimodal visual object tracking to address data integration | multimodal | | |
| 12 | Multi-source Multimodal Progressive Domain Adaption for Audio-Visual Deception Detection | Proposes a multi-source multimodal progressive domain adaptation framework for audio-visual deception detection | multimodal | ✅ | |
| 13 | ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images | Proposes ViDA-UGC to address shortcomings in quality assessment of UGC images | large language model multimodal chain-of-thought | | |
| 14 | Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection | Proposes DriftBench to address the challenge that GenAI-driven news diversity poses to misinformation detection | multimodal | | |
| 15 | DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model | Proposes DianJin-OCR-R1 to address hallucination in OCR tasks | large language model | | |
🔬 Pillar 1: Robot Control (5 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Foundation Model for Skeleton-Based Human Action Understanding | Proposes a unified skeleton foundation model for human action understanding | humanoid humanoid robot representation learning | | |
| 17 | Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score | Proposes a dual contrastive denoising score for text-to-image editing | manipulation contrastive learning structure preservation | | |
| 18 | Precise Action-to-Video Generation Through Visual Action Prompts | Proposes visual action prompts to address the precision and generality of action-to-video generation | manipulation human-object interaction HOI | ✅ | |
| 19 | IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion | Proposes IGFuse to address occlusion and coverage issues in 3D scene reconstruction | manipulation scene reconstruction | | |
| 20 | Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping | Proposes Odo to address shape preservation in human body editing | manipulation SMPL | | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | InnerGS: Internal Scenes Rendering via Factorized 3D Gaussian Splatting | Proposes InnerGS to reconstruct internal scenes, addressing the limitations of conventional methods | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 22 | Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting | Proposes new strategies to alleviate co-adaptation in sparse-view 3D Gaussian Splatting | 3D gaussian splatting 3DGS gaussian splatting | | |
| 23 | DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation | Proposes DMS to address disparity ambiguity in self-supervised depth estimation | depth estimation monocular depth | | |
| 24 | IntelliCap: Intelligent Guidance for Consistent View Sampling | Proposes IntelliCap to address guidance during image capture | 3D gaussian splatting gaussian splatting splatting | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | EgoTwin: Dreaming Body and View in First Person | Proposes EgoTwin to address first-person video generation and human motion modeling | motion generation egocentric first-person view | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation | Proposes a compact attention mechanism to accelerate video generation | spatiotemporal | ✅ | |