cs.CV (2025-11-07)
📊 27 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 2: RL & Architecture (8 🔗3)
Pillar 3: Perception & Semantics (4)
Pillar 4: Generative Motion (3)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (11 papers)
🔬 Pillar 2: RL & Architecture (8 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization | PreResQ-R1: fine-grained rank-and-score reinforcement learning for visual quality assessment via preference-response disentangled policy optimization. | reinforcement learning, large language model, multimodal | | |
| 13 | DeepEyesV2: Toward Agentic Multimodal Model | DeepEyesV2: an agentic multimodal model with improved tool-calling ability. | reinforcement learning, multimodal | | |
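| 14 | Visual Spatial Tuning | Proposes the Visual Spatial Tuning (VST) framework to improve the spatial perception and reasoning abilities of vision-language models (VLMs). | reinforcement learning, spatial relationship, vision-language-action | | |
| 15 | Cross-domain EEG-based Emotion Recognition with Contrastive Learning | Proposes EmotionCLIP to address cross-domain EEG-based emotion recognition. | contrastive learning, multimodal | ✅ | |
| 16 | MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification | MUSE: a multi-scale dense self-distillation method for nucleus detection and classification. | distillation, foundation model | | |
| 17 | TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning | Proposes TimeSearch-R, which performs adaptive temporal search for long-form video understanding via self-verification reinforcement learning. | reinforcement learning, Ego4D | ✅ | |
| 18 | Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale | Proposes the Long Grounded Thoughts framework for synthesizing high-quality visual reasoning chain data at scale, improving vision-language model performance. | offline RL, multimodal | | |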
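| 19 | Another BRIXEL in the Wall: Towards Cheaper Dense Features | Proposes BRIXEL, which uses knowledge distillation to reduce the cost of computing dense features and improve downstream task performance. | distillation, foundation model | ✅ | |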
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting | CLM: removes the GPU memory bottleneck for 3D Gaussian Splatting, enabling large-scale scene rendering. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 21 | 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos | 4D3R: proposes a motion-aware neural reconstruction and rendering framework for novel view synthesis of dynamic scenes from monocular videos. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 22 | Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges | Splatography: sparse multi-view dynamic Gaussian Splatting for filmmaking challenges. | gaussian splatting, splatting | | |
| 23 | No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation | Proposes PITTA, a pose-agnostic, instance-aware test-time adaptation framework for monocular depth estimation. | depth estimation, monocular depth | | |
🔬 Pillar 4: Generative Motion (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | Pressure2Motion: Hierarchical Human Motion Reconstruction from Ground Pressure with Text Guidance | Pressure2Motion: proposes a hierarchical human motion reconstruction algorithm based on ground pressure and text guidance. | physically plausible, human motion | | |
| 25 | Dense Motion Captioning | Proposes the Dense Motion Captioning task and the CompMo dataset, and builds the DEMO model for 3D human motion understanding and captioning. | text-to-motion, motion generation, human motion | | |
| 26 | Learning Fourier shapes to probe the geometric world of deep neural networks | Proposes a Fourier-shape-based framework for probing the geometric world of deep neural networks. | physically plausible | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning | Proposes DeepForgeSeal, which uses latent-space watermarking and adversarial reinforcement learning for deepfake detection. | manipulation, reinforcement learning | | |