cs.CV(2025-10-03)
📊 30 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (12 🔗3)
Pillar 3: Perception & Semantics (5)
Pillar 2: RL & Architecture (5)
Pillar 1: Robot Control (5 🔗1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
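| 13 | From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting | Proposes a semantic-guided motion control method for dynamic 3D Gaussian Splatting that addresses the control-point allocation problem in dynamic reconstruction from monocular video. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 14 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Efficiently fine-tunes multimodal LLMs to tackle object detection in low-data regimes. | scene understanding, large language model | | |
| 15 | ROGR: Relightable 3D Objects using Generative Relighting | ROGR: reconstructs relightable 3D object models using generative relighting. | NeRF, neural radiance field | | |
| 16 | FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min | FSFSplatter: proposes a fast surface reconstruction method that builds a scene from sparse views within 2 minutes. | gaussian splatting, splatting | | |
| 17 | Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles | Proposes a stochastic-resonance defense against adversarial attacks based on latent ensembles; it is training-free and applies to multiple tasks. | optical flow | | |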
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models | LEAML: label-efficient adaptation of multimodal large language models to out-of-distribution visual tasks. | distillation, large language model, multimodal | | |
| 19 | Training-Free Out-Of-Distribution Segmentation With Foundation Models | Proposes a training-free anomaly segmentation method that uses pretrained models for out-of-distribution detection. | representation learning, foundation model | | |
| 20 | Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval | Proposes Retrv-R1, a reasoning-driven multimodal large language model framework for universal and efficient multimodal retrieval. | reinforcement learning, multimodal | | |
| 21 | PEaRL: Pathway-Enhanced Representation Learning for Gene and Pathway Expression Prediction from Histology | PEaRL: predicts gene and pathway expression from histology images via pathway-enhanced representation learning. | representation learning, contrastive learning, multimodal | | |
| 22 | Smart-GRPO: Smartly Sampling Noise for Efficient RL of Flow-Matching Models | Smart-GRPO: optimizes noise sampling to improve the efficiency of reinforcement learning for flow-matching models. | reinforcement learning, flow matching | | |
🔬 Pillar 1: Robot Control (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
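| 23 | Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields | Studies the role of geometric information in semantic distillation for neural radiance fields and proposes the SPINE framework for radiance field inversion without an initial guess. | manipulation, distillation, gaussian splatting | | |
| 24 | SketchPlan: Diffusion Based Drone Planning From Human Sketches | SketchPlan: diffusion-based drone planning that generates flight paths from human sketches. | sim-to-real, 3D gaussian splatting, gaussian splatting | | |
| 25 | Mask2IV: Interaction-Centric Video Generation via Mask Trajectories | Mask2IV: interaction-centric video generation via mask trajectories, without dense mask annotations. | manipulation, affordance, human-object interaction | | |
| 26 | Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! | Proposes DragStream for drag-based streaming interactive video editing, supporting fine-grained control of any object at any time. | manipulation | | |
| 27 | MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding | Proposes MaskCD, which mitigates LVLM hallucinations through image head masked contrastive decoding. | manipulation, multimodal | ✅ | |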
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | MoGIC: Boosting Motion Generation via Intention Understanding and Visual Context | MoGIC: enhances motion generation through intention understanding and visual context. | text-driven motion, motion synthesis, motion generation | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
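| 29 | ReeMark: Reeb Graphs for Simulating Patterns of Life in Spatiotemporal Trajectories | Proposes ReeMark, which uses Reeb graphs to simulate patterns of life in spatiotemporal trajectories, for applications such as urban planning. | spatiotemporal | | |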
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
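| 30 | GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion | GeoComplete: proposes a geometry-aware diffusion model for reference-driven image completion that significantly improves geometric consistency. | geometric consistency | | |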