cs.CV(2025-10-03)
📊 13 papers in total
🎯 Interest Area Navigation
Pillar 1: Robot Control (4)
Pillar 9: Embodied Foundation Models (3)
Pillar 3: Perception & Semantics (3)
Pillar 2: RL & Architecture (2)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 1: Robot Control (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields | Studies the role of geometric information in semantic distillation for neural radiance fields, and proposes SPINE, a radiance field inversion framework that needs no initial guess. | manipulation, distillation, gaussian splatting | | |
| 2 | SketchPlan: Diffusion Based Drone Planning From Human Sketches | SketchPlan: diffusion-based drone planning that generates flight paths from human sketches. | sim-to-real, 3D gaussian splatting, gaussian splatting | | |
| 3 | Mask2IV: Interaction-Centric Video Generation via Mask Trajectories | Proposes Mask2IV to address complex interaction video generation. | manipulation, affordance, human-object interaction | | |
| 4 | Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! | Proposes DragStream for drag-based streaming interactive video editing, with fine-grained control over any object at any time. | manipulation | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis | Proposes the GAS-MIL framework for ensembling multiple pretrained models in digital pathology image analysis. | foundation model, multimodal | | |
| 6 | Domain Generalization for Semantic Segmentation: A Survey | A survey of domain-generalized semantic segmentation: analyzes existing methods and looks ahead to new directions based on pretrained models. | foundation model | | |
| 7 | Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task Learning | Spatial-ViLT enhances visual spatial reasoning through multi-task learning. | multimodal | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Efficiently fine-tunes multimodal LLMs in low-data regimes to improve object detection performance. | scene understanding, large language model | | |
| 9 | ROGR: Relightable 3D Objects using Generative Relighting | ROGR: reconstructs relightable 3D object models using generative relighting. | NeRF, neural radiance field | | |
| 10 | FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min | FSFSplatter: proposes a fast surface reconstruction method that builds a scene from sparse views within 2 minutes. | gaussian splatting, splatting | | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models | LEAML: efficient adaptation of multimodal large language models to out-of-distribution visual tasks. | distillation, large language model, multimodal | | |
| 12 | PEaRL: Pathway-Enhanced Representation Learning for Gene and Pathway Expression Prediction from Histology | PEaRL: predicts gene and pathway expression from histology images through pathway-enhanced representation learning. | representation learning, contrastive learning, multimodal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Physics Knowledge in Frontier Models: A Diagnostic Study of Failure Modes | Builds fine-grained diagnostic tests to reveal the failure modes of frontier vision-language models in physical reasoning. | motion prediction | | |