cs.CV (2025-10-03)

📊 13 papers in total

🎯 Areas of interest

Pillar 1: Robot Control (4) · Pillar 9: Embodied Foundation Models (3) · Pillar 3: Perception & Semantics (3) · Pillar 2: RL & Architecture (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 1: Robot Control (4 papers)

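# | Title | One-line summary | Tags
1 | Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields | Examines the role of geometric information in semantic distillation into neural radiance fields and proposes SPINE, a radiance-field inversion framework that needs no initial guess. | manipulation, distillation, gaussian splatting
2 | SketchPlan: Diffusion Based Drone Planning From Human Sketches | SketchPlan: diffusion-based drone planning that generates flight paths from human sketches. | sim-to-real, 3D gaussian splatting, gaussian splatting
3 | Mask2IV: Interaction-Centric Video Generation via Mask Trajectories | Proposes Mask2IV for generating videos of complex interactions. | manipulation, affordance, human-object interaction
4 | Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! | Proposes DragStream, which enables drag-based, streaming interactive video editing with fine-grained control over any object at any moment. | manipulation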

🔬 Pillar 9: Embodied Foundation Models (3 papers)

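# | Title | One-line summary | Tags
5 | GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis | Proposes the GAS-MIL framework for ensembling multiple pretrained models in digital pathology image analysis. | foundation model, multimodal
6 | Domain Generalization for Semantic Segmentation: A Survey | A survey of domain generalization for semantic segmentation that analyzes existing methods and outlines new directions built on pretrained models. | foundation model
7 | Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task Learning | Spatial-ViLT strengthens visual spatial reasoning through multi-task learning. | multimodal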

🔬 Pillar 3: Perception & Semantics (3 papers)

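# | Title | One-line summary | Tags
8 | Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes | Efficiently fine-tunes multimodal LLMs in low-data regimes to improve object detection performance. | scene understanding, large language model
9 | ROGR: Relightable 3D Objects using Generative Relighting | ROGR: uses generative relighting to reconstruct relightable 3D object models. | NeRF, neural radiance field
10 | FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min | FSFSplatter: a fast surface reconstruction method that builds a scene from sparse views in under 2 minutes. | gaussian splatting, splatting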

🔬 Pillar 2: RL & Architecture (2 papers)

# | Title | One-line summary | Tags
11 | LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models | LEAML: efficient adaptation of multimodal large language models to out-of-domain visual tasks. | distillation, large language model, multimodal
12 | PEaRL: Pathway-Enhanced Representation Learning for Gene and Pathway Expression Prediction from Histology | PEaRL: predicts gene and pathway expression from histology images using pathway-enhanced representation learning. | representation learning, contrastive learning, multimodal

🔬 Pillar 7: Motion Retargeting (1 paper)

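# | Title | One-line summary | Tags
13 | Physics Knowledge in Frontier Models: A Diagnostic Study of Failure Modes | Builds fine-grained diagnostic tests that expose the failure modes of frontier vision-language models in physical reasoning. | motion prediction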
