cs.CV (2025-10-03)

📊 13 papers in total

🎯 Research Area Navigation

Pillar 1: Robot Control (4) · Pillar 9: Embodied Foundation Models (3) · Pillar 3: Perception & Semantics (3) · Pillar 2: RL & Architecture (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 1: Robot Control (4 papers)

#	Title	One-line summary	Tags
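1	Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields	Studies the role of geometric information in semantic distillation into radiance fields and proposes SPINE, a radiance-field inversion framework that needs no initial guess.	manipulation, distillation, gaussian splatting
2	SketchPlan: Diffusion Based Drone Planning From Human Sketches	SketchPlan: diffusion-based drone planning that generates flight paths from human sketches.	sim-to-real, 3D gaussian splatting, gaussian splatting
3	Mask2IV: Interaction-Centric Video Generation via Mask Trajectories	Proposes Mask2IV to tackle the generation of videos depicting complex interactions.	manipulation, affordance, human-object interaction
4	Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!	Proposes DragStream, which enables drag-based, streaming interactive video editing with fine-grained control over any object at any moment.	manipulation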

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-line summary	Tags
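5	GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis	Proposes the GAS-MIL framework for ensembling multiple pretrained models in digital pathology image analysis.	foundation model, multimodal
6	Domain Generalization for Semantic Segmentation: A Survey	A survey of domain generalization for semantic segmentation that analyzes existing methods and outlines new directions built on pretrained models.	foundation model
7	Spatial-ViLT: Enhancing Visual Spatial Reasoning through Multi-Task Learning	Spatial-ViLT strengthens visual spatial reasoning through multi-task learning.	multimodal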

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags
8	Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes	Efficiently fine-tunes multimodal LLMs in low-data regimes to improve object detection performance.	scene understanding, large language model
9	ROGR: Relightable 3D Objects using Generative Relighting	ROGR: reconstructs relightable 3D object models using generative relighting.	NeRF, neural radiance field
10	FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min	FSFSplatter: a fast surface reconstruction method that builds a scene from sparse views in under 2 minutes.	gaussian splatting, splatting

🔬 Pillar 2: RL & Architecture (2 papers)

#	Title	One-line summary	Tags
11	LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models	LEAML: label-efficient adaptation of multimodal large language models to out-of-distribution visual tasks.	distillation, large language model, multimodal
12	PEaRL: Pathway-Enhanced Representation Learning for Gene and Pathway Expression Prediction from Histology	PEaRL: predicts gene and pathway expression from histology images via pathway-enhanced representation learning.	representation learning, contrastive learning, multimodal

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags
13	Physics Knowledge in Frontier Models: A Diagnostic Study of Failure Modes	Builds fine-grained diagnostic tests that reveal the failure modes of frontier vision-language models in physical reasoning.	motion prediction
