cs.CV (2026-01-16)

📊 12 papers in total | 🔗 1 with code

🎯 Areas of Interest

Pillar 2: RL & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (5) · Pillar 1: Robot Control (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 2: RL & Architecture (5 papers)

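# | Title | One-line summary | Tags | 🔗
1 | Generative Scenario Rollouts for End-to-End Autonomous Driving | Proposes GeRo, a framework that improves end-to-end autonomous driving through generative scenario rollouts. | reinforcement learning, imitation learning, vision-language-action |
2 | MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement | MMedExpert-R1 strengthens multimodal medical reasoning through domain-specific adaptation and clinical-guideline reinforcement. | reinforcement learning, multimodal |
3 | FTDMamba: Frequency-Assisted Temporal Dilation Mamba for Unmanned Aerial Vehicle Video Anomaly Detection | Proposes FTDMamba to tackle anomaly detection in UAV video with dynamic backgrounds. | Mamba, spatiotemporal |
4 | PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models | Proposes PhysRVG, which uses physics-aware reinforcement learning to make rigid-body motion in video generative models more realistic. | reinforcement learning |
5 | SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention | Proposes SoLA-Vision, a vision model with fine-grained, layer-wise hybrid linear/Softmax attention that improves efficiency and accuracy on high-resolution images. | linear attention, representation learning |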

🔬 Pillar 9: Embodied Foundation Models (5 papers)

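# | Title | One-line summary | Tags | 🔗
6 | Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning | Proposes VIGA, a vision-as-inverse-graphics agent built on interleaved multimodal reasoning, for scene reconstruction and editing. | multimodal |
7 | MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models | Proposes MHA2MLA-VLM, which carries DeepSeek's economical multi-head latent attention over to vision-language models. | multimodal |
8 | Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps | Proposes Map2Thought, a framework that enables explicit spatial reasoning in 3D vision-language models through metric cognitive maps. | chain-of-thought |
9 | Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding | Proposes Think-Clip-Sample to address frame selection in long-video understanding. | large language model |
10 | SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2 | SAMannot: a memory-efficient, locally run framework for interactive video instance segmentation built on SAM2. | foundation model |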

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
11 | X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning | X-Distill: cross-architecture visual knowledge distillation for robot visuomotor learning. | manipulation, diffusion policy, distillation |

🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-line summary | Tags | 🔗
12 | IDDR-NGP: Incorporating Detectors for Distractor Removal with Instant Neural Radiance Field | IDDR-NGP: a method that incorporates detectors to remove distractors from Instant-NGP scenes. | neural radiance field |
