cs.CV (2026-01-15)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗1) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 1: Robot Control (3) · Pillar 6: Video Extraction (1 🔗1) · Pillar 8: Physics-based Animation (1) · Other (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

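# | Title | One-sentence summary | Tags | 🔗
1 | DR$^2$Seg: Decomposed Two-Stage Rollouts for Efficient Reasoning Segmentation in Multimodal Large Language Models | Proposes the DR$^2$Seg framework to improve the efficiency and accuracy of multimodal large language models on reasoning segmentation. | large language model, multimodal |
2 | Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer | Proposes a missingness-aware multimodal survival prediction framework to handle missing data in non-small cell lung cancer. | foundation model, multimodal |
3 | ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding | ROMA: a real-time omni-modal assistant for interactive streaming understanding. | large language model, multimodal |
4 | See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection | Proposes a generalizable end-to-end autonomous driving method based on stochastic patch selection, improving generalization and efficiency. | foundation model |
5 | Hierarchical Refinement of Universal Multimodal Attacks on Vision-Language Models | Proposes HRA, a hierarchically refined universal multimodal attack framework, to improve the robustness of vision-language models. | multimodal |
6 | Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method | Introduces the video anomaly reasoning task and dataset, and designs Vad-R1-Plus, an adaptive multi-stage reasoning model. | large language model, multimodal, chain-of-thought |
7 | V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation | V-Zero: a self-improving multimodal reasoning framework based on unannotated data. | multimodal |
8 | VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language Models | Proposes the VERHallu benchmark and designs the KFP strategy to mitigate event relation hallucination in video large language models. | large language model |
9 | Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs | Proposes a fine-grained human pose editing assessment method based on layer-selective multimodal large language models. | large language model, multimodal |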

🔬 Pillar 3: Perception & Semantics (4 papers)

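# | Title | One-sentence summary | Tags | 🔗
10 | Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting | Proposes a structure-aware style transfer method based on flow-guided 3D Gaussian splatting, achieving Van Gogh-style geometric abstraction. | 3D gaussian splatting, 3DGS, gaussian splatting |
11 | RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation | Proposes RSATalker for realistic, socially-aware talking head generation that supports multi-turn conversation. | 3D gaussian splatting, 3DGS, gaussian splatting |
12 | UEOF: A Benchmark Dataset for Underwater Event-Based Optical Flow | Proposes UEOF, a benchmark dataset for underwater event-camera optical flow, to advance underwater event-based vision research. | optical flow |
13 | Unleashing the Capabilities of Large Vision-Language Models for Intelligent Perception of Roadside Infrastructure | Proposes a domain-adaptive framework that uses large vision-language models for intelligent perception of roadside infrastructure. | open-vocabulary, open vocabulary |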

🔬 Pillar 2: RL & Architecture (4 papers)

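# | Title | One-sentence summary | Tags | 🔗
14 | LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning | LaViT: aligning latent visual thoughts for multimodal reasoning. | distillation, multimodal, visual grounding |
15 | Action100M: A Large-scale Video Action Dataset | Proposes Action100M, a large-scale video action dataset, to advance research on video understanding and world modeling. | world model, open-vocabulary, open vocabulary |
16 | Inference-time Physics Alignment of Video Generative Models with Latent World Models | Proposes WMReward, which improves the physical plausibility of video generative models through inference-time physics alignment. | world model |
17 | Difficulty-guided Sampling: Bridging the Target Gap between Dataset Distillation and Downstream Tasks | Proposes difficulty-guided sampling (DGS) to bridge the target gap between dataset distillation and downstream tasks. | distillation |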

🔬 Pillar 1: Robot Control (3 papers)

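# | Title | One-sentence summary | Tags | 🔗
18 | FlowAct-R1: Towards Interactive Humanoid Video Generation | FlowAct-R1: a framework for real-time interactive humanoid video generation that balances high fidelity and low latency. | humanoid, distillation |
19 | RAG-3DSG: Enhancing 3D Scene Graphs with Re-Shot Guided Retrieval-Augmented Generation | Proposes RAG-3DSG, which improves 3D scene graph quality through re-shot guided retrieval-augmented generation. | manipulation, open-vocabulary, open vocabulary |
20 | EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing | EditEmoTalk: a controllable speech-driven 3D facial animation framework that supports continuous expression editing. | manipulation |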

🔬 Pillar 6: Video Extraction (1 paper)

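# | Title | One-sentence summary | Tags | 🔗
21 | Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge | Optimizes multimodal LLMs for egocentric video understanding as a solution to the HD-EPIC VQA Challenge. | egocentric, large language model, multimodal |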

🔬 Pillar 8: Physics-based Animation (1 paper)

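# | Title | One-sentence summary | Tags | 🔗
22 | From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion | Proposes dynamic Cross-Layer Injection (CLI) to address the visual feature bottleneck in vision-language models. | AMP, large language model, multimodal |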

📄 Other

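# | Title | One-sentence summary | Tags | 🔗
23 | CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos | CoMoVi: a co-generation framework that jointly generates 3D human motions and realistic videos. | |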
