cs.CV (2026-01-22)
📊 24 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (9 🔗1)
Pillar 9: Embodied Foundation Models (5)
Pillar 3: Perception & Semantics (4)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 5: Interaction & Reaction (1)
Pillar 6: Video Extraction (1)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL & Architecture (9 papers)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Opening the Black Box: Preliminary Insights into Affective Modeling in Multimodal Foundation Models | Presents a systematic study that reveals the mechanisms of affective modeling in multimodal foundation models | foundation model multimodal | | |
| 11 | Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs | Proposes the BVS framework, which uses semantic-agnostic inputs to jailbreak the harmful-image-generation restrictions of multimodal large language models | large language model multimodal | | |
| 12 | Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing | Proposes EDIR, a fine-grained composed image retrieval benchmark built from image editing | multimodal | | |
| 13 | VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning | VideoThinker: builds agentic VideoLLMs with LLM-guided tool reasoning | large language model | | |
| 14 | Zero-Shot Product Attribute Labeling with Vision-Language Models: A Three-Tier Evaluation Framework | Proposes a three-tier evaluation framework for zero-shot product attribute labeling with vision-language models | multimodal | | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling | ThermoSplat: cross-modal 3D Gaussian Splatting reconstruction based on feature modulation and geometry decoupling | 3D gaussian splatting 3DGS gaussian splatting | | |
| 16 | EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis | EVolSplat4D: an efficient volume-based Gaussian Splatting method for 4D urban scene synthesis | 3D gaussian splatting gaussian splatting splatting | | |
| 17 | LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting | Proposes LL-GaussianImage for zero-shot low-light enhancement within the 2D Gaussian Splatting compressed domain | gaussian splatting splatting | | |
| 18 | LL-GaussianMap: Zero-shot Low-Light Image Enhancement via 2D Gaussian Splatting Guided Gain Maps | Proposes LL-GaussianMap, which achieves zero-shot low-light image enhancement using gain maps guided by 2D Gaussian Splatting | gaussian splatting splatting | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Region-aware Spatiotemporal Modeling with Collaborative Domain Generalization for Cross-Subject EEG Emotion Recognition | Proposes a region-aware spatiotemporal modeling and collaborative domain generalization framework for cross-subject EEG emotion recognition | spatiotemporal | ✅ | |
| 20 | PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation | PyraTok: a language-aligned pyramidal tokenizer for video understanding and generation | spatiotemporal zero-shot transfer | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Masked Modeling for Human Motion Recovery Under Occlusions | Proposes MoRo, an occlusion-robust human motion recovery framework based on masked modeling | human motion | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling | Skywork UniPic 3.0: proposes a unified multi-image composition framework based on sequence modeling that achieves high-quality image fusion | human-object interaction HOI multimodal | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams | Event-VStream: an event-driven framework for real-time understanding of long video streams | Ego4D large language model multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models | Proposes the DTP framework, which prunes distracting tokens to raise the success rate of vision-language-action models on robot manipulation tasks | manipulation VLA | | |