cs.CV(2026-02-13)

📊 30 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗6) · Pillar 2: RL & Architecture (8 🔗2) · Pillar 3: Perception & Semantics (6 🔗4) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

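# | Title | Key point | Tags | 🔗
1 | Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models | Proposes ADSC, which uses the LLM's own attention mechanism to self-compress visual tokens, improving the efficiency of multimodal large language models. | large language model, multimodal
2 | Multimodal Classification via Total Correlation Maximization | Proposes TCMax, which addresses modality competition in multimodal classification by maximizing total correlation. | multimodal
3 | Reliable Thinking with Images | Proposes RTWI to address noisy image-based reasoning in multimodal large language models. | large language model, multimodal, chain-of-thought
4 | WISE: A Multimodal Search Engine for Visual Scenes, Audio, Objects, Faces, Speech, and Metadata | WISE: a multimodal search engine for visual scenes, audio, objects, faces, speech, and metadata. | multimodal
5 | VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph | Proposes VimRAG, which uses a multimodal memory graph to tackle long-range visual-context reasoning in RAG. | multimodal
6 | CBEN -- A Multimodal Machine Learning Dataset for Cloud Robust Remote Sensing Image Understanding | Proposes the CBEN dataset to improve the robustness of multimodal machine learning for remote sensing image understanding under cloud occlusion. | multimodal
7 | PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis | Proposes PLLM, which self-trains CAD program generation with pseudo-labels, addressing the lack of paired data. | large language model
8 | Human-Aligned MLLM Judges for Fine-Grained Image Editing Evaluation: A Benchmark, Framework, and Analysis | Proposes an MLLM-based framework for fine-grained image editing evaluation, addressing the coarseness and poor interpretability of traditional metrics. | large language model, multimodal
9 | Thinking Like a Radiologist: A Dataset for Anatomy-Guided Interleaved Vision Language Reasoning in Chest X-ray Interpretation | Proposes the MMRad-IVL-22K dataset for anatomy-guided interleaved vision-language reasoning in chest X-ray interpretation. | multimodal, chain-of-thought
10 | Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions | Proposes the ASID-1M dataset and the ASID-Captioner model to improve fine-grained understanding in universal video MLLMs. | instruction following
11 | Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding | Proposes a training-free acceleration method for document-parsing VLMs based on hierarchical speculative decoding, improving efficiency on long documents. | multimodal
12 | QuEPT: Quantized Elastic Precision Transformers with One-Shot Calibration for Multi-Bit Switching | QuEPT: a one-shot calibration scheme for quantized elastic-precision Transformers with multi-bit switching. | large language model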

🔬 Pillar 2: RL & Architecture (8 papers)

# | Title | Key point | Tags | 🔗
13 | Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation | Curriculum-DPO++: optimizes text-to-image generation through data and model curricula. | reinforcement learning, RLHF, DPO
14 | Benchmarking Video Foundation Models for Remote Parkinson's Disease Screening | A benchmark study of video foundation models for remote Parkinson's disease screening. | representation learning, foundation model
15 | Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation | Proposes MLLMEmbed-ReID, which enables cross-modal ReID on edge devices via adaptive SVD distillation. | distillation, large language model
16 | Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting | Proposes AD-LiST-JEPA, a self-supervised world model for LiDAR occupancy completion and forecasting. | world model, spatiotemporal
17 | Frequency-Enhanced Hilbert Scanning Mamba for Short-Term Arctic Sea Ice Concentration Prediction | Proposes a frequency-enhanced Hilbert scanning Mamba framework for short-term Arctic sea ice concentration prediction. | Mamba, spatiotemporal
18 | SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning | SpargeAttention2: trainable sparse attention via hybrid Top-k+Top-p masking and distillation fine-tuning, accelerating video diffusion models. | distillation
19 | Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening | Proposes motion prior distillation to address temporal incoherence in generative video inbetweening. | distillation
20 | LiDAR-Anchored Collaborative Distillation for Robust 2D Representations | Proposes LiDAR-anchored collaborative distillation to make 2D representations more robust in adverse weather. | distillation

🔬 Pillar 3: Perception & Semantics (6 papers)

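# | Title | Key point | Tags | 🔗
21 | GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction | Proposes GSM-GS, a geometry-constrained single- and multi-view Gaussian splatting method for surface reconstruction. | 3D gaussian splatting, gaussian splatting, splatting
22 | RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads | RoadscapesQA: a multitask, multimodal dataset for visual question answering on Indian road scenes. | scene understanding, multimodal
23 | Unbiased Gradient Estimation for Event Binning via Functional Backpropagation | Proposes an unbiased gradient estimation method for event binning based on functional backpropagation. | optical flow, motion estimation
24 | Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision | Proposes the ConverSeg dataset and ConverSeg-Net for pixel-level grounding of abstract concepts in conversational image segmentation. | affordance
25 | LongStream: Long-Sequence Streaming Autoregressive Visual Geometry | LongStream: a decoupled autoregressive visual geometry model for long-sequence streaming 3D reconstruction. | scene reconstruction
26 | DynaGuide: A Generalizable Dynamic Guidance Framework for Unsupervised Semantic Segmentation | DynaGuide: a generalizable dynamic guidance framework for unsupervised semantic segmentation. | scene understanding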

🔬 Pillar 6: Video Extraction (1 paper)

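# | Title | Key point | Tags | 🔗
27 | Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace | Monocular markerless motion capture enables quantitative assessment of the upper-extremity reachable workspace. | markerless motion capture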

🔬 Pillar 7: Motion Retargeting (1 paper)

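# | Title | Key point | Tags | 🔗
28 | Represent Micro-Doppler Signature in Orders | Proposes a Chebyshev-time-map-based representation of micro-Doppler signatures for through-the-wall radar human activity recognition. | human motion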

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | Key point | Tags | 🔗
29 | FedHENet: A Frugal Federated Learning Framework for Heterogeneous Environments | FedHENet: an energy-efficient federated learning framework for heterogeneous environments. | OMOMO

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags | 🔗
30 | Adaptive Scaling with Geometric and Visual Continuity of completed 3D objects | Proposes a part-aware adaptive scaling framework for editing and deforming completed 3D objects. | manipulation
