cs.CV (2026-02-08)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗3) | Pillar 9: Embodied Foundation Models (7 🔗2) | Pillar 3: Perception & Semantics (4) | Pillar 8: Physics-based Animation (3 🔗2) | Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (8 papers)

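# | Title | One-sentence summary | Tags | 🔗
1 | EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation | EasyTune: an efficient step-aware fine-tuning method for diffusion-based motion generation. | preference learning, motion generation
2 | MambaFusion: Adaptive State-Space Fusion for Multimodal 3D Object Detection | MambaFusion: adaptive state-space fusion for multimodal 3D object detection. | Mamba, SSM, multimodal
3 | Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement | A survey of MLLMs for chart understanding: evolution, limitations, and cognitive enhancement. | reinforcement learning, large language model, multimodal
4 | ViT-5: Vision Transformers for The Mid-2020s | ViT-5: an improved Vision Transformer backbone for mid-2020s vision tasks, obtained through architectural refinements. | representation learning, foundation model
5 | MIND: Benchmarking Memory Consistency and Action Control in World Models | MIND: a comprehensive benchmark for evaluating memory consistency and action control in world models. | world model
6 | Geometry-Aware Rotary Position Embedding for Consistent Video World Model | Proposes ViewRope, a geometry-aware rotary position embedding that improves long-term consistency in video world models. | world model
7 | PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification | Proposes PAND, prompt-aware neighborhood distillation for lightweight fine-grained image classification. | distillation
8 | Robustness of Vision Language Models Against Split-Image Harmful Input Attacks | Proposes the SIVA attack, exposing the vulnerability of vision-language models to split-image harmful inputs. | RLHF, distillation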

🔬 Pillar 9: Embodied Foundation Models (7 papers)

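# | Title | One-sentence summary | Tags | 🔗
9 | SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models | Proposes SPD-Faith Bench to diagnose faithfulness problems in the CoT reasoning of multimodal LLMs, and the SAGE framework to improve it. | large language model, multimodal, chain-of-thought
10 | MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance | MCIE-E1: a complex-instruction image editing method based on a multimodal LLM and spatial guidance. | large language model, multimodal, instruction following
11 | Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs | Studies how quantization affects the reliability of multimodal LLMs on VQA and proposes an optimized scheme that incorporates selector-based confidence estimation. | large language model, multimodal
12 | MMLSv2: A Multimodal Dataset for Martian Landslide Detection in Remote Sensing Imagery | MMLSv2: a multimodal dataset for landslide detection in Martian remote sensing imagery. | multimodal
13 | VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval | VidVec: uses video MLLM embeddings for video-text retrieval without additional visual training. | large language model, foundation model, multimodal
14 | Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video | SAGE: scalable adaptation of 3D geometric foundation models using weak supervision from Internet video. | foundation model
15 | Rethinking Practical and Efficient Quantization Calibration for Vision-Language Models | Proposes the TLQ framework to address the discrepancy between visual and text tokens in quantization calibration for vision-language models. | large language model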

🔬 Pillar 3: Perception & Semantics (4 papers)

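# | Title | One-sentence summary | Tags | 🔗
16 | Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps | Proposes a motion prediction framework that combines generic and specialized agent prediction with dynamic occupancy grid maps, improving prediction accuracy in complex scenes. | occupancy grid, scene flow, motion prediction
17 | Open-Text Aerial Detection: A Unified Framework For Aerial Visual Grounding And Detection | Proposes OTA-Det, a unified framework for open-text aerial detection and remote sensing visual grounding. | scene understanding, open-vocabulary, open vocabulary
18 | Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling | Picasso: a holistic scene reconstruction method based on physics-constrained sampling. | scene reconstruction, physically plausible, penetration
19 | Dynamic Black-hole Emission Tomography with Physics-informed Neural Fields | Proposes PI-DEF, which uses physics-informed neural fields for dynamic black-hole emission tomography. | NeRF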

🔬 Pillar 8: Physics-based Animation (3 papers)

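# | Title | One-sentence summary | Tags | 🔗
20 | FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging | FlashVID: a training-free, tree-based spatiotemporal token merging method that efficiently accelerates video LLM inference. | spatiotemporal, large language model
21 | Weak to Strong: VLM-Based Pseudo-Labeling as a Weakly Supervised Training Strategy in Multimodal Video-based Hidden Emotion Understanding Tasks | Proposes a weakly supervised learning framework based on VLM pseudo-labels for multimodal video-based hidden emotion understanding. | spatiotemporal, multimodal, chain-of-thought
22 | VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping | VFace: a training-free method for diffusion-based video face swapping. | spatiotemporal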

🔬 Pillar 4: Generative Motion (1 paper)

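# | Title | One-sentence summary | Tags | 🔗
23 | PhysDrape: Learning Explicit Forces and Collision Constraints for Physically Realistic Garment Draping | PhysDrape: learns explicit forces and collision constraints for physically realistic garment draping. | penetration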
