cs.CV (2026-02-08)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (8 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 3: Spatial Perception and Semantics (4) · Pillar 8: Physics-based Animation (3 🔗2) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms and Architecture (8 papers)

#	Title	Key takeaway	Tags	🔗	⭐
1	EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation	EasyTune: an efficient step-aware fine-tuning method for diffusion-based motion generation.	preference learning motion generation	✅
2	MambaFusion: Adaptive State-Space Fusion for Multimodal 3D Object Detection	MambaFusion: adaptive state-space fusion for multimodal 3D object detection.	Mamba SSM multimodal
3	Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement	A survey of MLLMs for chart understanding: evolution, limitations, and cognitive enhancement.	reinforcement learning large language model multimodal
4	ViT-5: Vision Transformers for The Mid-2020s	ViT-5: a stronger Vision Transformer backbone for mid-2020s vision tasks, obtained through architectural improvements.	representation learning foundation model
5	MIND: Benchmarking Memory Consistency and Action Control in World Models	MIND: a comprehensive benchmark for evaluating memory consistency and action control in world models.	world model	✅
6	Geometry-Aware Rotary Position Embedding for Consistent Video World Model	Proposes ViewRope, a geometry-aware rotary position embedding that improves the long-term consistency of video world models.	world model
7	PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification	Proposes PAND, prompt-aware neighborhood distillation for lightweight fine-grained image classification.	distillation	✅
8	Robustness of Vision Language Models Against Split-Image Harmful Input Attacks	Proposes the SIVA attack, exposing the vulnerability of vision-language models to split-image harmful inputs.	RLHF distillation

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	Key takeaway	Tags	🔗	⭐
9	SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models	Proposes SPD-Faith Bench to diagnose faithfulness problems in the CoT reasoning of multimodal LLMs, along with the SAGE framework to improve it.	large language model multimodal chain-of-thought	✅
10	MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance	MCIE-E1: a complex-instruction image editing method based on a multimodal LLM and spatial guidance.	large language model multimodal instruction following
11	Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs	Studies how quantization affects the reliability of multimodal LLMs on VQA and proposes an improvement scheme that incorporates selector-based confidence estimation.	large language model multimodal
12	MMLSv2: A Multimodal Dataset for Martian Landslide Detection in Remote Sensing Imagery	MMLSv2: a multimodal dataset for landslide detection in Martian remote sensing imagery.	multimodal	✅
13	VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval	VidVec: uses video MLLM embeddings for video-text retrieval without additional visual training.	large language model foundation model multimodal
14	Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video	SAGE: scalable adaptation of 3D geometric foundation models using weak supervision from Internet video.	foundation model
15	Rethinking Practical and Efficient Quantization Calibration for Vision-Language Models	Proposes the TLQ framework to address the discrepancy between visual and text tokens in quantization calibration for vision-language models.	large language model

🔬 Pillar 3: Spatial Perception and Semantics (4 papers)

#	Title	Key takeaway	Tags	🔗	⭐
16	Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps	Proposes a framework that combines generic and specialized agent motion prediction with dynamic occupancy grid maps, improving prediction accuracy in complex scenes.	occupancy grid scene flow motion prediction
17	Open-Text Aerial Detection: A Unified Framework For Aerial Visual Grounding And Detection	Proposes OTA-Det, a unified framework for open-text aerial detection and remote sensing visual grounding.	scene understanding open-vocabulary open vocabulary
18	Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling	Picasso: holistic scene reconstruction via physics-constrained sampling.	scene reconstruction physically plausible penetration
19	Dynamic Black-hole Emission Tomography with Physics-informed Neural Fields	Proposes PI-DEF, which uses physics-informed neural fields for dynamic black-hole emission tomography.	NeRF

🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	Key takeaway	Tags	🔗	⭐
20	FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging	FlashVID: a training-free, tree-based spatiotemporal token merging method that efficiently accelerates inference for video LLMs.	spatiotemporal large language model	✅
21	Weak to Strong: VLM-Based Pseudo-Labeling as a Weakly Supervised Training Strategy in Multimodal Video-based Hidden Emotion Understanding Tasks	Proposes a weakly supervised learning framework based on VLM pseudo-labels for multimodal video-based hidden emotion understanding.	spatiotemporal multimodal chain-of-thought
22	VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping	VFace: a training-free, diffusion-based method for video face swapping.	spatiotemporal	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key takeaway	Tags	🔗	⭐
23	PhysDrape: Learning Explicit Forces and Collision Constraints for Physically Realistic Garment Draping	PhysDrape: learns physically realistic garment draping through explicit forces and collision constraints.	penetration
