cs.CV (2025-12-25)

📊 24 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11) | Pillar 2: RL & Architecture (6 🔗2) | Pillar 3: Perception & Semantics (4) | Pillar 6: Video Extraction (1) | Pillar 7: Motion Retargeting (1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

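#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding	Proposes Omni-Weather, a unified multimodal model that addresses the separation between weather generation and weather understanding.	foundation model multimodal chain-of-thought
2	TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References	TrackTeller: proposes a temporal multimodal 3D grounding method for behavior-dependent object references.	multimodal language conditioned
3	Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models	Proposes Scene-VLM, which uses vision-language models for multimodal video scene segmentation, significantly improving long-video understanding.	multimodal
4	A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets	Proposes A-QCF-Net for liver tumor segmentation from unpaired CT/MRI data, enabling cross-modal knowledge transfer.	multimodal
5	UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture	UniPercept: a unified perceptual-level image understanding framework covering aesthetics, quality, structure, and texture.	large language model multimodal visual grounding
6	Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification	Parameter-efficient training with frozen encoders improves multimodal chest X-ray classification.	multimodal
7	The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency	Proposes the B&J benchmark for orthopedic clinical reasoning, revealing a significant gap in the clinical competency of vision-language models.	large language model foundation model multimodal
8	TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant	Proposes the TAME framework and the LCMP benchmark to address the difficulties multimodal large language models face in personalized long-horizon dialogue.	large language model multimodal
9	FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound	Proposes Fetal-Gauge, a vision-language benchmark for fetal ultrasound, to evaluate and improve VLM performance in prenatal diagnosis.	multimodal visual grounding
10	LLM-Free Image Captioning Evaluation in Reference-Flexible Settings	Proposes Pearl, an LLM-free image captioning evaluation metric that improves evaluation performance in reference-flexible settings.	large language model
11	Hierarchy-Aware Fine-Tuning of Vision-Language Models	Proposes a hierarchy-aware fine-tuning framework that efficiently improves vision-language model performance on hierarchical classification tasks.	multimodal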

🔬 Pillar 2: RL & Architecture (6 papers)

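#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction	Proposes ST-MoE, a spatiotemporally decoupled mixture-of-experts network, to improve the accuracy and efficiency of multi-person motion prediction.	Mamba human motion motion prediction	✅
13	BertsWin: Resolving Topological Sparsity in 3D Masked Autoencoders via Component-Balanced Structural Optimization	BertsWin: resolves topological sparsity in 3D masked autoencoders via component-balanced structural optimization.	masked autoencoder MAE spatial relationship
14	Self-Supervised Masked Autoencoders with Dense-Unet for Coronary Calcium Removal in limited CT Data	Proposes Dense-MAE, which uses self-supervised learning to remove coronary calcification artifacts from limited CT data.	masked autoencoder MAE
15	UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation	Proposes UltraLBM-UNet for high-accuracy skin lesion segmentation in resource-constrained settings.	Mamba distillation	✅
16	AstraNav-World: World Model for Foresight Control and Consistency	AstraNav-World: a world model for foresight control and consistency that improves embodied navigation performance.	world model
17	Towards Long-window Anchoring in Vision-Language Model Distillation	Proposes LAid, which uses knowledge distillation to improve the long-context-window capability of vision-language models.	distillation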

🔬 Pillar 3: Perception & Semantics (4 papers)

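#	Title	One-Sentence Summary	Tags	🔗	⭐
18	Learning Dynamic Scene Reconstruction with Sinusoidal Geometric Priors	Proposes SirenPose, which learns dynamic scene reconstruction with sinusoidal geometric priors to improve spatiotemporal consistency.	scene reconstruction spatiotemporal
19	ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields	ShinyNeRF: proposes a neural radiance field method for digitizing anisotropic appearance.	NeRF neural radiance field
20	A Three-Level Alignment Framework for Large-Scale 3D Retrieval and Controlled 4D Generation	Proposes the Uni4D framework for large-scale 3D retrieval and 4D generation.	open-vocabulary open vocabulary multimodal
21	Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective	Analyzes the mechanism of attention collapse in VGGT from a dynamical-systems perspective, revealing its root cause.	VGGT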

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement	Proposes the ALARM framework, which uses LMM agent self-improvement for label-free harmful meme detection.	HuMoR multimodal

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	GeCo: A Differentiable Geometric Consistency Metric for Video Generation	Proposes GeCo for detecting geometric deformation and occlusion-inconsistency artifacts in video generation.	geometric consistency

🔬 Pillar 8: Physics-based Animation (1 paper)

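#	Title	One-Sentence Summary	Tags	🔗	⭐
24	Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG	Proposes a modified TSception model for analyzing driver drowsiness and mental workload from EEG signals, improving stability and generalization.	spatiotemporal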
