cs.CV (2025-12-25)

📊 24 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11) · Pillar 2: RL Algorithms & Architecture (6, 🔗 2) · Pillar 3: Spatial Perception & Semantics (4) · Pillar 6: Video Extraction & Matching (1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

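| # | Title | One-sentence summary | Tags | 🔗 |
| 1 | Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding | Proposes Omni-Weather, a unified multimodal model that addresses the separation between weather generation and understanding. | foundation model, multimodal, chain-of-thought | |
| 2 | TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References | TrackTeller: proposes a temporal multimodal 3D grounding method for behavior-dependent object references. | multimodal, language conditioned | |
| 3 | Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models | Proposes Scene-VLM, which uses vision-language models for multimodal video scene segmentation and significantly improves long-video understanding. | multimodal | |
| 4 | A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets | Proposes A-QCF-Net for liver tumor segmentation from unpaired CT/MRI data, enabling cross-modal knowledge transfer. | multimodal | |
| 5 | UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture | UniPercept: a unified perceptual-level image understanding framework covering aesthetics, quality, structure, and texture. | large language model, multimodal, visual grounding | |
| 6 | Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification | Parameter-efficient training with frozen encoders improves multimodal chest X-ray classification. | multimodal | |
| 7 | The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency | Proposes the B&J orthopedic clinical reasoning benchmark, revealing a significant gap in the clinical competency of vision-language models. | large language model, foundation model, multimodal | |
| 8 | TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant | Proposes the TAME framework and the LCMP benchmark to address the challenges multimodal large language models face in personalized long-horizon dialogue. | large language model, multimodal | |
| 9 | FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound | Proposes Fetal-Gauge, a fetal ultrasound vision-language benchmark, to evaluate and improve VLM performance in prenatal diagnosis. | multimodal, visual grounding | |
| 10 | LLM-Free Image Captioning Evaluation in Reference-Flexible Settings | Proposes Pearl, an LLM-free image captioning evaluation metric that improves evaluation performance in reference-flexible settings. | large language model | |
| 11 | Hierarchy-Aware Fine-Tuning of Vision-Language Models | Proposes a hierarchy-aware fine-tuning framework that efficiently improves vision-language models on hierarchical classification tasks. | multimodal | |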

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

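| # | Title | One-sentence summary | Tags | 🔗 |
| 12 | Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction | Proposes ST-MoE, a spatiotemporally decoupled mixture-of-experts network, to improve the accuracy and efficiency of multi-person motion prediction. | Mamba, human motion, motion prediction | |
| 13 | BertsWin: Resolving Topological Sparsity in 3D Masked Autoencoders via Component-Balanced Structural Optimization | BertsWin: resolves topological sparsity in 3D masked autoencoders via component-balanced structural optimization. | masked autoencoder, MAE, spatial relationship | |
| 14 | Self-Supervised Masked Autoencoders with Dense-Unet for Coronary Calcium Removal in limited CT Data | Proposes Dense-MAE, which uses self-supervised learning to remove coronary calcification artifacts in limited CT data. | masked autoencoder, MAE | |
| 15 | UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation | Proposes UltraLBM-UNet for high-accuracy skin lesion segmentation in resource-constrained settings. | Mamba, distillation | |
| 16 | AstraNav-World: World Model for Foresight Control and Consistency | AstraNav-World: a world model for foresight control and consistency that improves embodied navigation performance. | world model | |
| 17 | Towards Long-window Anchoring in Vision-Language Model Distillation | Proposes LAid, which uses knowledge distillation to improve the long-context-window handling of vision-language models. | distillation | |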

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

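| # | Title | One-sentence summary | Tags | 🔗 |
| 18 | Learning Dynamic Scene Reconstruction with Sinusoidal Geometric Priors | Proposes SirenPose, which learns dynamic scene reconstruction with sinusoidal geometric priors to improve spatiotemporal consistency. | scene reconstruction, spatiotemporal | |
| 19 | ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields | ShinyNeRF: proposes a neural radiance field method for digitizing anisotropic appearance. | NeRF, neural radiance field | |
| 20 | A Three-Level Alignment Framework for Large-Scale 3D Retrieval and Controlled 4D Generation | Proposes the Uni4D framework for large-scale 3D retrieval and 4D generation. | open-vocabulary, open vocabulary, multimodal | |
| 21 | Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective | Analyzes the mechanism of attention collapse in VGGT from a dynamical-systems perspective, revealing its root cause. | VGGT | |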

🔬 Pillar 6: Video Extraction & Matching (1 paper)

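| # | Title | One-sentence summary | Tags | 🔗 |
| 22 | From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement | Proposes the ALARM framework, which uses LMM agent self-improvement for label-free harmful meme detection. | HuMoR, multimodal | |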

🔬 Pillar 7: Motion Retargeting (1 paper)

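| # | Title | One-sentence summary | Tags | 🔗 |
| 23 | GeCo: A Differentiable Geometric Consistency Metric for Video Generation | Proposes GeCo for detecting geometric-deformation and occlusion-inconsistency artifacts in video generation. | geometric consistency | |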

🔬 Pillar 8: Physics-based Animation (1 paper)

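| # | Title | One-sentence summary | Tags | 🔗 |
| 24 | Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG | Proposes a modified TSception model for analyzing driver drowsiness and mental workload from EEG signals, improving stability and generalization. | spatiotemporal | |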
