| 19 | Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift | Proposes a multimodal distillation method that improves 3D semantic segmentation performance under domain shift. | distillation, foundation model, multimodal | ✅ |
| 20 | UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification | Proposes UAM, a unified Attention-Mamba backbone for a multimodal tumor cell classification framework. | Mamba, foundation model, multimodal | |
| 21 | MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models | Proposes MMT-ARD, which improves the robustness of vision-language models via multimodal multi-teacher adversarial distillation. | distillation, multimodal | ✅ |
| 22 | FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle | Proposes FireScope, which predicts wildfire risk with a chain-of-thought oracle, improving cross-continental generalization and interpretability. | reinforcement learning, multimodal, chain-of-thought | |
| 23 | Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations | Adapts BiomedCLIP for explainable breast-imaging AI across diverse populations. | contrastive learning, foundation model | |
| 24 | MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment | Proposes MCMoE, which completes missing modalities with a mixture of experts to improve incomplete multimodal action quality assessment. | representation learning, multimodal | ✅ |
| 25 | Counterfactual World Models via Digital Twin-conditioned Video Diffusion | Proposes CWMDT, which realizes counterfactual world modeling via digital twins and video diffusion models. | world model, large language model | |
| 26 | MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning | MolSight: an optical chemical structure recognition method combining SMILES pretraining, multi-granularity learning, and reinforcement learning. | reinforcement learning, large language model | |
| 27 | RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion | Proposes RL-AD-Net, which uses reinforcement-learning-guided adaptive displacement in latent space to refine the local geometric consistency of point cloud completion. | reinforcement learning, geometric consistency | |
| 28 | R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios | Proposes the R-AVST dataset and the AVST-Zero model, strengthening the spatio-temporal reasoning of video LLMs in complex audio-visual scenarios. | reinforcement learning, large language model, multimodal | |
| 29 | Scaling Self-Supervised and Cross-Modal Pretraining for Volumetric CT Transformers | SPECTRE: self-supervised and cross-modal pretraining for volumetric CT Transformers. | contrastive learning, distillation, foundation model | |
| 30 | Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets? | Target-Bench: evaluates the mapless path-planning ability of world models given semantic targets. | world model | |
| 31 | Importance-Weighted Non-IID Sampling for Flow Matching Models | Proposes an importance-weighted non-IID sampling method that improves the accuracy of estimating expectations over flow matching model outputs. | flow matching | |
| 32 | Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination | Proposes Video-R4, which enhances text-rich video reasoning through visual rumination. | reinforcement learning, multimodal | ✅ |
| 33 | DReX: Pure Vision Fusion of Self-Supervised and Convolutional Representations for Image Complexity Prediction | DReX: a pure-vision image complexity prediction model fusing self-supervised and convolutional representations. | MAE, multimodal | |
| 34 | Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models | Proposes Neighbor GRPO, which optimizes flow models via contrastive ODE policy optimization, improving generation quality and efficiency. | flow matching, contrastive learning | |
| 35 | Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton | Proposes Parts-Mamba, which improves skeleton-based action recognition under occlusion. | Mamba | |