| 1 |
From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion |
Review: AI-driven quantitative remote sensing inversion, from physical models to foundation models |
foundation model multimodal |
|
|
| 2 |
Unreal is all you need: Multimodal ISAC Data Simulation with Only One Engine |
Great-X: an efficient multimodal ISAC data simulation platform built on Unreal Engine |
foundation model multimodal |
✅ |
|
| 3 |
F3-Net: Foundation Model for Full Abnormality Segmentation of Medical Images with Flexible Input Modality Requirement |
F3-Net: a foundation model for full abnormality segmentation of medical images that supports flexible input modalities |
foundation model multimodal |
|
|
| 4 |
Understanding Driving Risks using Large Language Models: Toward Elderly Driver Assessment |
Uses large language models to understand driving risks and explores their application to elderly driver assessment |
large language model multimodal |
|
|
| 5 |
Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement |
Proposes the SDIR and CADE modules to address single-domain generalization in multimodal cross-cancer prognosis. |
multimodal |
✅ |
|
| 6 |
Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models |
Raptor: uses pretrained 2D foundation models to generate scalable, training-free embeddings for 3D medical volumes. |
foundation model |
|
|
| 7 |
Infinite Video Understanding |
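Proposes the concept of infinite video understanding, aiming to overcome the compute and memory bottlenecks that current models face when processing videos of unbounded length. |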
large language model multimodal |
|
|
| 8 |
Visual Semantic Description Generation with MLLMs for Image-Text Matching |
Proposes an MLLM-based method for generating visual semantic descriptions to improve image-text matching performance. |
large language model multimodal |
✅ |
|
| 9 |
CNeuroMod-THINGS, a densely-sampled fMRI dataset for visual neuroscience |
CNeuroMod-THINGS: a densely sampled fMRI dataset for visual neuroscience |
multimodal |
|
|
| 10 |
From One to More: Contextual Part Latents for 3D Generation |
Proposes the CoPart framework, which achieves controllable 3D generation through contextual part latents. |
foundation model |
|
|
| 11 |
DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images |
Proposes DatasetAgent, a multi-agent system for automatically constructing datasets from real-world images. |
large language model |
|
|
| 12 |
A document is worth a structured record: Principled inductive bias design for document recognition |
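Proposes a document recognition method based on structured records that improves recognition accuracy and generalization on complex documents. |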
foundation model |
|
|
| 13 |
Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models |
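Proposes MuGCP, which strengthens the generalization of vision-language models through multimodal mutual-guidance conditional prompt learning. |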
large language model |
|
|