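| 1 | SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation | Proposes the SSE framework, which tackles data overload in industrial-scale data assimilation through semantic selection and enrichment. | foundation model, multimodal | | |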
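| 2 | Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Proposes MGLMM, which performs instruction-guided multi-granularity segmentation and captioning. It addresses the limitations of existing LMMs in fine-grained understanding and segmentation. | large language model, multimodal | | |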
| 3 | Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models | Studies multimodal deep-learning models for dynamic camera-LiDAR calibration. | multimodal | | |
| 4 | MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | MaPPER: a multimodal prior-guided, parameter-efficient fine-tuning method for referring expression comprehension. | multimodal | ✅ | |
| 5 | Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder | Proposes a multimodal-fusion approach to understanding clinical videos of children with autism. | large language model, foundation model, multimodal | | |
| 6 | Portrait Video Editing Empowered by Multimodal Generative Priors | PortraitGen: a portrait video editing method built on multimodal generative priors that produces consistent, expressive stylization. | multimodal | ✅ | |
| 7 | Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis | Proposes a physics-informed latent diffusion model for multimodal brain MRI synthesis that addresses missing modalities. | multimodal | | |
| 8 | AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity | AVG-LLaVA: an efficient large multimodal model with adaptive visual granularity. | multimodal | | |
| 9 | A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing | Proposes an adaptive fine-tuning algorithm that selects and optimizes high-quality datasets for multimodal remote-sensing models. | multimodal | | |
| 10 | FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs | FullAnno: a data engine for improving the image comprehension of MLLMs. | large language model, multimodal | | |
| 11 | TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions | Proposes TalkMosaic, an interactive photo mosaic driven by question-and-answer interactions with a multimodal LLM. | multimodal | | |
| 12 | Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval | Proposes an efficient feature-extraction framework for universal image retrieval that addresses domain generalization. | foundation model | ✅ | |