| 1 |
PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents |
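Proposes the PAL-UI framework, which improves the planning ability of vision-based GUI agents on long-horizon tasks through an active look-back mechanism. |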
large language model multimodal |
|
|
| 2 |
A Deep Learning Pipeline for Epilepsy Genomic Analysis Using GPT-2 XL and NVIDIA H100 |
Proposes a deep learning pipeline based on GPT-2 XL and NVIDIA H100 for epilepsy genomic analysis. |
large language model |
|
|
| 3 |
Solar PV Installation Potential Assessment on Building Facades Based on Vision and Language Foundation Models |
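Proposes the SF-SPA framework, which uses vision-language models to assess the photovoltaic installation potential of building facades. |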
large language model foundation model |
|
|
| 4 |
From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding |
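Proposes a framework that converts videos into indexed knowledge graphs, combining methods for multimodal content analysis and understanding. |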
multimodal |
|
|
| 5 |
SPUS: A Lightweight and Parameter-Efficient Foundation Model for PDEs |
SPUS: a lightweight, parameter-efficient foundation model for partial differential equations. |
foundation model |
|
|
| 6 |
Graph Integrated Multimodal Concept Bottleneck Model |
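Proposes MoE-SGT, which enhances multimodal concept bottleneck models with a graph Transformer and a mixture-of-experts model, improving reasoning over complex concepts. |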
multimodal |
|
|
| 7 |
Assessing Foundation Models for Mold Colony Detection with Limited Training Data |
Evaluates foundation models for mold colony detection using a small amount of training data. |
foundation model |
|
|
| 8 |
CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab? |
CardioBench: a standardized benchmark for evaluating the generalization ability of echocardiography foundation models. |
foundation model |
|
|
| 9 |
Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs |
Proposes a training-free uncertainty guidance framework for MLLMs on complex visual tasks. |
large language model multimodal |
|
|
| 10 |
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories |
Proposes XMAS, a method for efficient data selection for vision-language models via cross-modal alignment trajectories. |
large language model |
✅ |
|
| 11 |
IMAGEdit: Let Any Subject Transform |
IMAGEdit: proposes a training-free framework that transforms the appearance of any number of video subjects. |
multimodal |
✅ |
|
| 12 |
KeySG: Hierarchical Keyframe-Based 3D Scene Graphs |
KeySG: builds 3D scene graphs from hierarchical keyframes, improving semantic richness and scalability. |
large language model |
|
|
| 13 |
ProtoMask: Segmentation-Guided Prototype Learning |
ProtoMask: proposes a segmentation-guided prototype learning method that improves prototype interpretability. |
foundation model |
✅ |
|
| 14 |
CML-Bench: A Framework for Evaluating and Enhancing LLM-Powered Movie Scripts Generation |
CML-Bench: a framework for evaluating and improving movie scripts generated by large language models. |
large language model |
|
|
| 15 |
Disentangling Foreground and Background for Vision-Language Navigation via Online Augmentation |
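Proposes COFA, which disentangles foreground and background features through online augmentation to improve the generalization of vision-language navigation. |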
VLN |
|
|