| 1 | ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges | Proposes the ScratchEval benchmark to evaluate the reasoning ability of large models in visual programming. | multimodal | ✅ | |
| 2 | CovidLLM: A Robust Large Language Model with Missing Value Adaptation and Multi-Objective Learning Strategy for Predicting Disease Severity and Clinical Outcomes in COVID-19 Patients | CovidLLM: a robust large language model that predicts disease severity and clinical outcomes in COVID-19 patients through missing-value adaptation and multi-objective learning. | large language model | | |
| 3 | An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation | An in-depth evaluation of the factual consistency of large language models in data-to-text generation. | large language model | | |
| 4 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | ContextualLens: uses contextual embeddings to improve hallucination detection and grounding in VLMs. | large language model multimodal | | |
| 5 | DIESEL -- Dynamic Inference-Guidance via Evasion of Semantic Embeddings in LLMs | DIESEL: dynamic inference guidance via evasion of semantic embeddings in LLMs, improving generation safety. | large language model | | |
| 6 | Way to Specialist: Closing Loop Between Specialized LLM and Evolving Domain Knowledge Graph | Proposes the Way-to-Specialist framework to address the shortcomings of LLMs in specialized-knowledge reasoning. | large language model | | |
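| 7 | Devising a Set of Compact and Explainable Spoken Language Feature for Screening Alzheimer's Disease | Proposes a compact, explainable set of spoken-language features, based on LLM visual capabilities and TF-IDF, for Alzheimer's disease screening. | large language model | | |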
| 8 | MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | MAG-V: a multi-agent framework for synthetic data generation and trajectory verification that improves agent performance. | large language model | | |