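| # | Title | Summary | Tags | ✅ |
|---|-------|---------|------|----|
| 1 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | AgentClinic: a multimodal agent benchmark that evaluates AI performance in simulated clinical environments. | large language model, multimodal | |
| 2 | Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp | Uncovers data bias introduced by CLIP filtering through a multimodal analysis and fairness evaluation of the DataComp dataset. | multimodal | |
| 3 | Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | Plot2Code: a comprehensive benchmark for evaluating how well multimodal large language models generate code from scientific plots. | large language model | ✅ |
| 4 | News Recommendation with Category Description by a Large Language Model | Proposes a news recommendation method in which a large language model automatically generates category descriptions, improving recommendation performance. | large language model | ✅ |
| 5 | Divergent Creativity in Humans and Large Language Models | Compares humans and large language models, using semantic divergence to measure differences in creativity. | large language model | |
| 6 | LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language | LlamaTurk: explores methods for adapting open-source large language models to low-resource language settings. | large language model | |
| 7 | UCCIX: Irish-eXcellence Large Language Model | UCCIX: a continual-pretraining framework for a large language model targeting Irish, an extremely low-resource language. | large language model | |
| 8 | MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | MuMath-Code: combines tool-using LLMs with multi-perspective data augmentation to improve mathematical reasoning. | large language model | |
| 9 | Evaluating large language models in medical applications: a survey | Surveys methods for evaluating large language models in medicine, addressing the challenges posed by the complexity of medical information. | large language model | |
| 10 | Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | Systematically evaluates retrieval-augmented large language models in biomedical NLP in terms of application, robustness, and self-awareness. | large language model | |
| 11 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | EconLogicQA: a sequential-reasoning question-answering benchmark in economics for evaluating the logical reasoning of large language models. | large language model | ✅ |
| 12 | Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers | Builds a Russian-language multimodal dataset of scientific papers and tests existing language models on automatic summarization. | multimodal | ✅ |
| 13 | EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models | Proposes EMS-SD, an efficient multi-sample speculative decoding method that accelerates large language model inference. | large language model | ✅ |
| 14 | Interpreting Latent Student Knowledge Representations in Programming Assignments | Proposes the InfoOIRT model for interpreting students' latent knowledge representations in programming assignments. | large language model | |
| 15 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | PARDEN: defends large language models against jailbreak attacks by having them repeat their own outputs. | large language model | ✅ |
| 16 | Control Token with Dense Passage Retrieval | Augments the DPR model with control tokens to address hallucination in large language models. | large language model | |
| 17 | Many-Shot Regurgitation (MSR) Prompting | Proposes Many-Shot Regurgitation (MSR) prompting for evaluating the risk that large language models regurgitate content. | large language model | |
| 18 | OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs | OpenLLM-Ro: the first open-source Romanian foundation and chat large language models. | large language model | |
| 19 | MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | MCS-SQL: uses multiple prompts and multiple-choice selection to improve text-to-SQL generation. | large language model | |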