| 1 |
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models |
SIFo benchmark: evaluating the sequential instruction-following ability of large language models |
large language model instruction following |
|
|
| 2 |
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models |
CMMaTH: a Chinese multimodal math skill evaluation benchmark, built to advance foundation models |
large language model foundation model multimodal |
|
|
| 3 |
Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model |
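Proposes progressive low-rank decomposition (PLRD), a method for efficiently compressing a large language model and producing models of multiple sizes. |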
large language model foundation model |
|
|
| 4 |
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs |
Proposes SK-VQA: a large-scale synthetic knowledge generation dataset for training context-augmented multimodal LLMs |
multimodal |
|
|
| 5 |
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models |
ToolBeHonest: a multi-level hallucination diagnostic benchmark for tool-augmented large language models |
large language model |
|
|
| 6 |
EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models |
EHRmonize: uses large language models to extract medical concepts from electronic health records, improving data integration efficiency. |
large language model |
|
|
| 7 |
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models |
Proposes LEMoE to address lifelong editing of large language models |
large language model |
|
|
| 8 |
Simulating Financial Market via Large Language Model based Agents |
Proposes ASFM, a financial market simulator based on large language model agents, for economic research. |
large language model |
|
|
| 9 |
Belief Revision: The Adaptability of Large Language Models Reasoning |
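Proposes the Belief-R dataset and the Delta reasoning framework to evaluate how well large language models revise beliefs as information evolves |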
large language model |
|
|
| 10 |
Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach |
Detecting quit-vaping intentions with GPT-4: an exploration of an automatic data annotation approach |
large language model chain-of-thought |
|
|
| 11 |
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models |
Proposes the GlobalRG benchmark to evaluate the multicultural understanding of vision-language models |
visual grounding |
|
|
| 12 |
SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison |
SMLT-MUGC: detection and comparison of machine versus user-generated content across text lengths |
large language model |
|
|
| 13 |
Scaling Synthetic Data Creation with 1,000,000,000 Personas |
Proposes Persona Hub, which uses one billion personas to drive LLMs to generate diverse synthetic data |
large language model |
|
|
| 14 |
Evaluating Human Alignment and Model Faithfulness of LLM Rationale |
Evaluating the alignment and faithfulness of LLM rationales: prompting vs. attribution methods |
large language model |
|
|
| 15 |
AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions |
AnomaLLMy: detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions |
large language model |
|
|
| 16 |
ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation |
ITERTL: an iterative framework for fine-tuning LLMs to generate RTL code |
large language model |
|
|
| 17 |
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring |
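Proposes an LLM calibration method based on preference optimization over thought trees, improving the quality of rationale generation in science question scoring. |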
large language model |
✅ |
|
| 18 |
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations |
NLPerturbator: studying the robustness of code LLMs to natural language variations |
large language model |
|
|
| 19 |
Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation |
Proposes KELLER to address the lack of knowledge in legal case retrieval |
large language model |
|
|
| 20 |
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness |
Proposes MoICE to enhance LLMs' awareness of information at different positions in long contexts |
large language model |
|
|