| 1 |
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models |
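Studies calibration in self-improving large language models and proposes an iterative calibration method to reduce overconfidence. |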
large language model chain-of-thought |
|
|
| 2 |
CoLa: Learning to Interactively Collaborate with Large Language Models |
Proposes CoLa, which trains an AI guide through self-guided learning to improve collaboration with LLMs on complex language tasks. |
large language model |
|
|
| 3 |
Generative Evaluation of Complex Reasoning in Large Language Models |
Proposes KUMO, a generative evaluation framework for assessing complex reasoning in large language models. |
large language model |
|
|
| 4 |
AD-GPT: Large Language Models in Alzheimer's Disease |
AD-GPT: a domain-specific large language model for Alzheimer's disease research. |
large language model |
|
|
| 5 |
Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models |
Proposes the TACO Prompting framework to improve LLM accuracy and flexibility in medical symptom coding. |
large language model |
|
|
| 6 |
Bias in Large Language Models Across Clinical Applications: A Systematic Review |
A systematic review showing that bias is pervasive in clinical large language models and may harm patients. |
large language model |
|
|
| 7 |
Noiser: Bounded Input Perturbations for Attributing Large Language Models |
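Proposes Noiser, which uses bounded input perturbations to improve the faithfulness and answerability of large language model attribution. |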
large language model |
|
|
| 8 |
Extending CREAMT: Leveraging Large Language Models for Literary Translation Post-Editing |
Uses large language models to make literary translation post-editing more efficient while preserving creative style. |
large language model |
|
|
| 9 |
Ontologies in Design: How Imagining a Tree Reveals Possibilites and Assumptions in Large Language Models |
Proposes an ontology-based design framework for analyzing and improving large language models. |
large language model |
|
|
| 10 |
Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs |
Shows that LLMs over-rely on names when identifying the ingredients of Chinese drugs, and proposes a RAG method. |
large language model |
|
|
| 11 |
Survey and Experiments on Mental Disorder Detection via Social Media: From Large Language Models and RAG to Agents |
Survey and experiments: detecting mental disorders from social media using LLMs, RAG, and agents. |
large language model |
|
|
| 12 |
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context |
Proposes a multilingual alignment-tuning method to address monolingual bias. |
large language model instruction following |
|
|
| 13 |
CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring |
CoTAL: human-in-the-loop prompt engineering that improves generalizable formative assessment scoring. |
large language model chain-of-thought |
|
|
| 14 |
Improving Harmful Text Detection with Joint Retrieval and External Knowledge |
Proposes a joint retrieval framework that combines knowledge graphs with pretrained models to improve harmful text detection. |
large language model multimodal |
|
|
| 15 |
MegaMath: Pushing the Limits of Open Math Corpora |
MegaMath: builds a large-scale open math corpus to advance math LLMs. |
large language model |
|
|
| 16 |
Cultural Learning-Based Culture Adaptation of Language Models |
Proposes the CLCA framework, which uses cultural learning to improve language model alignment with different cultural values. |
large language model |
|
|
| 17 |
Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs |
Finds that LLMs have strong working memory but weak executive functioning and problem-solving ability. |
large language model |
|
|
| 18 |
HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse |
HyperRAG: improves the quality-efficiency tradeoff of retrieval-augmented generation by reusing the reranker KV-cache. |
large language model |
|
|
| 19 |
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study |
Proposes a self-denoising method to improve LLM robustness to perturbed instructions. |
large language model |
|
|
| 20 |
Why do LLMs attend to the first token? |
Examines the mechanism behind LLMs attending to the first token; avoiding over-mixing is the key. |
large language model |
|
|
| 21 |
SAFER: Advancing Safety Alignment via Efficient Ex-Ante Reasoning |
SAFER: improves the safety alignment of large language models through efficient ex-ante reasoning. |
large language model |
|
|
| 22 |
LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems |
An exploratory study of using large language models for complex reasoning on Fermi problems. |
large language model |
|
|
| 23 |
Language Models reach higher Agreement than Humans in Historical Interpretation |
Large language models reach higher agreement than humans in historical interpretation. |
large language model |
|
|
| 24 |
Leveraging LLM For Synchronizing Information Across Multilingual Tables |
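Uses large language models to synchronize information across multilingual tables, improving Wikipedia content quality for low-resource languages. |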
large language model |
|
|
| 25 |
DaKultur: Evaluating the Cultural Awareness of Language Models for Danish with Native Speakers |
DaKultur: evaluates language models' awareness of Danish culture with native Danish speakers. |
large language model |
|
|
| 26 |
The quasi-semantic competence of LLMs: a case study on the part-whole relation |
Examines how well large language models understand the part-whole relation. |
large language model |
|
|
| 27 |
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence |
A mechanistic analysis of how post-training affects LLMs: knowledge, truthfulness, refusal, and confidence. |
large language model |
✅ |
|
| 28 |
Measurement of LLM's Philosophies of Human Nature |
Proposes M-PHNS to evaluate LLMs' philosophies of human nature, and uses mental loop learning to increase their trust in humans. |
large language model |
✅ |
|
| 29 |
LLMs as Deceptive Agents: How Role-Based Prompting Induces Semantic Ambiguity in Puzzle Tasks |
Uses role-based prompting to study how LLMs produce deceptive semantic ambiguity in puzzle tasks. |
large language model |
|
|