| 1 |
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning |
Shows that chain-of-thought mainly improves math and symbolic reasoning, with limited gains on other tasks |
large language model chain-of-thought |
|
|
| 2 |
Human-like Affective Cognition in Foundation Models |
Proposes an evaluation framework for affective cognition and tests how well large models understand human emotions |
foundation model chain-of-thought |
|
|
| 3 |
The Factuality of Large Language Models in the Legal Domain |
Evaluates the factuality of large language models as knowledge bases in the legal domain and proposes ways to improve it |
large language model |
|
|
| 4 |
Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions |
Proposes the Alignment Compliance Index (ACI) to systematically evaluate how effective LLM alignment is in categorical decisions |
large language model |
|
|
| 5 |
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models |
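Proposes GPV, a value-measurement method based on generative psychometrics with large language models, for measuring both human and AI values |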
large language model |
|
|
| 6 |
Using Large Language Models to Generate Clinical Trial Tables and Figures |
Uses large language models to automatically generate clinical trial tables and figures, improving reporting efficiency |
large language model |
|
|
| 7 |
Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources |
Presents a Japanese medical LLM trained with low computational resources that performs on par with models ten times its size |
large language model |
✅ |
|
| 8 |
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning |
Proposes TART, an open-source, tool-augmented, explainable framework for table-based reasoning |
large language model chain-of-thought |
✅ |
|
| 9 |
MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning |
MeTHanol: modularized thinking language models with intermediate-layer thinking, decoding, and bootstrapped reasoning |
large language model |
✅ |
|
| 10 |
Efficacy of Synthetic Data as a Benchmark |
Evaluates how well LLM-generated synthetic data serves as a benchmark for NLP tasks, revealing that its effectiveness varies across tasks |
large language model |
|
|
| 11 |
VERA: Validation and Enhancement for Retrieval Augmented systems |
VERA: a validation and enhancement framework for retrieval-augmented systems that improves generation accuracy |
large language model |
|
|
| 12 |
Local Explanations and Self-Explanations for Assessing Faithfulness in black-box LLMs |
Proposes a method for assessing LLM faithfulness based on local perturbations and self-explanations |
large language model |
|
|
| 13 |
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts |
MEOW: a memory-supervised LLM unlearning method based on inverted facts |
large language model |
|
|
| 14 |
Gender Representation and Bias in Indian Civil Service Mock Interviews |
Reveals gender bias in Indian civil service mock interviews and introduces a new dataset |
large language model |
|
|
| 15 |
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning |
MAgICoRe: a multi-agent, iterative, coarse-to-fine refinement framework for reasoning that improves LLM math problem solving |
large language model |
|
|
| 16 |
Sampling Latent Material-Property Information From LLM-Derived Embedding Representations |
Samples latent material-property information from LLM-derived embedding representations |
large language model |
|
|
| 17 |
LLMs in Education: Novel Perspectives, Challenges, and Opportunities |
Discusses the applications, challenges, and opportunities of large language models in education |
large language model |
|
|
| 18 |
LLMs + Persona-Plug = Personalized LLMs |
Proposes personalized LLMs to address the diversity of user preferences |
large language model |
|
|
| 19 |
Enabling Real-Time Conversations with Minimal Training Costs |
Proposes a low-cost duplex decoding method that improves LLMs' real-time conversation ability |
large language model |
|
|
| 20 |
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing |
Introduces RoleKE-Bench to reveal and mitigate the challenge of detecting character knowledge errors in LLM role-playing |
large language model |
|
|
| 21 |
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation |
Proposes a framework that uses LLMs for API interactions, covering natural-language classification and synthetic data generation |
large language model |
|
|
| 22 |
"A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs |
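Shows that LLMs' understanding of social norms is affected by personas, with different personas producing different interpretations of cultural norms |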
large language model |
|
|