| 1 |
Prompt Perturbations Reveal Human-Like Biases in Large Language Model Survey Responses |
Prompt perturbations reveal human-like biases in large language models' survey responses |
large language model |
|
|
| 2 |
Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models |
Proposes a new method for measuring elite polarization, using large language models for sentiment analysis of political figures |
large language model |
|
|
| 3 |
Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review |
Accelerates data extraction using a large language model and a scoping review protocol |
large language model |
|
|
| 4 |
Enhancing Food-Domain Question Answering with a Multimodal Knowledge Graph: Hybrid QA Generation and Diversity Analysis |
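Proposes a food-domain question answering framework that integrates a multimodal knowledge graph, improving generation quality and diversity |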
multimodal |
|
|
| 5 |
Large Language Model for Extracting Complex Contract Information in Industrial Scenes |
Proposes a large language model-based method for extracting complex contract information in industrial scenarios |
large language model |
|
|
| 6 |
Integrating External Tools with Large Language Models to Improve Accuracy |
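Proposes the Athena framework, which integrates external tools to significantly improve LLM question-answering accuracy in educational settings |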
large language model |
|
|
| 7 |
InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior |
InvestAlign: addresses data scarcity in aligning LLMs with investor decision-making under herd behavior |
large language model |
✅ |
|
| 8 |
ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining |
ixi-GEN: improves the efficiency of small-scale industrial LLMs through domain-adaptive continual pretraining |
large language model, foundation model |
|
|
| 9 |
CRISP: Complex Reasoning with Interpretable Step-based Plans |
CRISP: performs complex reasoning through interpretable step-based plans, improving mathematical reasoning and code generation |
large language model, chain-of-thought |
|
|
| 10 |
Frontier LLMs Still Struggle with Simple Reasoning Tasks |
Frontier large language models still struggle with simple reasoning tasks |
large language model |
|
|
| 11 |
Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation |
Proposes a multi-agent retrieval-augmented framework for generating evidence-based counterspeech against health misinformation |
large language model |
|
|
| 12 |
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains |
SynthTextEval: a toolkit for synthetic text generation and evaluation in high-stakes domains |
large language model |
|
|
| 13 |
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs |
Finds that cognitive biases in LLMs originate mainly in pretraining, with finetuning having limited influence |
large language model |
|
|
| 14 |
Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues |
Explores the ability of LLMs to predict tutor strategy and student outcomes in dialogues |
large language model |
|
|
| 15 |
MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction |
Proposes the MultiJustice dataset for evaluating LLM performance on multi-defendant, multi-charge legal prediction |
large language model |
✅ |
|
| 16 |
Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights |
Proposes a management framework for an open-source AI evaluation repository to address evaluation challenges |
large language model |
|
|
| 17 |
Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework |
Proposes a semantic entropy-guided adaptive termination framework that improves the efficiency of multi-round parallel reasoning |
large language model |
|
|
| 18 |
RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation |
Proposes a stealthy knowledge poisoning attack against knowledge graph-enhanced RAG systems |
large language model |
|
|
| 19 |
Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams |
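Proposes a SysML-based method for automatically generating models from text, accelerating the design and deployment of engineering dynamical systems |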
large language model |
|
|
| 20 |
AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research |
AblationBench: a benchmark suite for evaluating AI-assisted planning of ablation experiments |
chain-of-thought |
|
|
| 21 |
Checklist Engineering Empowers Multilingual LLM Judges |
Proposes CE-Judge, a checklist engineering-based framework that empowers multilingual LLM evaluation tasks |
large language model |
|
|
| 22 |
On the Effect of Uncertainty on Layer-wise Inference Dynamics |
Finds that uncertainty in LLM predictions has little effect on layer-wise inference dynamics, though model capability may change this |
large language model |
|
|
| 23 |
A Mathematical Theory of Discursive Networks |
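Builds a mathematical model of discursive networks that improves the reliability of information from large language models through a mutual review mechanism |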
large language model |
|
|
| 24 |
SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers |
SpindleKV: a novel KV cache reduction method that balances shallow and deep layers |
large language model |
|
|
| 25 |
On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks |
Finds that adversarial attacks targeting the verbal confidence of LLMs can significantly reduce its reliability |
large language model |
|
|