| 1 |
Panacea: A foundation model for clinical trial search, summarization, design, and recruitment |
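Proposes Panacea, a foundation model for clinical trials that handles multiple trial tasks and improves search, summarization, design, and recruitment. |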
large language model foundation model |
|
|
| 2 |
Autonomous Prompt Engineering in Large Language Models |
Proposes APET, which uses GPT-4 to perform prompt engineering autonomously and improve LLM performance on specific tasks |
large language model chain-of-thought |
|
|
| 3 |
CharED: Character-wise Ensemble Decoding for Large Language Models |
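Proposes CharED, a character-wise ensemble decoding method that improves the performance of large language models across multiple domains. |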
large language model |
|
|
| 4 |
Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback |
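Proposes a framework based on relation tuples, verification, and dynamic feedback to improve the arithmetic reasoning ability of large language models |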
large language model |
✅ |
|
| 5 |
Accelerating Clinical Evidence Synthesis with Large Language Models |
TrialMind: uses large language models to accelerate clinical evidence synthesis, improving efficiency and accuracy |
large language model |
|
|
| 6 |
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language |
Builds the Persuasive-Pairs dataset to measure and benchmark the ability of large language models to generate persuasive language |
large language model |
|
|
| 7 |
Using Large Language Models in Public Transit Systems, San Antonio as a case study |
Uses large language models to optimize public transit systems, with San Antonio as a case study |
large language model |
|
|
| 8 |
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment |
Shows that the behavior of aligned LLMs can be reproduced by base models through in-context learning |
large language model |
✅ |
|
| 9 |
Evaluating Large Language Models with Psychometrics |
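Proposes a psychometric benchmark for evaluating the performance and consistency of large language models on psychological dimensions |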
large language model |
|
|
| 10 |
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference |
CoSafe: evaluates large language model safety in multi-turn dialogue coreference |
large language model |
|
|
| 11 |
Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models |
A review of personality in large language models, covering self-assessment, exhibition, and recognition |
large language model |
|
|
| 12 |
Multi-property Steering of Large Language Models with Dynamic Activation Composition |
Proposes Dynamic Activation Composition for multi-property controllable generation with large language models, improving fluency. |
large language model |
|
|
| 13 |
Entropy-Based Decoding for Retrieval-Augmented Large Language Models |
Proposes an entropy-based decoding method to address the distraction problem in retrieval-augmented large language models |
large language model |
|
|
| 14 |
Enhancing Tool Retrieval with Iterative Feedback from Large Language Models |
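Proposes a tool retrieval method based on iterative feedback from large language models, improving tool selection accuracy in complex scenarios |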
large language model |
|
|
| 15 |
MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting |
Proposes the MoE-CT architecture to address the degradation of low-resource-language performance during continual training of LLMs |
large language model |
|
|
| 16 |
Generative AI Systems: A Systems-based Perspective on Generative AI |
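Proposes GenAISys, a systems-based perspective for studying generative AI, focusing on multimodal processing, content generation, and decision-making. |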
large language model multimodal |
|
|
| 17 |
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective |
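Revisits monosemanticity from a feature decorrelation perspective and proposes encouraging monosemanticity to improve model capability |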
large language model |
|
|
| 18 |
Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets |
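Studies how censorship and domain adaptation affect the detection of machine-generated tweets, revealing the threat posed by "imposters". |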
large language model |
|
|
| 19 |
Crafting Customisable Characters with LLMs: A Persona-Driven Role-Playing Agent Framework |
Proposes the SimsChat framework, which uses LLMs to create customizable role-playing agents |
large language model |
✅ |
|
| 20 |
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making |
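Proposes ThroughCut, an outlier detection technique, to evaluate the baseline performance of LLMs across multiple domains before fine-tuning |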
large language model |
|
|
| 21 |
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems |
RAGBench: an explainable benchmark for evaluating retrieval-augmented generation systems |
large language model |
✅ |
|
| 22 |
X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation |
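Proposes the Layman's RRG framework to address insufficiently robust evaluation of radiology report generation and the difficulty patients have in understanding reports |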
multimodal |
|
|
| 23 |
The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators |
ALCHEmist: labels data automatically by generating programs, at 1/500 the cost of LLM annotation |
large language model |
|
|
| 24 |
Following Length Constraints in Instructions |
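Proposes an instruction-following model for length constraints that addresses the length bias of existing models and outperforms GPT-4 and other models on length-control evaluations. |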
instruction following |
|
|
| 25 |
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users |
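Reveals information bias in LLM responses to vulnerable user groups: the effects of English proficiency, education level, and country of origin |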
large language model |
|
|
| 26 |
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation |
Proposes VarBench, which benchmarks language models robustly through dynamic variable perturbation. |
large language model |
|
|
| 27 |
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach |
Proposes a multimodal audio-textual Transformer-based approach for detecting flattery in speech. |
multimodal |
|
|
| 28 |
LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic |
LLM-ARC: uses an automated reasoning critic to enhance the logical reasoning ability of LLMs |
large language model |
|
|
| 29 |
Banishing LLM Hallucinations Requires Rethinking Generalization |
Rethinks generalization in order to eliminate hallucinations in large language models |
large language model |
|
|
| 30 |
"Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations? |
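With a few explanations, LLMs can approximate human judgment distributions on natural language inference, improving annotation efficiency. |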
large language model |
|
|
| 31 |
LongIns: A Challenging Long-context Instruction-based Exam for LLMs |
LongIns: an instruction-based exam benchmark for evaluating the long-context understanding and reasoning abilities of LLMs |
large language model |
|
|
| 32 |
Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats |
Proposes a text-to-SQL framework for IoT defense that queries and classifies IoT threats, and builds an accompanying dataset. |
large language model |
|
|
| 33 |
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts |
Proposes FrenchToxicityPrompts for evaluating and mitigating toxicity in French texts. |
large language model |
|
|
| 34 |
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale |
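Proposes the FineWeb datasets, improving the quality of pretraining data for large language models and the resulting model performance |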
large language model |
|
|
| 35 |
Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft |
Uses retrieval-augmented code generation to improve situated action generation in Minecraft |
large language model |
|
|
| 36 |
Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark |
Proposes the INVALSI benchmark for evaluating the Italian-language proficiency of LLMs |
large language model |
|
|