| 1 |
Analyzing Large language models chatbots: An experimental approach using a probability test |
Analyzes the logical reasoning ability of large language model chatbots through a probability test |
large language model |
|
|
| 2 |
A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability |
Proposes the S.C.O.R.E. framework for evaluating the safety, reliability, and ethics of large language models in the medical domain |
large language model |
|
|
| 3 |
Arabic Automatic Story Generation with Large Language Models |
Uses large language models for automatic Arabic story generation and builds a high-quality training dataset. |
large language model |
|
|
| 4 |
Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models |
Reveals the "knowledge overshadowing" phenomenon in large language models and proposes a method to mitigate hallucination |
large language model |
|
|
| 5 |
Attribute or Abstain: Large Language Models as Long Document Assistants |
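Proposes the LAB benchmark to evaluate how well LLMs perform attribution in long-document question answering, and explores the effectiveness of different attribution methods. |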
large language model |
|
|
| 6 |
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training |
Reviews the challenges of massive web-mined corpora in large language model pre-training |
large language model |
|
|
| 7 |
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models |
Proposes a new paradigm for LLM evaluation: moving from benchmarking to problem attribution and optimization recommendations |
large language model |
|
|
| 8 |
Review-LLM: Harnessing Large Language Models for Personalized Review Generation |
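Proposes Review-LLM, which uses large language models to generate personalized product reviews, addressing the insufficient personalization of existing methods. |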
large language model |
|
|
| 9 |
Interpretable Differential Diagnosis with Dual-Inference Large Language Models |
Proposes the Dual-Inf framework, which uses bidirectional-inference LLMs for interpretable differential diagnosis |
large language model |
|
|
| 10 |
ETM: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models |
Proposes the ETM metric to improve the reliability of Text-to-SQL evaluation for large language models |
large language model |
|
|
| 11 |
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) |
A survey of grounding and evaluation for large language models: practical challenges and lessons learned |
large language model |
|
|
| 12 |
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization |
Proposes RoLoRA to address the activation outlier problem in LoRA methods |
large language model multimodal |
✅ |
|
| 13 |
Training on the Test Task Confounds Evaluation and Emergence |
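Reveals how training on the test task affects the evaluation and emergent abilities of large language models, and proposes a correction method |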
large language model |
|
|
| 14 |
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance |
Rel-A.I.: an interaction-centered framework for measuring how much humans rely on language model outputs |
large language model |
|
|
| 15 |
Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing |
Proposes an LLM-based virtual counselor for motivational interviewing on alcohol use problems. |
large language model |
|
|
| 16 |
FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios |
FsPONER: a method that optimizes few-shot prompting for named entity recognition in domain-specific scenarios |
large language model |
|
|
| 17 |
On Leakage of Code Generation Evaluation Datasets |
Reveals leakage in code generation evaluation datasets and proposes LBPP, a new contamination-free benchmark. |
large language model |
✅ |
|
| 18 |
Beyond Fixed Length: Bucket Pre-training is All You Need |
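Proposes a multi-bucket data pre-training method to address the data quality and efficiency problems of fixed-length LLM pre-training |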
large language model |
|
|
| 19 |
Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture |
Proposes a multilingual blending evaluation method that reveals the fragility of LLM safety alignment in multilingual settings |
large language model |
|
|
| 20 |
Probability of Differentiation Reveals Brittleness of Homogeneity Bias in GPT-4 |
Uses the probability of differentiation to reveal the brittleness of homogeneity bias in GPT-4 |
large language model |
|
|