| 1 |
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step |
T-Eval: a new benchmark for step-by-step evaluation of the tool-utilization capability of large language models |
large language model instruction following |
✅ |
|
| 2 |
Large Language Models are Miscalibrated In-Context Learners |
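Reveals the calibration problems of in-context learning in large language models and proposes a self-ensembling method to improve calibration |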
large language model instruction following |
|
|
| 3 |
Speech Translation with Large Language Models: An Industrial Practice |
Proposes LLM-ST: an industrial-practice approach to speech translation based on large language models |
large language model chain-of-thought |
✅ |
|
| 4 |
From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models |
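Study finds that the cultural values of large language models are skewed, influenced mainly by English-speaking and economically developed countries |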
large language model |
|
|
| 5 |
Deep de Finetti: Recovering Topic Distributions from Large Language Models |
Deep de Finetti: recovering topic distributions from large language models |
large language model |
|
|
| 6 |
Typhoon: Thai Large Language Models |
Typhoon: open-source large language models designed for Thai, with performance comparable to GPT-3.5 |
large language model |
|
|
| 7 |
Experimenting with Large Language Models and vector embeddings in NASA SciX |
NASA SciX uses large language models and vector embeddings to improve information retrieval and reduce hallucination |
large language model |
|
|
| 8 |
Shai: A large language model for asset management |
Proposes Shai, a 10B-scale large language model for the asset management industry that improves performance on domain tasks |
large language model |
|
|
| 9 |
Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models |
Uses psychometric methods to explore the multifaceted personality traits of large language models |
large language model |
|
|
| 10 |
Developing Interactive Tourism Planning: A Dialogue Robot System Powered by a Large Language Model |
Proposes an interactive tourism-planning dialogue robot system based on a large language model, improving travel-planning efficiency |
large language model |
|
|
| 11 |
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models |
Proposes the BIPIA benchmark to evaluate and defend against indirect prompt injection attacks on large language models |
large language model |
|
|
| 12 |
Context-aware Decoding Reduces Hallucination in Query-focused Summarization |
Proposes context-aware decoding to reduce hallucination in query-focused summarization |
large language model |
✅ |
|
| 13 |
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion |
Parameter-efficient tuning enables scalable personalization of LLMs for text entry: a case study on abbreviation expansion |
large language model |
|
|
| 14 |
ChatGPT as a commenter to the news: can LLMs generate human-like opinions? |
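Evaluates ChatGPT's ability to generate human-like news comments: distinguishing machine-written from human-written comments remains challenging |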
large language model |
|
|
| 15 |
Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations |
Systematic review of how task-oriented dialogue systems are evaluated: measures, constructs, and their operationalisations |
large language model |
|
|
| 16 |
SimLM: Can Language Models Infer Parameters of Physical Systems? |
SimLM: explores the capabilities and limitations of large language models in inferring the parameters of physical systems |
large language model |
|
|
| 17 |
L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMs |
L-Tuning: a synchronized label-tuning method for prompts and prefixes in LLMs that improves efficiency and accuracy on NLI tasks |
large language model |
|
|