| 1 |
The Dark Patterns of Personalized Persuasion in Large Language Models: Exposing Persuasive Linguistic Features for Big Five Personality Traits in LLMs Responses |
Exposing the "dark patterns" of personality-trait-based personalized persuasion in large language models |
large language model |
|
|
| 2 |
Exploring the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal |
Evaluating the limitations of LLMs in masked text processing through MskQA and MskCal |
large language model |
|
|
| 3 |
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators |
Evaluating large language models in clinical decision support, using medical calculator selection as a case study |
large language model |
|
|
| 4 |
Evaluating Large Language Model Capability in Vietnamese Fact-Checking Data Generation |
Evaluating the ability of large language models to generate Vietnamese fact-checking data |
large language model |
|
|
| 5 |
Assessing Open-Source Large Language Models on Argumentation Mining Subtasks |
Evaluating open-source large language models on argumentation mining subtasks |
large language model |
|
|
| 6 |
Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages |
Evaluating and adapting large language models to represent folktales in low-resource languages |
large language model |
|
|
| 7 |
Identifying and Decomposing Compound Ingredients in Meal Plans Using Large Language Models |
Using large language models to identify and decompose compound ingredients in meal plans |
large language model |
|
|
| 8 |
LBPE: Long-token-first Tokenization to Improve Large Language Models |
Proposes LBPE, a long-token-first tokenization method, to improve large language model performance. |
large language model |
|
|
| 9 |
Benchmarking Distributional Alignment of Large Language Models |
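Builds a benchmark to evaluate how well large language models align with the opinion distributions of specific populations when simulating them |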
large language model |
|
|
| 10 |
Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning |
Proposes Gap-Filling Prompting to improve the code-assisted capability of small models in mathematical reasoning |
large language model chain-of-thought |
|
|
| 11 |
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks |
Releases Dynamic-SUPERB Phase-2: a collaboratively expanding benchmark for evaluating the capabilities of spoken language models, containing 180 tasks |
foundation model multimodal |
✅ |
|
| 12 |
Reasoning Robustness of LLMs to Adversarial Typographical Errors |
Proposes an adversarial typographical error attack method to evaluate the robustness of LLM reasoning |
large language model chain-of-thought |
|
|
| 13 |
FactLens: Benchmarking Fine-Grained Fact Verification |
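FactLens: proposes a fine-grained fact verification benchmark to address the shortcomings of traditional methods in tackling LLM hallucination. |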
large language model |
|
|
| 14 |
BERTrend: Neural Topic Modeling for Emerging Trends Detection |
BERTrend: a neural topic model for emerging trend detection |
large language model |
|
|
| 15 |
RefreshKV: Updating Small KV Cache During Long-form Generation |
RefreshKV: improves long-form generation performance by dynamically updating a small KV cache |
large language model |
|
|
| 16 |
Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? |
Uses large language models as reliable annotators of political truths to address political misinformation detection. |
large language model |
|
|
| 17 |
Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 |
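Team Papelo proposes a multi-hop evidence pursuit method that combines LLM reasoning with search engine retrieval to improve claim verification on the FEVER 2024 task. |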
large language model |
|
|
| 18 |
One Small and One Large for Document-level Event Argument Extraction |
For document-level event argument extraction, proposes a dual optimization method based on small and large language models. |
large language model |
✅ |
|
| 19 |
SSSD: Simply-Scalable Speculative Decoding |
Proposes SSSD, a simple, scalable, training-free speculative decoding method that accelerates large language model inference. |
large language model |
|
|
| 20 |
Assessing the Answerability of Queries in Retrieval-Augmented Code Generation |
Proposes the RaCGEval benchmark for evaluating the answerability of queries in retrieval-augmented code generation. |
large language model |
|
|
| 21 |
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking |
FIRST: single-token decoding to accelerate listwise reranking, with validation of its effectiveness and efficiency. |
large language model |
|
|
| 22 |
KyrgyzNLP: Challenges, Progress, and Future |
Focuses on Kyrgyz NLP: challenges, progress, and future outlook |
large language model |
|
|
| 23 |
EUREKHA: Enhancing User Representation for Key Hackers Identification in Underground Forums |
EUREKHA: identifies key hackers in underground forums by enhancing user representation. |
large language model |
|
|
| 24 |
VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM |
VISTA: a visual integrated system that uses LLMs for tailored automation in math problem generation |
large language model |
|
|
| 25 |
Towards Low-Resource Harmful Meme Detection with LMM Agents |
Proposes an LMM agent-based framework to address low-resource harmful meme detection |
multimodal |
|
|