| 1 |
What Do Large Language Models Know? Tacit Knowledge as a Potential Causal-Explanatory Structure |
Examines whether large language models possess tacit knowledge, treating it as a causal-explanatory structure |
large language model |
|
|
| 2 |
Multilingual Contextualization of Large Language Models for Document-Level Machine Translation |
Proposes DocBlocks and uses multi-paradigm fine-tuning to improve LLM performance on document-level machine translation |
large language model |
|
|
| 3 |
Large Language Models as Quasi-crystals: Coherence Without Repetition in Generative Text |
Likens large language models to quasicrystals: coherence without repetition in generated text |
large language model |
|
|
| 4 |
Waking Up an AI: A Quantitative Framework for Prompt-Induced Phase Transition in Large Language Models |
Proposes a quantitative framework for studying prompt-induced cognitive phase transitions in large language models |
large language model |
|
|
| 5 |
Replicating ReLM Results: Validating Large Language Models with ReLM |
Uses the formal-language framework ReLM to validate memorization, bias, and zero-shot performance in large language models |
large language model |
|
|
| 6 |
Leveraging Large Language Models for Multi-Class and Multi-Label Detection of Drug Use and Overdose Symptoms on Social Media |
Uses large language models for multi-class and multi-label detection of drug use and overdose symptoms on social media |
large language model |
|
|
| 7 |
An LLM-as-a-judge Approach for Scalable Gender-Neutral Translation Evaluation |
Proposes an LLM-based method for evaluating gender-neutral translation, improving evaluation accuracy and scalability |
large language model, chain-of-thought |
|
|
| 8 |
FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations |
Proposes FiSMiness, a finite-state-machine-based framework that improves the long-term effectiveness of emotional support conversations |
large language model, chain-of-thought |
|
|
| 9 |
Memorization vs. Reasoning: Updating LLMs with New Knowledge |
Proposes the KUP benchmark and the MCT training method to improve LLMs' memorization of and reasoning over new knowledge |
large language model |
|
|
| 10 |
A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment |
Studies how LLM prompt sensitivity affects relevance judgments in information retrieval, and provides a dataset |
large language model |
✅ |
|
| 11 |
BitNet b1.58 2B4T Technical Report |
BitNet b1.58: the first open-source 1-bit large language model at the 2-billion-parameter scale, balancing performance and efficiency |
large language model |
|
|
| 12 |
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation |
Proposes an entropy-guided watermarking scheme that improves the traceability and robustness of LLM text generation |
large language model |
|
|
| 13 |
Gauging Overprecision in LLMs: An Empirical Study |
Proposes a framework for evaluating overprecision in LLMs, revealing uncertainty-calibration problems on numerical tasks |
large language model |
|
|
| 14 |
SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes |
Mu-SHROOM: a multilingual shared task on LLM hallucination detection, focused on observable overgeneration mistakes |
large language model |
|
|
| 15 |
Finding Flawed Fictions: Evaluating Complex Reasoning in Language Models via Plot Hole Detection |
Proposes the FlawedFictions benchmark for evaluating language models' complex reasoning through plot hole detection in stories |
large language model |
|
|
| 16 |
Rethinking LLM-Based Recommendations: A Personalized Query-Driven Parallel Integration |
Proposes the Query-to-Recommendation framework to address bias and serial bottlenecks in LLM-based recommender systems |
large language model |
|
|
| 17 |
Could Thinking Multilingually Empower LLM Reasoning? |
Uses multilingual reasoning to raise the performance ceiling of large language models on complex tasks |
large language model |
|
|
| 18 |
Efficient and Adaptive Simultaneous Speech Translation with Fully Unidirectional Architecture |
Proposes EASiST, an efficient and adaptive simultaneous speech translation model with a fully unidirectional architecture |
large language model |
|
|
| 19 |
WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms |
WebRollback: strengthens the navigation ability of web agents through an explicit rollback mechanism |
large language model |
|
|
| 20 |
Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions |
Proposes a narrative-identity-based method for constructing LLM virtual personas to simulate political partisan misperceptions |
large language model |
|
|