| 1 |
AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis |
AgentSkiller: scaling generalist agent capabilities through semantically integrated cross-domain data synthesis |
generalist agent large language model |
|
|
| 2 |
Decomposing Reasoning Efficiency in Large Language Models |
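Proposes a framework for decomposing reasoning efficiency, revealing efficiency bottlenecks in the reasoning process of large language models |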
large language model |
|
|
| 3 |
Contractual Deepfakes: Can Large Language Models Generate Contracts? |
A critical analysis: the limitations of large language models in contract generation |
large language model |
|
|
| 4 |
Knowledge Integration Decay in Search-Augmented Reasoning of Large Language Models |
Proposes the SAKE strategy to address knowledge integration decay in search-augmented reasoning of LLMs |
large language model |
|
|
| 5 |
Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts |
Shows that RAG systems can mitigate the social bias of LLMs to some extent by introducing external knowledge. |
large language model chain-of-thought |
|
|
| 6 |
Covo-Audio Technical Report |
Covo-Audio: a 7-billion-parameter end-to-end audio large language model for multi-task audio intelligence |
foundation model instruction following |
|
|
| 7 |
Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency |
Proposes ACTSC, which uses activation information for difficulty-adaptive self-consistency decoding, reducing LLM inference cost. |
large language model chain-of-thought |
|
|
| 8 |
Are Language Models Sensitive to Morally Irrelevant Distractors? |
Reveals the fragility of language models' moral judgments: irrelevant distractors can significantly affect their decisions |
large language model multimodal |
|
|
| 9 |
Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs |
Proposes budget-guided Monte Carlo tree search (BG-MCTS) to address policy alignment under a fixed token budget in LLM test-time scaling. |
large language model |
|
|
| 10 |
Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference |
DRIFT: decouples reasoning from implicit knowledge representation to improve long-context inference efficiency. |
large language model |
✅ |
|
| 11 |
Maastricht University at AMIYA: Adapting LLMs for Dialectal Arabic using Fine-tuning and MBR Decoding |
Uses LoRA fine-tuning and MBR decoding to improve LLM performance in dialectal Arabic generation and translation |
large language model |
|
|
| 12 |
LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval |
LEMUR: a corpus for robust fine-tuning of multilingual legal embedding models for retrieval |
large language model |
✅ |
|
| 13 |
Conceptual Cultural Index: A Metric for Cultural Specificity via Relative Generality |
Proposes the Conceptual Cultural Index (CCI) for assessing sentence-level cultural specificity |
large language model |
✅ |
|
| 14 |
AmharicIR+Instr: A Two-Dataset Resource for Neural Retrieval and Instruction Tuning |
Releases the AmharicIR+Instr datasets to advance research on Amharic neural retrieval and instruction tuning |
instruction following |
|
|
| 15 |
The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies |
Reveals the safety dilemma of self-evolving AI societies: Anthropic safety inevitably fades during evolution |
large language model |
|
|
| 16 |
Steer2Edit: From Activation Steering to Component-Level Editing |
Proposes Steer2Edit to address behavior control in large language models |
large language model |
|
|
| 17 |
LLM Reasoning Predicts When Models Are Right: Evidence from Coding Classroom Discourse |
Uses LLM reasoning to predict model correctness in educational discourse analysis, improving automated annotation quality |
large language model |
|
|
| 18 |
Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs |
Proposes a layer-wise dynamic test-time adaptation method to improve LLM performance in unsupervised settings. |
large language model |
|
|
| 19 |
TraceMem: Weaving Narrative Memory Schemata from User Conversational Traces |
TraceMem: builds narrative memory schemata from user conversational traces to improve the long-term interaction capabilities of LLMs |
large language model |
✅ |
|
| 20 |
MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation |
Proposes the MILE-RefHumEval framework to address reference dependence in LLM evaluation |
large language model |
|
|
| 21 |
Advancing Block Diffusion Language Models for Test-Time Scaling |
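Proposes the BACD and TCCF frameworks to improve block diffusion language models on test-time inference acceleration and long-chain reasoning tasks. |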
chain-of-thought |
|
|
| 22 |
Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs |
Evaluates Spanish lexical variation in LLMs, revealing digital linguistic bias |
large language model |
|
|