| 1 |
Constraint Back-translation Improves Complex Instruction Following of Large Language Models |
Proposes constraint back-translation to improve complex instruction following in large language models |
large language model instruction following |
|
|
| 2 |
Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales? |
Proposes CD-CoT to make large language models more robust to noisy rationales in chain-of-thought prompting |
large language model chain-of-thought |
✅ |
|
| 3 |
Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer |
Proposes Thought Space Explorer to address blind spots in large language model reasoning |
large language model chain-of-thought |
|
|
| 4 |
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models |
Proposes multi-dimensional red teaming for the safety of audio large multimodal models and exposes their vulnerabilities |
large language model multimodal |
|
|
| 5 |
Large Language Models for Patient Comments Multi-Label Classification |
Uses large language models for multi-label classification of patient comments, making healthcare feedback analysis more efficient |
large language model chain-of-thought |
|
|
| 6 |
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments |
BitStack: a training-free method for any-size compression of large language models in variable memory environments |
large language model |
✅ |
|
| 7 |
Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models |
Proposes a membership inference attack based on multi-document aggregation that succeeds against large language models |
large language model |
|
|
| 8 |
'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue |
Proposes the DIAEF framework to detect out-of-distribution data in multimodal long dialogue, improving user experience |
multimodal |
|
|
| 9 |
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking |
Proposes JudgeRank, which uses large language models for reasoning-intensive reranking to improve retrieval-augmented generation |
large language model |
|
|
| 10 |
IdeaBench: Benchmarking Large Language Models for Research Idea Generation |
IdeaBench: a benchmark framework for evaluating how well large language models generate research ideas |
large language model |
|
|
| 11 |
LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models |
LEAF: learning and evaluation augmented by fact-checking to improve the factuality of large language models |
large language model |
|
|
| 12 |
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective |
Uses gradient analysis to reveal layer-wise differences in LLMs trained for fast versus slow thinking |
large language model chain-of-thought |
✅ |
|
| 13 |
What is Wrong with Perplexity for Long-context Language Modeling? |
Proposes the LongPPL metric and the LongCE loss to address the failure of perplexity in long-context language modeling |
large language model |
✅ |
|
| 14 |
Rethinking Scale: The Efficacy of Fine-Tuned Open-Source LLMs in Large-Scale Reproducible Social Science Research |
Fine-tuned open-source LLMs: improving the efficiency and transparency of large-scale reproducible social science research |
large language model |
|
|
| 15 |
Schema Augmentation for Zero-Shot Domain Adaptation in Dialogue State Tracking |
Proposes Schema Augmentation to improve domain generalization in zero-shot dialogue state tracking |
large language model |
|
|
| 16 |
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation |
Proposes the RSL-SQL framework, which improves Text-to-SQL generation through robust schema linking |
large language model |
✅ |
|
| 17 |
Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models |
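Finds that fine-tuning small models on data generated by larger models tends to create a knowledge mismatch, which worsens hallucination |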
large language model |
|
|
| 18 |
Commonsense Knowledge Editing Based on Free-Text in LLMs |
Proposes DEM, a method for editing free-text commonsense knowledge in LLMs |
large language model |
|
|
| 19 |
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios |
DetectRL: a benchmark for detecting LLM-generated text in real-world scenarios |
large language model |
✅ |
|
| 20 |
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents |
Improves the generalization of LLM-driven multi-turn web navigation agents through better context management |
large language model |
|
|
| 21 |
RESTOR: Knowledge Recovery in Machine Unlearning |
RESTOR framework: evaluates how well machine learning models recover knowledge under machine unlearning |
large language model |
|
|
| 22 |
Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs |
Red teaming and safety evaluation of frontier LLMs, targeting bias against Arab culture |
large language model |
|
|
| 23 |
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language |
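Pretrains multilingual large language models on high-quality data machine-translated from a single source language, significantly improving non-English reasoning |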
large language model |
|
|
| 24 |
Language Models can Self-Lengthen to Generate Long Texts |
Proposes the Self-Lengthen framework, which uses the LLM's own capabilities to generate longer texts without extra data or proprietary models |
large language model |
✅ |
|
| 25 |
Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction |
Instruction-tunes Llama-3-8B for city-scale long-term mobility prediction, outperforming the prior state of the art |
large language model |
✅ |
|
| 26 |
Pseudo-Conversation Injection for LLM Goal Hijacking |
Proposes pseudo-conversation injection, an attack that hijacks the goal of large language models |
large language model |
|
|
| 27 |
On Positional Bias of Faithfulness for Long-form Summarization |
Proposes an evaluation benchmark and mitigation strategies for positional bias in long-form summarization |
large language model |
✅ |
|