| 1 |
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models |
DyPlan: a dynamic strategy planning method based on large language models that improves question-answering efficiency and performance. |
large language model chain-of-thought |
|
|
| 2 |
Danoliteracy of Generative Large Language Models |
Proposes the Danoliteracy benchmark for evaluating the capabilities of generative models in Danish. |
large language model |
|
|
| 3 |
Smaller Large Language Models Can Do Moral Self-Correction |
With safety-alignment fine-tuning, smaller language models can also perform moral self-correction. |
large language model |
|
|
| 4 |
On Memorization of Large Language Models in Logical Reasoning |
Shows that large language models memorize training data when performing logical reasoning. |
large language model |
|
|
| 5 |
Multi-Agent Large Language Models for Conversational Task-Solving |
Proposes a multi-agent LLM framework and analyzes its strengths and challenges in conversational task-solving. |
large language model |
|
|
| 6 |
How Well Do Large Language Models Disambiguate Swedish Words? |
Evaluates how well large language models perform on Swedish word sense disambiguation. |
large language model |
|
|
| 7 |
Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model |
Proposes a method for building Japanese multimodal datasets from scratch to speed up development of Japanese visual language models. |
multimodal |
|
|
| 8 |
Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models |
Proposes the EZSwitch framework, which combines linguistic theory with LLMs to generate high-quality code-switched text. |
large language model |
|
|
| 9 |
Survey of Cultural Awareness in Language Models: Text and Beyond |
Survey: cultural awareness in language models, covering text and beyond. |
large language model multimodal |
|
|
| 10 |
SciPIP: An LLM-based Scientific Paper Idea Proposer |
SciPIP: an LLM-based framework for proposing scientific paper ideas that improves the quality of literature retrieval and idea generation. |
large language model |
|
|
| 11 |
Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning |
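Compares demonstration selection algorithms for LLM in-context learning, revealing differences in their efficiency and effectiveness. |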
large language model |
✅ |
|
| 12 |
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank |
PLANT: pretrains attention via learning-to-rank to improve extreme multi-label text classification performance. |
large language model |
✅ |
|
| 13 |
Generating Diverse Negations from Affirmative Sentences |
Proposes NegVerse, a method that generates diverse negations from affirmative sentences to improve language models' negation reasoning. |
large language model |
✅ |
|
| 14 |
Graph-Augmented Relation Extraction Model with LLMs-Generated Support Document |
Proposes a graph-augmented relation extraction model that uses LLM-generated support documents to improve performance. |
large language model |
|
|
| 15 |
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration |
Proposes ACC-Collab, an Actor-Critic-based learning framework for multi-agent LLM collaboration. |
large language model |
|
|
| 16 |
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation |
For LLM proof generation, the paper proposes an intuitively sequential data ordering to improve model training. |
large language model |
|
|
| 17 |
Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations |
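Proposes the C-RAG framework, which uses contrastive explanations to improve critical reasoning in retrieval-augmented language models. |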
large language model |
|
|
| 18 |
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models |
InjecGuard: improves the robustness of prompt injection guardrail models by mitigating over-defense. |
large language model |
✅ |
|
| 19 |
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions |
A large-scale survey reveals how researchers use and perceive LLMs as research tools. |
large language model |
|
|
| 20 |
Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation |
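Proposes the PESA framework, which applies proof-enhancement principles to improve the logic and persuasiveness of generated argumentative essays. |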
large language model |
|
|
| 21 |
Neural spell-checker: Beyond words with synthetic data generation |
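Proposes a neural spell-checker based on synthetic data generation that significantly improves Slovene spelling correction performance. |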
large language model |
|
|
| 22 |
Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs |
Collage: a decomposable rapid-prototyping tool for information extraction from scientific PDFs. |
multimodal |
|
|
| 23 |
Leveraging Language Models and Bandit Algorithms to Drive Adoption of Battery-Electric Vehicles |
Uses language models and bandit algorithms to personalize the promotion of electric vehicles. |
large language model |
|
|
| 24 |
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists |
Proposes the FeatEng benchmark, which evaluates LLMs' ability to generate feature-engineering code, in support of model iteration. |
large language model |
|
|
| 25 |
Evaluating Cultural and Social Awareness of LLM Web Agents |
CASA: a benchmark for evaluating the cultural and social awareness of LLM web agents. |
large language model |
|
|
| 26 |
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation |
Proposes CORAL, a large-scale benchmark for evaluating multi-turn conversational retrieval-augmented generation. |
large language model |
|
|
| 27 |
Multi-Programming Language Sandbox for LLMs |
MPLSandbox: a sandbox environment that gives LLMs unified feedback across multiple programming languages. |
large language model |
|
|
| 28 |
Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall |
Proposes the Long$^2$RAG benchmark and the KPR metric for evaluating the performance of long-text retrieval-augmented generation models. |
large language model |
|
|
| 29 |
Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application |
Explores the applications and challenges of deep learning and machine learning in natural language processing. |
large language model |
|
|
| 30 |
EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations |
EvoCodeBench: an evolving code generation benchmark with domain-specific evaluation capabilities. |
large language model |
|
|