| 1 |
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models |
BenchMAX: a comprehensive multilingual evaluation suite for large language models |
large language model, instruction following |
|
|
| 2 |
GENERator: A Long-Context Generative Genomic Foundation Model |
Proposes GENERator, a long-context generative genomic foundation model |
large language model, foundation model |
✅ |
|
| 3 |
AI-VERDE: A Gateway for Egalitarian Access to Large Language Model-Based Resources For Educational Institutions |
AI-VERDE: a platform that gives educational institutions equal access to LLM resources |
large language model |
|
|
| 4 |
Large Language Models as Proxies for Theories of Human Linguistic Cognition |
Uses large language models as proxy models for theories of human linguistic cognition |
large language model |
|
|
| 5 |
Exploring Mobile Touch Interaction with Large Language Models |
Proposes a touch-gesture-based method for interacting with LLMs for text editing on mobile devices. |
large language model |
|
|
| 6 |
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation |
Proposes the TURBO model, which improves multimodal sarcasm explanation generation through a target-augmented shared fusion mechanism. |
multimodal |
|
|
| 7 |
Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation |
Uses large language models to automatically evaluate dynamically evolving topic models in scientific literature |
large language model |
|
|
| 8 |
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering |
Proposes the NOVA framework, which uses data filtering to reduce hallucination in the instruction tuning of large language models |
large language model |
|
|
| 9 |
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction |
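Proposes CodeI/O, which distills general reasoning patterns through code input-output prediction and improves the reasoning of large language models across multiple tasks. |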
large language model, chain-of-thought |
✅ |
|
| 10 |
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding |
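Proposes Collaborative Speculative Decoding (CoSD), which fuses the knowledge of multiple LLMs during decoding to improve performance. |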
large language model |
|
|
| 11 |
Hallucination, Monofacts, and Miscalibration: An Empirical Investigation |
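Empirically studies the causes of LLM hallucination and strategies to mitigate it by controlling the monofact rate and applying selective weighting |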
large language model |
|
|
| 12 |
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models |
Reveals the geometry of prompting: investigates the distinct mechanisms of task adaptation in language models |
large language model |
|
|
| 13 |
Auditing Prompt Caching in Language Model APIs |
Uses timing-based audits to reveal prompt caching in LLM APIs and the potential privacy leakage risks |
large language model |
|
|
| 14 |
WHODUNIT: Evaluation benchmark for culprit detection in mystery stories |
Proposes the WhoDunIt dataset for evaluating the deductive reasoning of LLMs when identifying culprits in mystery stories |
large language model |
|
|
| 15 |
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More |
Proposes Mask-Enhanced Autoregressive Prediction (MEAP) to strengthen the in-context retrieval ability of LLMs |
large language model |
|
|
| 16 |
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian |
PerCul: proposes a Persian cultural evaluation dataset for assessing the cultural sensitivity of LLMs. |
large language model |
✅ |
|
| 17 |
Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon |
Proposes the C-BOD framework for detecting overfitting of large language models to benchmarks |
large language model |
|
|
| 18 |
RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs |
RomanLens: reveals the role of latent romanization in the multilingual abilities of LLMs |
large language model |
|
|
| 19 |
Entity Linking using LLMs for Automated Product Carbon Footprint Estimation |
Uses large language models for entity linking to automate product carbon footprint estimation. |
large language model |
|
|
| 20 |
MEMIT-Merge: Addressing MEMIT's Key-Value Conflicts in Same-Subject Batch Editing for LLMs |
MEMIT-Merge: resolves MEMIT's key-value conflicts in same-subject batch editing for LLMs |
large language model |
|
|
| 21 |
Small Language Model Makes an Effective Long Text Extractor |
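Proposes SeNER, a lightweight method for extracting entities from long text that significantly improves extraction accuracy and reduces memory usage. |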
large language model |
✅ |
|
| 22 |
A Large-Scale Benchmark for Vietnamese Sentence Paraphrases |
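Builds ViSP, a large-scale, high-quality Vietnamese sentence paraphrase dataset, to advance Vietnamese natural language processing research. |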
large language model |
|
|
| 23 |
Does Training on Synthetic Data Make Models Less Robust? |
Finds that training LLMs on synthetic data does not exacerbate their inherent blind spots on NLI tasks. |
large language model |
|
|
| 24 |
Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis |
Language-TPP: integrates temporal point processes with language models to enhance event analysis. |
large language model |
|
|