| # | Title | Summary | Keywords | ✅ |
|---|---|---|---|---|
| 1 | PRISM: A Methodology for Auditing Biases in Large Language Models | PRISM: a task-based probing methodology for auditing biases in large language models. | large language model | |
| 2 | Watermarking Large Language Models and the Generated Content: Opportunities and Challenges | A survey of watermarking techniques for large language models and their generated content: opportunities and challenges. | large language model | |
| 3 | A Survey of Multimodal Sarcasm Detection | A survey comprehensively reviewing multimodal sarcasm detection methods from 2018-2023 and future directions. | multimodal | |
| 4 | Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Investigates the reversal curse in large language models and probes the limits of their generalization. | large language model | ✅ |
| 5 | Task Calibration: Calibrating Large Language Models on Inference Tasks | Proposes Task Calibration (TC), which improves LLM performance on inference tasks via task reformulation. | large language model | |
| 6 | Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | Infinity-MM: improves multimodal model performance with large-scale, high-quality instruction data. | multimodal | ✅ |
| 7 | CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models | Introduces CCI3.0-HQ, a high-quality Chinese pre-training dataset that boosts small-model performance. | large language model | ✅ |
| 8 | ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models | Introduces ChineseSafe, a Chinese safety benchmark evaluating LLMs' ability to identify unsafe content in Chinese-language contexts. | large language model | ✅ |
| 9 | Large Language Models Reflect the Ideology of their Creators | Large language models reflect the ideological leanings of their creators. | large language model | |
| 10 | AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability | AdaEDL: adaptively stops draft generation in speculative decoding using an entropy-based lower bound on token acceptance probability, improving LLM inference efficiency. | large language model | |
| 11 | DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations | Proposes DeCoRe, which mitigates LLM hallucinations by contrasting retrieval heads during decoding. | large language model, instruction following | |
| 12 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | BioMistral-NLU: improves the generalizability of medical language understanding via instruction tuning. | large language model, instruction following | |
| 13 | Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Proposes CIT, which uses code as an intermediary to distill visual chart reasoning ability from LLMs into MLLMs. | large language model, multimodal | ✅ |
| 14 | WAFFLE: Finetuning Multi-Modal Model for Automated Front-End Development | WAFFLE: fine-tunes a multimodal model for automated front-end development. | large language model | |
| 15 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Read-ME: refactorizes LLMs into router-decoupled mixture-of-experts models via system co-design, improving inference efficiency. | large language model | ✅ |
| 16 | Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use | Introduces the Psych-ADR benchmark and the ADRA framework to evaluate how well LLMs align with experts in addressing adverse drug reactions from psychiatric medication use. | large language model | |
| 17 | Dynamic Vocabulary Pruning in Early-Exit LLMs | Proposes dynamic vocabulary pruning to speed up inference in early-exit LLMs while preserving performance. | large language model | |
| 18 | From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | FaultyMath: evaluates LLMs' logical integrity on faulty mathematical problems, exposing their "blind solver" behavior. | large language model | |
| 19 | From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages | Proposes a low-cost, model-agnostic method for building bilingual LLMs, improving performance on underrepresented languages. | large language model | |
| 20 | Does Differential Privacy Impact Bias in Pretrained NLP Models? | Finds that differentially private training amplifies bias against certain groups in pretrained NLP models. | large language model | |
| 21 | Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch | Proposes ScaleQuest, a scalable from-scratch question synthesis method that improves LLMs' mathematical reasoning. | large language model | |
| 22 | Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization | Uses prompting and fine-tuning of small LLMs for length-controllable telephone call summarization. | large language model | |
| 23 | Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models | Proposes LOADS, which optimizes label sets via activation distribution kurtosis to improve zero-shot classification with generative models. | large language model | |
| 24 | Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions | Systematically evaluates LLM data contamination detection methods, revealing the limitations of existing approaches in practice. | large language model | |
| 25 | Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | Bridge-Coder: leverages LLMs to bridge language gaps in low-resource programming languages. | large language model | |
| 26 | LLMs for Extremely Low-Resource Finno-Ugric Languages | A complete pipeline of LLM construction, instruction tuning, and evaluation for extremely low-resource Finno-Ugric languages. | large language model | |
| 27 | Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | Uses large language models to detect and correct label errors in datasets, improving model performance. | large language model | |
| 28 | An LLM Agent for Automatic Geospatial Data Analysis | Proposes GeoAgent, a framework addressing LLMs' logic errors and hallucinations in complex geospatial data analysis. | large language model | |
| 29 | Why Does the Effective Context Length of LLMs Fall Short? | Proposes STRING, which shifts rotary position embeddings to substantially improve LLMs' long-context capability. | large language model | |
| 30 | A Systematic Survey on Instructional Text: From Representation Formats to Downstream NLP Tasks | A systematic survey of instructional text, from representation formats to downstream NLP tasks, filling a gap in complex instruction understanding. | large language model | |
| 31 | ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis | ToolFlow: boosts LLM tool-calling via natural and coherent dialogue synthesis. | large language model | |
| 32 | MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases | Proposes MoMQ, a mixture-of-experts framework for multi-dialect query generation across relational and non-relational databases. | large language model | |
| 33 | Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems | Introduces CreativeMath, a benchmark assessing LLMs' ability to propose novel solutions to mathematical problems. | large language model | |