| 1 |
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG |
Proposes UniDoc-Bench to address the evaluation of document-centric multimodal retrieval-augmented generation. |
large language model, multimodal |
|
|
| 2 |
Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models |
Reveals the risk that insufficient candidate diversity in test-time scaling of large language models leads to unsafe outputs. |
large language model |
|
|
| 3 |
Rezwan: Leveraging Large Language Models for Comprehensive Hadith Text Processing: A 1.2M Corpus Development |
Rezwan: uses large language models to build a 1.2M-scale corpus for Hadith text processing. |
large language model |
|
|
| 4 |
Fine-Tuning Large Language Models with QLoRA for Offensive Language Detection in Roman Urdu-English Code-Mixed Text |
Proposes a framework based on QLoRA fine-tuning of LLaMA3 that improves offensive language detection in Roman Urdu-English code-mixed text. |
large language model |
|
|
| 5 |
Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles |
Builds a benchmark dataset of political leaning in Bangla news articles for evaluating and improving LLM bias detection. |
large language model |
|
|
| 6 |
Annotate Rhetorical Relations with INCEpTION: A Comparison with Automatic Approaches |
Uses the INCEpTION tool to study discourse rhetorical relation annotation, comparing manual and automatic approaches. |
large language model |
|
|
| 7 |
Mechanistic Interpretability of Socio-Political Frames in Language Models |
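Explores the mechanistic interpretability of socio-political frames in LLMs, revealing the models' internal cognitive representations. |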
large language model |
|
|
| 8 |
Investigating LLM Variability in Personalized Conversational Information Retrieval |
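Studies LLM variability in personalized conversational information retrieval, emphasizing the importance of multi-run evaluation and variance reporting. |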
large language model |
|
|
| 9 |
Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs |
Shows that imbalanced few-shot learning in LLMs affects multilingual word sense disambiguation. |
large language model |
|
|
| 10 |
Can an LLM Induce a Graph? Investigating Memory Drift and Context Length |
Proposes a graph-induction-based evaluation method for LLMs, revealing memory drift that emerges earlier in relational reasoning. |
large language model |
|
|