| 1 |
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following |
XIFBench: a comprehensive benchmark for evaluating the multilingual instruction-following ability of large language models |
large language model instruction following |
✅ |
|
| 2 |
A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images |
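Proposes an ophthalmic benchmark for multimodal large language models, evaluating their analysis of fundus photographs and OCT images |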
large language model multimodal |
|
|
| 3 |
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings |
Explores multimodal perception in large language models through perceptual strength ratings |
large language model multimodal |
|
|
| 4 |
Application of Multiple Chain-of-Thought in Contrastive Reasoning for Implicit Sentiment Analysis |
Proposes dual and triple reverse chain reasoning frameworks to improve implicit sentiment analysis |
large language model chain-of-thought |
|
|
| 5 |
Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation |
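Builds a dataset and evaluation framework for medical imaging quality control and explores the use of large language models in quality control |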
large language model multimodal |
|
|
| 6 |
cantnlp@DravidianLangTech2025: A Bag-of-Sounds Approach to Multimodal Hate Speech Detection |
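Proposes a bag-of-sounds multimodal hate speech detection system for Dravidian languages, exploring the potential of speech data for identifying hate speech. |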
multimodal |
|
|
| 7 |
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models |
Proposes SEAP, a training-free sparse expert activation pruning method that unlocks the potential of large language models |
large language model |
|
|
| 8 |
Assessing the Macro and Micro Effects of Random Seeds on Fine-Tuning Large Language Models |
Assesses the macro- and micro-level effects of random seeds on fine-tuning large language models |
large language model |
|
|
| 9 |
TCM-3CEval: A Triaxial Benchmark for Assessing Responses from Large Language Models in Traditional Chinese Medicine |
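TCM-3CEval: a triaxial benchmark for evaluating large language models in Traditional Chinese Medicine, bridging the gap with clinical needs |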
large language model |
|
|
| 10 |
Large Language Models Often Say One Thing and Do Another |
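Proposes the WDCT benchmark, reveals that large language models often "say one thing and do another," and investigates the effect of alignment strategies. |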
large language model |
|
|
| 11 |
Bot Wars Evolved: Orchestrating Competing LLMs in a Counterstrike Against Phone Scams |
Proposes the Bot Wars framework, which uses LLMs to counter phone scams and yields emergent strategies |
large language model chain-of-thought |
|
|
| 12 |
Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data |
Studies the effectiveness of fine-tuning LLMs for report summarization with supervised and unsupervised data |
large language model |
|
|
| 13 |
Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks? |
Proposes MemReasoner to improve LLM generalization on complex reasoning tasks |
large language model |
|
|
| 14 |
Gemini Embedding: Generalizable Embeddings from Gemini |
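Gemini Embedding: generates general-purpose text embeddings from the Gemini large model, significantly improving multilingual and multimodal text representation |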
large language model |
|
|
| 15 |
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality |
Improves large language model performance under limited compute by repeatedly reusing high-quality filtered datasets |
large language model |
|
|
| 16 |
HalluVerse25: Fine-grained Multilingual Benchmark Dataset for LLM Hallucinations |
Proposes HalluVerse25, a fine-grained multilingual benchmark dataset for evaluating LLM hallucinations. |
large language model |
|
|
| 17 |
Implicit Reasoning in Transformers is Reasoning through Shortcuts |
Implicit reasoning in Transformers is essentially shortcut-based learning |
large language model |
|
|
| 18 |
KSOD: Knowledge Supplement for LLMs On Demand |
Proposes the KSOD framework, which supplements LLMs with knowledge on demand to improve performance on domain tasks. |
large language model |
|
|
| 19 |
ZeroSumEval: An Extensible Framework For Scaling LLM Evaluation with Inter-Model Competition |
Proposes ZeroSumEval, an extensible framework that scales LLM evaluation through inter-model competition |
large language model |
|
|
| 20 |
TokenButler: Token Importance is Predictable |
TokenButler: proposes a method for predicting token importance to ease the LLM KV-cache bottleneck. |
large language model |
✅ |
|
| 21 |
Language Models Fail to Introspect About Their Knowledge of Language |
Shows that large language models cannot effectively introspect about their knowledge of language |
large language model |
|
|
| 22 |
Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations |
Quantifies religious bias in open LLMs through demographic analysis in Asian nations |
large language model |
|
|
| 23 |
LLMs syntactically adapt their language use to their conversational partner |
Shows that large language models adapt their language use at the syntactic level to their conversational partner. |
large language model |
|
|
| 24 |
Revisiting Noise in Natural Language Processing for Computational Social Science |
Revisits noise in natural language processing to advance computational social science research. |
large language model |
|
|
| 25 |
A Graph-based Verification Framework for Fact-Checking |
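Proposes the GraphFC framework, which uses graph-structured verification to address insufficient decomposition and reference ambiguity in misinformation detection |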
large language model |
|
|
| 26 |
MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark |
Proposes MRCEval, a comprehensive, challenging, and accessible machine reading comprehension benchmark. |
large language model |
|
|
| 27 |
Linguistic Knowledge Transfer Learning for Speech Enhancement |
Proposes CMKT, a cross-modal knowledge transfer framework that uses pretrained LLMs to improve speech enhancement. |
large language model |
|
|
| 28 |
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words |
Proposes the Identity Lock mechanism, which locks API fine-tuned LLMs with identity-based wake words to guard against key leakage. |
large language model |
|
|
| 29 |
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation |
DatawiseAgent: a notebook-centric, adaptive, and robust LLM agent framework for data science automation |
large language model |
|
|
| 30 |
Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning |
Proposes the ImplexConv dataset and the TaciTree framework for implicit reasoning in multi-session personalized conversation. |
large language model |
|
|
| 31 |
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations |
Proposes Bias Benchmark for Generation (BBG) for evaluating the social bias of large language models in long-form text generation. |
large language model |
|
|
| 32 |
Effect of Selection Format on LLM Performance |
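Studies the effect of selection format on large language model performance and finds that the bullet-point format generally performs better |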
large language model |
|
|
| 33 |
Enhanced Multi-Tuple Extraction for Alloys: Integrating Pointer Networks and Augmented Attention |
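Proposes a multi-tuple extraction framework combining pointer networks with augmented attention for extracting information from the alloy materials literature. |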
large language model |
|
|