| 1 |
VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples |
VaccineRAG: Boosting the immunity of multimodal large language models against harmful RAG samples |
large language model multimodal chain-of-thought |
|
|
| 2 |
IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations |
IDEAlign: Uses a "pick-the-odd-one-out" task to evaluate how well LLMs align with human experts in open-ended interpretive annotation |
large language model |
|
|
| 3 |
Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models |
Generates short stories with LLaMA 3.2-3B and reveals racial biases in how they portray Black women and white women. |
large language model |
|
|
| 4 |
Scaling behavior of large language models in emotional safety classification across sizes and tasks |
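Studies scaling effects of LLMs in emotional safety classification and explores the potential of lightweight models for mental health applications. |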
large language model |
|
|
| 5 |
Comparative Study of Pre-Trained BERT and Large Language Models for Code-Mixed Named Entity Recognition |
A comparative study of pre-trained BERT and large language models for code-mixed named entity recognition |
large language model |
|
|
| 6 |
An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction |
Proposes an ensemble method within a multi-layered LLM framework to improve disease prediction accuracy on Arabic social health data. |
large language model |
|
|
| 7 |
E-THER: A Multimodal Dataset for Empathic AI -- Towards Emotional Mismatch Awareness |
Proposes E-THER, a multimodal dataset for improving AI's awareness of emotional mismatch. |
multimodal |
|
|
| 8 |
DeepSeek performs better than other Large Language Models in Dental Cases |
DeepSeek outperforms other large language models in dental case analysis |
large language model |
|
|
| 9 |
Behavioral Fingerprinting of Large Language Models |
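Proposes a behavioral fingerprinting framework for large language models to characterize differences in model cognition and interaction style. |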
large language model |
✅ |
|
| 10 |
DRAssist: Dispute Resolution Assistance using Large Language Models |
DRAssist: Uses large language models to assist in resolving automobile insurance and domain name disputes |
large language model |
|
|
| 11 |
Extracting OPQRST in Electronic Health Records using Large Language Models with Reasoning |
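Proposes an LLM-based reasoning method for extracting OPQRST information from electronic health records to improve clinical decision-making efficiency. |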
large language model |
|
|
| 12 |
FActBench: A Benchmark for Fine-grained Automatic Evaluation of LLM-Generated Text in the Medical Domain |
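Builds FActBench, a benchmark for automatic evaluation of LLM-generated text in the medical domain, to improve the reliability of factuality evaluation |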
large language model chain-of-thought |
|
|
| 13 |
How Instruction-Tuning Imparts Length Control: A Cross-Lingual Mechanistic Analysis |
Instruction tuning achieves cross-lingual length control by specializing components in the model's deeper layers |
large language model foundation model |
|
|
| 14 |
PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture |
PalmX 2025: The first shared task for benchmarking large language models on Arabic and Islamic culture |
large language model |
|
|
| 15 |
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds |
MoSEs: Uncertainty-aware AI-generated text detection via a mixture of stylistics experts with conditional thresholds |
large language model |
✅ |
|
| 16 |
SpecEval: Evaluating Model Adherence to Behavior Specifications |
SpecEval: Evaluates how closely large models adhere to behavior specifications |
foundation model |
|
|
| 17 |
LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue |
Proposes a two-step framework that uses LLMs to evaluate discrepancies in shared mental models in team dialogue. |
large language model |
|
|
| 18 |
Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size? |
Proposes the Fundamental Language Model paradigm and explores strategies for decoupling linguistic competence from model size |
large language model |
|
|
| 19 |
Avoidance Decoding for Diverse Multi-Branch Story Generation |
Proposes Avoidance Decoding to address insufficient diversity and repetition in LLM story generation. |
large language model |
|
|
| 20 |
AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models |
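Proposes the AMBEDKAR framework, which eliminates multi-level bias in LLMs through knowledge-augmented decoding for robust alignment with the Indian Constitution. |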
large language model |
|
|
| 21 |
JudgeAgent: Beyond Static Benchmarks for Knowledge-Driven and Dynamic LLM Evaluation |
Proposes JudgeAgent for knowledge-driven, dynamic evaluation of large language models, moving beyond the limits of static benchmarks |
large language model |
✅ |
|
| 22 |
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization |
Proposes CRPO, a retrieval-augmented contrastive reasoning framework for automatic prompt optimization. |
large language model |
|
|
| 23 |
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation |
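Proposes Genetic Prompt, which uses LLMs to simulate genetic algorithms for conditional synthetic data generation, improving data quality and diversity. |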
large language model |
|
|
| 24 |
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts |
Proposes RW-Steering to address LLMs' susceptibility to small amounts of harmful information in mixed and inappropriate contexts |
large language model |
|
|