| 1 |
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models |
MMDT: A comprehensive platform for evaluating the safety and trustworthiness of multimodal foundation models |
foundation model multimodal |
✅ |
|
| 2 |
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation |
The AdMIRe task aims to improve models' ability to understand and represent idioms in multimodal contexts. |
large language model multimodal |
|
|
| 3 |
A Review on Large Language Models for Visual Analytics |
Survey: Large language models empower visual analytics, improving data insight and interaction |
large language model multimodal |
|
|
| 4 |
Exploring Large Language Models for Word Games: Who is the Spy? |
Proposes a CoT-based scheduling framework that improves LLMs' role inference and identity disguise in the game "Who is the Spy?" |
large language model chain-of-thought |
✅ |
|
| 5 |
ChatGPT or A Silent Everywhere Helper: A Survey of Large Language Models |
A comprehensive analysis of the architecture, training, and applications of the large language model ChatGPT |
large language model |
|
|
| 6 |
Real-world validation of a multimodal LLM-powered pipeline for High-Accuracy Clinical Trial Patient Matching leveraging EHR data |
Proposes a multimodal LLM-based clinical trial patient matching pipeline that improves accuracy and efficiency |
multimodal |
|
|
| 7 |
Benchmarking Open-Source Large Language Models on Healthcare Text Classification Tasks |
Evaluates the performance of open-source large language models on healthcare text classification tasks |
large language model |
|
|
| 8 |
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models |
Proposes the MASS framework, which uses skill graphs to select data for pretraining large language models in the mathematical domain. |
large language model |
|
|
| 9 |
Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation |
Uses retrieval-augmented generation to improve the accuracy of pancreatic cancer staging |
large language model |
|
|
| 10 |
SPILL: Domain-Adaptive Intent Clustering based on Selection and Pooling with Large Language Models |
Proposes SPILL, a domain-adaptive intent clustering method based on selection and pooling with large language models |
large language model |
|
|
| 11 |
A Foundational individual Mobility Prediction Model based on Open-Source Large Language Models |
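Proposes a foundational individual mobility prediction model based on open-source large language models, improving adaptability across cities and generalization across users |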
large language model |
|
|
| 12 |
Poly-FEVER: A Multilingual Fact Verification Benchmark for Hallucination Detection in Large Language Models |
Poly-FEVER: A multilingual fact verification benchmark for detecting hallucinations in large language models |
large language model |
✅ |
|
| 13 |
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings |
Proposes ContextualJudgeBench for evaluating the judging ability of LLMs in contextual settings. |
large language model instruction following |
|
|
| 14 |
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer |
MetaLadder: Improving the quality of mathematical problem solving through analogical-problem reasoning transfer |
large language model chain-of-thought |
✅ |
|
| 15 |
Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems |
Evaluates and mitigates bias in retrieval-augmented medical question-answering systems |
large language model chain-of-thought |
|
|
| 16 |
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context |
Solla: A speech-oriented large language model that understands acoustic context |
large language model multimodal |
|
|
| 17 |
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment |
Proposes the AlignX framework for large-scale personalized LLM alignment, addressing differences in user preferences |
large language model |
|
|
| 18 |
Deep Contrastive Unlearning for Language Models |
Proposes the DeepCUT framework, which uses contrastive learning to optimize the LLM latent space for efficient machine unlearning. |
large language model |
|
|
| 19 |
ECLAIR: Enhanced Clarification for Interactive Responses in an Enterprise AI Assistant |
ECLAIR: Enhancing clarification for interactive responses in enterprise AI assistants |
large language model |
|
|
| 20 |
Inside-Out: Hidden Factual Knowledge in LLMs |
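Reveals hidden knowledge inside LLMs: proposes a framework for evaluating internal versus external knowledge and finds that models' internal knowledge far exceeds what they express externally. |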
large language model |
|
|
| 21 |
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification |
Proposes a model-editing-based LLM fine-tuning method that efficiently addresses aspect-based sentiment classification |
large language model |
|
|
| 22 |
SPADE: Structured Prompting Augmentation for Dialogue Enhancement in Machine-Generated Text Detection |
Proposes the SPADE framework to address the shortage of data for synthetic dialogue detection |
large language model |
✅ |
|
| 23 |
LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones? |
Emphasizes the cultural diversity of the Arab world and calls on the NLP community to build more representative Arabic LLMs |
large language model |
|
|
| 24 |
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions? |
Proposes the EmpathyAgent benchmark to evaluate and improve the empathetic behavior of embodied agents across multiple scenarios |
multimodal |
✅ |
|