| 1 |
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation |
Proposes the MultiTrust-X benchmark for evaluating, analyzing, and mitigating trust issues in multimodal large language models. |
large language model multimodal chain-of-thought |
|
|
| 2 |
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-commerce Models |
EcomMMMU: strategic utilization of visual information for multimodal e-commerce models |
large language model multimodal |
✅ |
|
| 3 |
DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking |
DeepMEL: proposes a multi-agent collaboration framework for multimodal entity linking. |
large language model multimodal |
|
|
| 4 |
WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai |
WangchanThaiInstruct: a dataset for culture-aware, multitask, and multi-domain evaluation of large language models in Thai. |
large language model instruction following |
|
|
| 5 |
ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following |
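Proposes the CoLVLM-Agent framework to address contextual understanding challenges in multi-turn visual dialogue and complex instruction following. |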
large language model instruction following |
|
|
| 6 |
Confidence-Modulated Speculative Decoding for Large Language Models |
Proposes a confidence-modulated speculative decoding method to accelerate autoregressive inference in large language models. |
large language model |
|
|
| 7 |
EMNLP: Educator-role Moral and Normative Large Language Models Profiling |
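EMNLP: builds a framework for profiling the moral and normative behavior of large language models in educator roles and assessing their ethical risks |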
large language model |
✅ |
|
| 8 |
Dream 7B: Diffusion Large Language Models |
Dream 7B: proposes a more powerful diffusion-based open-domain large language model |
large language model |
|
|
| 9 |
Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models |
Survey paper: reviews attribution, citation, and quotation in evidence-based text generation with large language models |
large language model |
|
|
| 10 |
A Survey on Large Language Model Benchmarks |
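A systematic survey of large language model evaluation benchmarks that analyzes existing problems and proposes directions for future design. |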
large language model |
|
|
| 11 |
Self-Guided Function Calling in Large Language Models via Stepwise Experience Recall |
Proposes SEER, which enables self-guided function calling in large language models through stepwise experience recall |
large language model |
|
|
| 12 |
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis |
Proposes the ReasonZoo benchmark to dissect how tool-integrated reasoning improves the reasoning ability of large language models |
large language model chain-of-thought |
|
|
| 13 |
Retrieval-Augmented Review Generation for Poisoning Recommender Systems |
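Proposes the RAGAN framework, which uses retrieval augmentation to generate high-quality reviews, improving the effectiveness and stealthiness of data poisoning attacks on recommender systems. |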
foundation model multimodal |
|
|
| 14 |
Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs |
The Annif system augments traditional XMTC methods with efficient LLMs and takes first place in the GermEval-2025 LLMs4Subjects task. |
large language model |
|
|
| 15 |
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training |
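Studies how vocabulary frequency imbalance affects language model pre-training, showing that enlarging the vocabulary mainly reduces uncertainty on high-frequency words. |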
large language model |
|
|
| 16 |
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning |
SparK: query-aware unstructured sparsity through KV cache channel pruning, improving the efficiency of long-context LLM inference. |
large language model |
✅ |
|
| 17 |
SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation |
SurGE: a benchmark and evaluation framework for scientific survey generation |
large language model |
✅ |
|
| 18 |
Trained Miniatures: Low cost, High Efficacy SLMs for Sales & Marketing |
Proposes "Trained Miniatures" for low-cost, efficient text generation in the sales and marketing domain. |
large language model |
|
|
| 19 |
Subjective Behaviors and Preferences in LLM: Language of Browsing |
Proposes HeTLM to better capture users' subjective behaviors and preferences |
large language model |
|
|
| 20 |
When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models |
Proposes the MCR-BENCH benchmark, revealing text bias in large audio-language models when multimodal inputs are inconsistent. |
multimodal |
✅ |
|
| 21 |
Conflict-Aware Soft Prompting for Retrieval-Augmented Generation |
Proposes the CARE model, which mitigates context-memory conflicts in RAG through conflict-aware soft prompting |
large language model |
|
|
| 22 |
SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling |
SemToken: a semantic-aware tokenization method for efficient long-context modeling |
large language model |
|
|
| 23 |
Dancing with Deer: A Constructional Perspective on MWEs in the Era of LLMs |
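Takes a Construction Grammar perspective to study the understanding and generalization of multiword expressions in the era of large language models. |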
large language model |
|
|
| 24 |
SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking |
Proposes SafetyFlow, a fully automated agent-flow system for safety benchmarking of large language models (LLMs). |
large language model |
|
|
| 25 |
AmbiSQL: Interactive Ambiguity Detection and Resolution for Text-to-SQL |
AmbiSQL: interactive ambiguity detection and resolution, improving Text-to-SQL accuracy |
large language model |
✅ |
|
| 26 |
Are Checklists Really Useful for Automatic Evaluation of Generative Tasks? |
Studies the effectiveness of checklists in automatic evaluation of generative tasks and proposes a selective-use strategy. |
large language model |
✅ |
|
| 27 |
Identifying and Answering Questions with False Assumptions: An Interpretable Approach |
Proposes an interpretable approach for identifying and answering questions with false assumptions. |
large language model |
|
|