| 1 |
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models |
EvoMoE: a mixture-of-experts model based on expert evolution for multimodal large language models |
large language model multimodal |
|
|
| 2 |
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates |
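Proposes MAC, a multimodal adversarial compositionality benchmark that uses LLMs to generate deceptive text for evaluating CLIP's vulnerability. |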
large language model multimodal |
|
|
| 3 |
Spatial Knowledge Graph-Guided Multimodal Synthesis |
Proposes SKG2DATA, which uses spatial knowledge graphs to guide multimodal data synthesis and improve the spatial perception of MLLMs. |
large language model multimodal |
✅ |
|
| 4 |
From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models |
Survey: research progress on hypothesis discovery and rule learning with large language models |
large language model instruction following |
|
|
| 5 |
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models |
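Proposes Think-Bench to evaluate the reasoning efficiency and chain-of-thought quality of large reasoning models, targeting the overthinking problem |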
large language model chain-of-thought |
|
|
| 6 |
Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction |
Proposes a multimodal-LLM-based approach that treats speech as a digital phenotype for multi-task mental health prediction |
large language model multimodal |
|
|
| 7 |
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian |
FAMA: the first large-scale open-science speech foundation model for English and Italian |
foundation model |
|
|
| 8 |
Large Language Models Often Know When They Are Being Evaluated |
Shows that large language models have some degree of evaluation awareness |
large language model |
|
|
| 9 |
Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model |
Proposes IOHFuseLM, which uses a multimodal language model to forecast sparse intraoperative hypotension events. |
multimodal |
✅ |
|
| 10 |
Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning |
Proposes MERRY, a foundation model for general knowledge graph reasoning that improves performance on both in-KG and out-of-KG tasks. |
foundation model |
|
|
| 11 |
Evaluating the Retrieval Robustness of Large Language Models |
Evaluates the retrieval robustness of large language models in retrieval-augmented generation |
large language model |
|
|
| 12 |
Structured Memory Mechanisms for Stable Context Representation in Large Language Models |
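Proposes structured memory mechanisms that strengthen the context representation of large language models in long texts and multi-turn dialogues. |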
large language model |
|
|
| 13 |
Talent or Luck? Evaluating Attribution Bias in Large Language Models |
Proposes a cognitively grounded bias evaluation framework to address attribution bias in LLMs |
large language model |
|
|
| 14 |
Can Large Language Models Match the Conclusions of Systematic Reviews? |
The MedEvidence benchmark reveals a gap between large language models and clinical experts in matching the conclusions of systematic reviews |
large language model |
|
|
| 15 |
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese |
Reveals biases of large language models between Simplified and Traditional Chinese and builds an open-source evaluation benchmark. |
large language model |
✅ |
|
| 16 |
Precise In-Parameter Concept Erasure in Large Language Models |
PISCES: erases specific concepts from large language models through precise edits in parameter space. |
large language model |
|
|
| 17 |
Do Large Language Models Think Like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI |
Uses layer-wise embeddings and fMRI to study sentence-level similarities between large language models and the human brain |
large language model |
|
|
| 18 |
Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities |
Survey: progress, challenges, and opportunities in Text-to-SQL with large language models |
large language model |
|
|
| 19 |
Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition |
Exposes a flaw in evaluating LLM-based speech recognition: contamination of the LibriSpeech and Common Voice datasets |
large language model |
|
|
| 20 |
Say What You Mean: Natural Language Access Control with Large Language Models for Internet of Things |
LACE: natural language access control for the Internet of Things using large language models |
large language model |
|
|
| 21 |
ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage |
Proposes ICH-Qwen, a large language model for Chinese intangible cultural heritage |
large language model |
|
|
| 22 |
LoKI: Low-damage Knowledge Implanting of Large Language Models |
LoKI: a low-damage knowledge implanting method for large language models that mitigates catastrophic forgetting. |
large language model |
|
|
| 23 |
Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists |
Proposes the HiTEC framework, which improves tool learning in large language models through hierarchical error checklists |
large language model |
|
|
| 24 |
MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models |
MemOS: an operating system designed for memory-augmented generation (MAG) in large language models |
large language model |
|
|
| 25 |
BiasFilter: An Inference-Time Debiasing Framework for Large Language Models |
BiasFilter: an inference-time debiasing framework for large language models |
large language model |
|
|
| 26 |
Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset |
PEARL: a large-scale, culturally aware Arabic multimodal instruction dataset for improving the cultural understanding of LVLMs. |
multimodal |
|
|
| 27 |
Learning Composable Chains-of-Thought |
Proposes a method for learning composable chains-of-thought that improves LLM generalization on complex reasoning tasks |
large language model chain-of-thought |
|
|
| 28 |
Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing |
ToxEdit: safeguards the general capabilities of LLMs through toxicity-aware knowledge editing |
large language model instruction following |
|
|
| 29 |
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective |
|
large language model chain-of-thought |
|
|
| 30 |
ArgInstruct: Specialized Instruction Fine-Tuning for Computational Argumentation |
ArgInstruct: a specialized instruction fine-tuning method for computational argumentation |
large language model instruction following |
|
|
| 31 |
Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions |
Proposes Chain-of-Talkers (CoTalk), which speeds up human annotation of dense image captions and improves annotation quality. |
multimodal |
|
|
| 32 |
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging |
Proposes orthogonal-subspace model merging (OSRM) to address performance degradation when merging LoRA models. |
large language model |
|
|
| 33 |
Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning |
Proposes the Self-Error-Instruct framework, which improves LLM mathematical reasoning by generalizing from errors |
large language model |
|
|
| 34 |
Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks |
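Reveals the evaluation hallucination problem in multi-round, incomplete-information lateral reasoning tasks for LLMs and proposes improvements |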
large language model |
|
|
| 35 |
If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals? |
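The CounterLogic dataset reveals that LLMs' logical ability degrades in counterfactual reasoning, and the proposed Self-Segregate method significantly improves performance. |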
large language model |
|
|
| 36 |
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models |
JQL: an efficient language-model-based method for filtering multilingual pretraining data |
large language model |
|
|
| 37 |
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design |
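Proposes a hierarchical speculative decoding framework to address the excessive computational overhead of speculative decoding on quantized models |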
large language model |
✅ |
|
| 38 |
DeepRTL2: A Versatile Model for RTL-Related Tasks |
DeepRTL2: a versatile large language model for RTL-related tasks |
large language model |
|
|
| 39 |
Curse of High Dimensionality Issue in Transformer for Long-context Modeling |
Proposes Dynamic Group Attention (DGA) to address the curse of high dimensionality in Transformer long-context modeling |
large language model |
✅ |
|
| 40 |
Knowledge Base Construction for Knowledge-Augmented Text-to-SQL |
Constructs a knowledge base to augment Text-to-SQL, improving LLM query accuracy on domain-specific databases |
large language model |
|
|
| 41 |
OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature |
The OWL dataset reveals LLMs' cross-lingual recall of memorized world literature, which holds even for low-resource languages. |
large language model |
|
|
| 42 |
What Has Been Lost with Synthetic Evaluation? |
Assesses the validity of LLM-generated benchmarks, revealing what information is lost in synthetic evaluation |
large language model |
|
|
| 43 |
First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay |
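Proposes the "overhearing agents" paradigm, in which multimodal LLMs assist human conversations, using Dungeons & Dragons gameplay as a case study. |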
multimodal |
✅ |
|
| 44 |
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems |
Improves the accuracy of automated essay scoring by incorporating annotations from automated feedback systems |
large language model |
|
|
| 45 |
ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM |
ClaimPKG: enhances claim verification by generating pseudo-subgraphs with a lightweight specialized LLM |
large language model |
|
|
| 46 |
Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development |
Proposes Co-Saving, a resource-aware multi-agent collaboration framework for software development that improves efficiency and code quality. |
large language model |
|
|
| 47 |
GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification |
Proposes Hierarchical Three-Step Prompting (H3Prompt) for multilingual narrative classification, achieving a leading result in the SemEval-2025 task. |
large language model |
✅ |
|
| 48 |
Self-Critique and Refinement for Faithful Natural Language Explanations |
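Proposes the SR-NLE framework, which improves the faithfulness of LLM natural language explanations through self-critique and refinement |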
large language model |
|
|
| 49 |
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning |
GuessArena: a self-adaptive framework for evaluating LLMs on domain-specific knowledge and reasoning |
large language model |
|
|
| 50 |
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs |
Stochastic chameleons in LLMs: hallucinations induced by irrelevant context reveal class-based (mis)generalization |
large language model |
|
|
| 51 |
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding |
Fast-dLLM: accelerates diffusion LLM inference by enabling KV cache and parallel decoding, with no additional training required. |
large language model |
|
|
| 52 |
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts |
Proposes LayerMoE, an efficient multilingual expansion method for LLMs based on layer-wise mixture-of-experts |
large language model |
|
|
| 53 |
Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation |
Proposes a flexible multi-LLM integration framework for scalable knowledge aggregation |
large language model |
✅ |
|
| 54 |
Fair Document Valuation in LLM Summaries via Shapley Values |
Proposes Cluster Shapley, a Shapley-value-based algorithm for fairly valuing document contributions in LLM summaries. |
large language model |
|
|
| 55 |
SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context |
Proposes SkewRoute, a training-free LLM routing method for knowledge graph RAG that routes based on the score skewness of the retrieved context. |
large language model |
✅ |
|
| 56 |
Measuring Sycophancy of Language Models in Multi-turn Dialogues |
Proposes SYCON Bench for evaluating the sycophantic behavior of language models in multi-turn dialogues |
large language model |
✅ |
|
| 57 |
Advancing Expert Specialization for Better MoE |
Proposes orthogonality and variance losses to improve expert specialization in MoE models |
large language model |
|
|
| 58 |
InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing |
InComeS: augments LLMs with compression and selection mechanisms for efficient model editing |
large language model |
|
|
| 59 |
ChatCFD: An LLM-Driven Agent for End-to-End CFD Automation with Domain-Specific Structured Reasoning |
ChatCFD: an LLM-driven agent for end-to-end CFD automation with domain-specific structured reasoning |
large language model |
✅ |
|
| 60 |
Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? |
Proposes a similarity-based MIA detection framework to protect the privacy of retrieval data in RAG systems |
large language model |
|
|
| 61 |
Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation |
Uses reasoning LLMs to assess the quality of scientific papers: baseline approaches and an automatic evaluation framework |
large language model |
|
|
| 62 |
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate |
Proposes a translate-then-evaluate framework for measuring the consistency of multilingual LLMs across languages. |
large language model |
|
|
| 63 |
Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data |
Uses interview-informed large language models to simulate survey responses and compares AI-generated data with human data. |
large language model |
|
|
| 64 |
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack |
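Reveals the vulnerability of vision-language models to adversarial attacks and proposes a two-stage evaluation framework and safety alignment guidelines. |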
multimodal |
|
|
| 65 |
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse |
EFIM: efficient serving of LLMs for infilling tasks through improved KV cache reuse |
large language model |
✅ |
|
| 66 |
Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries |
Proposes a multi-document summarization method based on principled content selection that improves diversity and personalization. |
large language model |
|
|