| 1 |
When Language Overrules: Revealing Text Dominance in Multimodal Large Language Models |
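Systematically reveals the text-dominance phenomenon in multimodal large language models and proposes a token compression method that effectively mitigates it. |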
large language model multimodal |
|
|
| 2 |
MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding |
Proposes MAC, a live benchmark for evaluating the scientific understanding of multimodal large language models. |
large language model multimodal |
✅ |
|
| 3 |
DiFaR: Enhancing Multimodal Misinformation Detection with Diverse, Factual, and Relevant Rationales |
DiFaR: enhances multimodal misinformation detection with diverse, factual, and relevant rationales. |
multimodal chain-of-thought |
|
|
| 4 |
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents |
Proposes MM-BrowseComp, a benchmark for multimodal browsing agents that evaluates models' multimodal reasoning in complex web environments. |
multimodal |
|
|
| 5 |
SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth |
SproutBench: an evaluation benchmark for the safety and ethics of large language models for youth. |
large language model |
|
|
| 6 |
DLLMQuant: Quantizing Diffusion-based Large Language Models |
DLLMQuant: proposes a quantization scheme for diffusion-based large language models, improving compression efficiency. |
large language model |
|
|
| 7 |
Yet another algorithmic bias: A Discursive Analysis of Large Language Models Reinforcing Dominant Discourses on Gender and Race |
Proposes a qualitative analysis framework that reveals how large language models reinforce dominant discourses on gender and race. |
large language model |
|
|
| 8 |
Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints |
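Proposes an LLM training method based on a computational economics framework, improving model efficiency and interpretability under resource constraints. |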
large language model |
|
|
| 9 |
Large Language Models for Summarizing Czech Historical Documents and Beyond |
Uses large language models to achieve new advances in summarizing Czech historical documents. |
large language model |
|
|
| 10 |
Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules |
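The Rule2Text framework uses large language models to generate natural language explanations of knowledge graph rules, improving their comprehensibility. |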
large language model chain-of-thought |
✅ |
|
| 11 |
Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning |
Proposes Psyche-R1, the first Chinese psychological large language model to combine empathy, expertise, and reasoning. |
large language model chain-of-thought |
|
|
| 12 |
Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs |
Proposes the ICE framework, which uses In-Place Prompting in diffusion LLMs to improve reasoning performance and speed up computation. |
large language model chain-of-thought |
|
|
| 13 |
A Survey on Diffusion Language Models |
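Surveys diffusion language models, exploring a new paradigm of parallel generation, bidirectional context modeling, and controllable generation. |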
multimodal |
✅ |
|
| 14 |
Reinforced Language Models for Sequential Decision Making |
Proposes the MS-GRPO algorithm to improve the performance of small language models on sequential decision-making tasks. |
large language model |
|
|
| 15 |
Chain-of-Query: Unleashing the Power of LLMs in SQL-Aided Table Understanding via Multi-Agent Collaboration |
Proposes Chain-of-Query, which improves LLMs' SQL-aided table understanding through multi-agent collaboration. |
large language model |
✅ |
|
| 16 |
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics |
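Studies the failure of cross-lingual knowledge transfer in LLMs, reveals the importance of unified representations, and proposes methods for modulating them. |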
large language model |
|
|
| 17 |
SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression |
SurfaceLogicKV: uses surface and logic attention behaviors to achieve robust KV cache compression. |
large language model |
|
|
| 18 |
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation |
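Proposes a layer-wise perturbation method based on sparse autoencoders for adversarial text generation, improving the safety of large language models. |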
large language model |
|
|
| 19 |
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks |
ReportBench: evaluates the performance of deep research agents through academic survey tasks. |
large language model |
✅ |
|
| 20 |
Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning |
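Proposes the RAMP framework, which improves the reliability of multi-agent systems in marketing applications through reflection, memory, and planning. |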
large language model |
|
|
| 21 |
Approaching the Source of Symbol Grounding with Confluent Reductions of Abstract Meaning Representation Directed Graphs |
Explores the source of symbol grounding using confluent reductions of Abstract Meaning Representation directed graphs. |
large language model |
|
|
| 22 |
BIPOLAR: Polarization-based granular framework for LLM bias evaluation |
BIPOLAR: proposes a polarization-based, fine-grained framework for evaluating bias in LLMs. |
large language model |
|
|
| 23 |
eDIF: A European Deep Inference Fabric for Remote Interpretability of LLM |
Builds eDIF, a European deep inference platform that enables remote interpretability research on LLMs. |
large language model |
|
|
| 24 |
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts |
Proposes the DH-CoT attack, which effectively bypasses the harmful-content defenses of commercial black-box LLMs. |
chain-of-thought |
|
|
| 25 |
Cross-Prompt Encoder for Low-Performing Languages |
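Proposes the Cross-Prompt Encoder (XPE), which improves the performance of low-resource languages in parameter-efficient fine-tuning. |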
large language model |
|
|