| 1 | Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures | Survey of design principles and architectures for the intrinsic interpretability of large language models. | large language model | ✅ | |
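| 2 | Learning Uncertainty from Sequential Internal Dispersion in Large Language Models | Proposes the SIVR framework, which learns uncertainty from the variance of LLM internal states to improve the generalization of hallucination detection. | large language model | ✅ | |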
| 3 | How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models | Reveals hypocrisy in LLM judges: large language models show a listener-speaker asymmetry in pragmatic competence. | large language model | | |
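| 4 | A Systematic Study of Training-Free Methods for Trustworthy Large Language Models | Systematically evaluates the effectiveness and trade-offs of training-free methods for improving the trustworthiness of large language models. | large language model | | |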
| 5 | Optimizing Korean-Centric LLMs via Token Pruning | Optimizes Korean-centric large language models through token pruning, improving generation stability and translation performance. | large language model instruction following | | |
| 6 | RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration | Proposes RAGognizer, which performs hallucination-aware fine-tuning by integrating a detection head, improving RAG generation quality. | large language model | | |
| 7 | DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition | DiZiNER: simulates a pilot annotation process with heterogeneous LLMs to refine instructions for zero-shot NER. | large language model | | |
| 8 | From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text | Proposes a dual-aspect evaluation framework for a large-scale assessment of LLM reasoning on Vietnamese legal text. | large language model | | |
| 9 | BAGEL: Benchmarking Animal Knowledge Expertise in Language Models | Proposes the BAGEL benchmark for evaluating the animal knowledge of language models. | large language model | | |
| 10 | Can LLMs Understand the Impact of Trauma? Costs and Benefits of LLMs Coding the Interviews of Firearm Violence Survivors | Evaluates LLMs for coding interviews with firearm violence survivors: an analysis of costs and benefits. | large language model | | |
| 11 | GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows | GTA-2: a hierarchical benchmark for general tool agents that evaluates performance from atomic tool use to open-ended workflows. | multimodal | ✅ | |
| 12 | Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies | Studies how human and AI attributes affect human-AI interaction, revealing differences between simulated and real user studies. | chain-of-thought | | |
| 13 | No Universal Courtesy: A Cross-Linguistic, Multi-Model Study of Politeness Effects on LLMs Using the PLUM Corpus | Uses the PLUM corpus to reveal how polite language affects LLMs: a cross-linguistic, multi-model analysis. | large language model | | |
| 14 | SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation | Proposes an LLM-based framework for word sense disambiguation in narrative text. | large language model | | |
| 15 | Stochasticity in Tokenisation Improves Robustness | Introduces stochastic tokenization to improve the robustness of large language models against adversarial attacks. | large language model | | |
| 16 | Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms | Studies the mathematical reasoning ability of LLMs by disentangling their internal mechanisms. | large language model | | |
| 17 | Exploring the Capability Boundaries of LLMs in Mastering of Chinese Chouxiang Language | Proposes the Mouse benchmark to explore the limits of LLMs in understanding Chinese Chouxiang ("abstract") language. | large language model | | |
| 18 | Qwen3.5-Omni Technical Report | Qwen3.5-Omni: achieves strong multimodal understanding and generation based on a mixture-of-experts attention mechanism. | visual grounding | | |
| 19 | CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents | Proposes the CHOP framework, which improves retrieval precision in multi-document RAG systems through chunkwise context preservation. | large language model | | |
| 20 | MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents | MemEvoBench: a benchmark for evaluating memory misevolution in LLM agents. | large language model | | |
| 21 | Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing | Skill-RAG: failure-state-aware retrieval-augmented generation via hidden-state probing and skill routing. | large language model | | |
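| 22 | Preference Estimation via Opponent Modeling in Multi-Agent Negotiation | Proposes an opponent-modeling-based preference estimation method that improves agreement rates and preference estimation accuracy in multi-party negotiation. | large language model | | |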
| 23 | C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment | C-Mining: unsupervised discovery of seeds for cultural data synthesis via geometric misalignment. | large language model | | |
| 24 | LLMs Corrupt Your Documents When You Delegate | Shows that LLMs tend to introduce errors into documents during delegated tasks, and proposes the DELEGATE-52 benchmark. | large language model | | |