| 1 |
Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought |
Predicts LLM reasoning reliability from the shape of the entropy trajectory, diagnosing uncertainty dynamics in chain-of-thought. |
chain-of-thought |
|
|
| 2 |
UGID: Unified Graph Isomorphism for Debiasing Large Language Models |
Proposes the UGID framework, which debiases large language models through unified graph isomorphism. |
large language model |
|
|
| 3 |
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? |
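Proposes MultiTempBench, a multilingual temporal reasoning benchmark, and shows that LLM temporal reasoning depends on tokenisation quality and on how time is represented. |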
large language model |
✅ |
|
| 4 |
Evaluating Counterfactual Strategic Reasoning in Large Language Models |
Evaluates the strategic reasoning ability of large language models in counterfactual scenarios. |
large language model |
|
|
| 5 |
Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks |
Reveals implicit bias in LLM grading: how writing style affects automated assessment of math, programming, and essay tasks. |
large language model |
|
|
| 6 |
GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms |
GAIN: a benchmark for evaluating goal-aligned decision-making by large language models under imperfect norms. |
large language model |
|
|
| 7 |
Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs |
Proposes a multimodal task interference benchmark and analyzes history-target mismatch in multimodal LLMs. |
multimodal |
|
|
| 8 |
TARo: Token-level Adaptive Routing for LLM Test-time Alignment |
Proposes TARo, a token-level adaptive routing method for LLM test-time alignment that improves reasoning ability. |
large language model instruction following |
|
|
| 9 |
EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models |
EntropyCache: uses the entropy of decoded tokens to guide KV caching in diffusion language models, enabling efficient inference. |
large language model chain-of-thought |
✅ |
|
| 10 |
Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media |
Proposes a cross-modal rationale transfer method for explainable humanitarian classification on social media. |
multimodal |
|
|
| 11 |
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval |
Proposes the Hypothesis-Conditioned Query Rewriting (HCQR) framework to improve RAG performance on decision-oriented retrieval tasks. |
large language model |
|
|
| 12 |
When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making |
Proposes the ICE-Guard framework for detecting systematic bias in LLM decision-making. |
large language model |
|
|
| 13 |
UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference |
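Proposes the UT-ACA framework, which tackles challenges in long-context inference through uncertainty-triggered adaptive context allocation. |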
large language model |
|
|
| 14 |
Parallelograms Strike Back: LLMs Generate Better Analogies than People |
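Large language models outperform humans at analogy generation, and their analogies conform more closely to the parallelogram model. |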
large language model |
|
|
| 15 |
Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo |
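Evaluates LLM-generated language-learning lessons in a Duolingo case study, emphasizing personalization and adaptation to specialized domains. |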
large language model |
|
|
| 16 |
WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior |
WASD: explains and controls LLM behavior by locating critical neurons that act as sufficient conditions. |
large language model |
|
|
| 17 |
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation |
Investigates how the prior auditory knowledge of LLM backbones affects audio language models: a holistic evaluation. |
large language model |
|
|
| 18 |
Automatic detection of Gen-AI texts: A comparative framework of neural models |
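Proposes and compares several neural models for automatically detecting generative-AI text, addressing problems in academic, editorial, and social settings. |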
large language model |
|
|
| 19 |
Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition |
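Proposes a cross-lingual LLM-judge transfer method based on evaluation decomposition, addressing the difficulty of multilingual evaluation. |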
large language model |
|
|
| 20 |
Inducing Sustained Creativity and Diversity in Large Language Models |
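Proposes a decoding scheme that induces sustained creativity and diversity in large language models, addressing the homogeneity of results in exploratory search. |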
large language model |
|
|
| 21 |
Scalable Prompt Routing via Fine-Grained Latent Task Discovery |
Proposes a scalable prompt routing method based on fine-grained latent task discovery. |
large language model |
|
|
| 22 |
Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure |
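Shows that probe-based evidence of evaluation awareness may merely reflect prompt structure rather than genuine evaluation awareness in the model. |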
large language model |
|
|