| 1 |
PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models |
PCEval: the first automated benchmark for evaluating the physical computing capabilities of large language models |
large language model |
|
|
| 2 |
Do Large Language Models Know What They Are Capable Of? |
Evaluating the self-knowledge of large language models: can a model accurately predict its own task performance? |
large language model |
|
|
| 3 |
Large language models and the entropy of English |
Using large language models to reveal long-range structure and dependencies in English text |
large language model |
|
|
| 4 |
Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models |
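Constructs compute-accuracy Pareto frontiers for open-source reasoning large language models to guide model selection in industrial applications. |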
large language model |
|
|
| 5 |
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time |
Proposes CREST, which steers LLM reasoning by intervening on attention heads, improving efficiency and accuracy. |
large language model, chain-of-thought |
|
|
| 6 |
Adaptive Dependency-aware Prompt Optimization Framework for Multi-Step LLM Pipeline |
Proposes the ADOPT framework, which adaptively optimizes prompts in multi-step LLM pipelines to improve performance on complex tasks. |
large language model |
|
|
| 7 |
Speculative Decoding: Performance or Illusion? |
A systematic analysis of the speedups and bottlenecks of speculative decoding in production-grade LLM inference engines |
large language model |
|
|
| 8 |
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements |
Proposes Encyclo-K, which evaluates the comprehensive understanding of LLMs through dynamically composed knowledge statements |
large language model |
|
|
| 9 |
Safe in the Future, Dangerous in the Past: Dissecting Temporal and Linguistic Vulnerabilities in LLMs |
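Reveals safety vulnerabilities of large language models along the linguistic and temporal dimensions, and proposes invariant alignment |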
large language model |
|
|
| 10 |
RIMRULE: Improving Tool-Using Language Agents via MDL-Guided Rule Learning |
RIMRULE: improving tool-using language agents through MDL-guided rule learning |
large language model |
|
|
| 11 |
Vibe Coding, Interface Flattening |
Analyzes the "Vibe Coding" paradigm, revealing the tension between interface flattening and the shift of control in LLM-driven development |
large language model |
|
|
| 12 |
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models |
Proposes MUSIC, a multi-step instruction contrast method that improves reward models for multi-turn dialogue |
large language model |
|
|
| 13 |
Quantum Visual Word Sense Disambiguation: Unraveling Ambiguities Through Quantum Inference Model |
Proposes a quantum inference model to solve visual word sense disambiguation |
large language model |
|
|