| 1 |
BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting |
BacktestBench: a benchmark platform for evaluating large language models on automated quantitative strategy backtesting |
large language model |
|
|
| 2 |
Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA |
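Evaluates the prompt-compression performance of LLMLingua-2 on the diffusion large language model LLaDA, revealing how it differs from autoregressive models |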
large language model |
|
|
| 3 |
How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking |
Introduces the BanglaMedVQA dataset and evaluates LLMs on Bangla medical visual question answering |
large language model foundation model |
|
|
| 4 |
Code as Agent Harness |
Proposes the unified view of "code as agent harness" and studies the central role of code in agent systems. |
large language model multimodal |
|
|
| 5 |
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention |
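Proposes DashAttention, a differentiable and adaptive sparse hierarchical attention mechanism that improves the efficiency of long-text modeling. |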
large language model |
|
|
| 6 |
Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency |
Reveals a scaling-law relationship between LLM factual recall and both model size and topic frequency |
large language model |
|
|
| 7 |
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective |
EvoMemBench: evaluates LLM agent memory from a self-evolving perspective, filling a gap in existing evaluation frameworks |
large language model |
✅ |
|
| 8 |
From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG |
Proposes EPIC, which enables efficient on-device RAG through preference-aligned memory construction |
large language model |
|
|
| 9 |
Machine Unlearning for Masked Diffusion Language Models |
Proposes Masked Diffusion Unlearning (MDU) for erasing specific knowledge from masked diffusion language models. |
large language model |
✅ |
|
| 10 |
Multilingual jailbreaking of LLMs using low-resource languages |
Studies multilingual jailbreak attacks on LLMs using low-resource languages |
large language model |
|
|
| 11 |
Context Memorization for Efficient Long Context Generation |
Proposes Attention-State Memory to address prefix-information decay and attention-computation inefficiency in long-text generation. |
large language model |
|
|
| 12 |
Predictive Prefetching for Retrieval-Augmented Generation |
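Proposes a predictive prefetching framework to address the latency caused by synchronous retrieval in retrieval-augmented generation. |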
large language model |
|
|
| 13 |
Multi-agent AI systems outperform human teams in creativity |
Multi-agent LLM systems outperform human teams in creativity when tackling innovation challenges. |
large language model |
|
|
| 14 |
MA$^{2}$P: A Meta-Cognitive Autonomous Intelligent Agents Framework for Complex Persuasion |
Proposes the MA²P framework to raise the persuasion success rate of autonomous agents in complex persuasion scenarios. |
large language model |
|
|
| 15 |
Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics |
Proposes probe trajectories for monitoring the reasoning dynamics of large reasoning models |
chain-of-thought |
|
|
| 16 |
Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning |
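Proposes an implicit hierarchical GRPO algorithm that decouples tool invocation from execution, improving tool-integrated mathematical reasoning |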
large language model |
✅ |
|
| 17 |
Presupposition and Reasoning in Conditionals: A Theory-Based Study of Humans and LLMs |
Studies presupposition and reasoning in conditionals, comparing the pragmatic behavior of humans and large language models |
large language model |
|
|
| 18 |
A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$Δ$ Integration into Upcycled MoE |
Proposes the PARAMΔ integration method for efficiently expanding multilingual LLMs |
large language model |
|
|
| 19 |
PPAI: Enabling Personalized LLM Agent Interoperability for Collaborative Edge Intelligence |
PPAI: enables interoperability among personalized LLM agents to support collaborative edge intelligence |
large language model |
|
|
| 20 |
Internalizing Tool Knowledge in Small Language Models via QLoRA Fine-Tuning |
Internalizes tool knowledge in small language models via QLoRA fine-tuning, enabling description-free inference. |
large language model |
|
|