| 1 |
Cross-Lingual Empirical Evaluation of Large Language Models for Arabic Medical Tasks |
Cross-lingual evaluation of the performance gap of large language models on Arabic medical tasks |
large language model |
|
|
| 2 |
Transport and Merge: Cross-Architecture Merging for Large Language Models |
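Proposes an optimal-transport-based cross-architecture model merging method that transfers knowledge from large models to heterogeneous smaller models. |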
large language model |
|
|
| 3 |
A Systematic Evaluation of Large Language Models for PTSD Severity Estimation: The Role of Contextual Knowledge and Modeling Strategies |
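Systematically evaluates large language models for estimating post-traumatic stress disorder (PTSD) severity, focusing on the effects of contextual knowledge and modeling strategies. |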
large language model |
|
|
| 4 |
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions |
OdysseyArena: a benchmark for large language models on long-horizon, active, and inductive interactions |
large language model |
✅ |
|
| 5 |
Consensus-Aligned Neuron Efficient Fine-Tuning Large Language Models for Multi-Domain Machine Translation |
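Proposes a consensus-aligned, neuron-efficient fine-tuning method that improves large language model performance in multi-domain machine translation. |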
large language model |
|
|
| 6 |
CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models |
Proposes CASTLE, a comprehensive benchmark for evaluating student-tailored personalized safety in large language models. |
large language model |
|
|
| 7 |
SciDef: Automating Definition Extraction from Academic Literature with Large Language Models |
SciDef: proposes an LLM-based pipeline for automatically extracting definitions from academic literature. |
large language model |
✅ |
|
| 8 |
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration |
OPUS: enables efficient per-iteration data selection during LLM pre-training through optimizer-guided projected utility selection. |
large language model |
|
|
| 9 |
IESR: Efficient MCTS-Based Modular Reasoning for Text-to-SQL with Large Language Models |
IESR: an efficient MCTS-based modular reasoning framework for Text-to-SQL with large language models |
large language model |
✅ |
|
| 10 |
EuroLLM-22B: Technical Report |
EuroLLM-22B: a multilingual large language model trained from scratch to serve the needs of European citizens |
large language model, instruction following |
|
|
| 11 |
RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference |
RRAttention: achieves dynamic block-sparse attention through per-head round-robin shifts, accelerating long-context inference. |
large language model, multimodal |
|
|
| 12 |
Multi-Task GRPO: Reliable LLM Reasoning Across Tasks |
Proposes the MT-GRPO algorithm to make LLM reasoning reliable across multiple tasks, with particular attention to worst-task performance. |
large language model |
|
|
| 13 |
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better |
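Proposes LET, which uses knowledge from smaller models to accelerate large language model training and improve final performance. |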
large language model |
|
|
| 14 |
CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs |
Proposes CoPE, a scalable soft-clipping method for RoPE that improves long-context LLM performance. |
large language model |
✅ |
|
| 15 |
Are Open-Weight LLMs Ready for Social Media Moderation? A Comparative Study on Bluesky |
Evaluates open-weight LLMs for social media moderation, using the Bluesky platform as a case study |
large language model |
|
|
| 16 |
LinguistAgent: A Reflective Multi-Model Platform for Automated Linguistic Annotation |
LinguistAgent: a reflective multi-model platform for automated linguistic annotation |
large language model |
✅ |
|
| 17 |
Structured Context Engineering for File-Native Agentic Systems: Evaluating Schema Accuracy, Format Effectiveness, and Multi-File Navigation at Scale |
Studies how structured context engineering affects SQL generation in file-native agentic systems |
large language model |
|
|
| 18 |
Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science |
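Toward a science of collective AI: proposes a framework for moving LLM-based multi-agent systems from blind trial-and-error to rigorous science |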
large language model |
|
|
| 19 |
DFlash: Block Diffusion for Flash Speculative Decoding |
DFlash: proposes a flash speculative decoding framework based on block diffusion to accelerate LLM inference. |
large language model |
|
|
| 20 |
DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs |
Proposes Dynamic Sliding Block (DSB) scheduling to improve the generation quality and efficiency of diffusion LLMs. |
large language model |
✅ |
|
| 21 |
KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs |
KV-CoRE: a benchmark for evaluating the data-dependent low-rank compressibility of KV caches in LLMs |
large language model |
|
|
| 22 |
Codified Finite-state Machines for Role-playing |
Proposes codified finite-state machines to improve the coherence and controllability of LLM role-playing |
large language model |
|
|
| 23 |
Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs |
Proposes the CFA² framework, which uses causal front-door adjustment to mount robust jailbreak attacks on LLMs |
large language model |
|
|
| 24 |
Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models |
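Proposes the M2CQA benchmark and the CFHR metric, revealing counterfactual hallucination by multilingual vision-language models in cultural contexts |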
multimodal |
|
|
| 25 |
Multi-Field Tool Retrieval |
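Proposes a multi-field tool retrieval framework that addresses the semantic gap and multi-dimensional modeling problems in LLM tool retrieval. |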
large language model |
|
|
| 26 |
FedMosaic: Federated Retrieval-Augmented Generation via Parametric Adapters |
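FedMosaic: a federated retrieval-augmented generation framework based on parametric adapters that addresses knowledge silos in privacy-sensitive settings. |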
large language model |
|
|
| 27 |
Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions |
Bagpiper: an 8-billion-parameter audio foundation model that solves open-ended audio tasks via rich captions |
foundation model |
|
|