| 1 |
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models |
Proposes SOB, a multi-source structured output benchmark for evaluating the quality of large language models in structured data extraction. |
large language model |
|
|
| 2 |
LegalMidm: Use-Case-Driven Legal Domain Specialization for Korean Large Language Model |
LegalMidm: use-case-driven specialization of a large language model for the Korean legal domain |
large language model |
|
|
| 3 |
Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs |
Proposes an MLLM data engineering framework based on medical entity trees that improves complex reasoning in the medical domain. |
large language model multimodal |
|
|
| 4 |
Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling |
Luminol-AIDetect: fast zero-shot detection of machine-generated text based on perplexity under text shuffling |
large language model |
|
|
| 5 |
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation |
Compares traditional methods with LLMs for recipe nutrient estimation, weighing accuracy against efficiency |
large language model |
|
|
| 6 |
Cross-Lingual Jailbreak Detection via Semantic Codebooks |
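Proposes a cross-lingual jailbreak detection method based on semantic codebooks that requires no retraining or language-specific adaptation. |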
large language model |
|
|
| 7 |
LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation |
Proposes the LLM-ReSum framework, which improves the quality and accuracy of LLM-generated summaries through self-evaluation |
large language model |
|
|
| 8 |
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation |
Proposes a dependency-driven prompt pipeline for generating coherent RPG game content |
large language model |
|
|
| 9 |
Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment? |
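Proposes guidelines for constructing cultural alignment evaluation datasets, improving the discriminative power of LLM cultural sensitivity tests |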
large language model |
|
|
| 10 |
FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments |
FAMA: a failure-aware meta-agentic framework for open-source LLMs in interactive tool-use environments |
large language model |
|
|
| 11 |
What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective |
Proposes an instruction data selection framework based on weighted in-context influence, improving instruction tuning |
instruction following |
|
|