| 1 |
Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models |
Evaluates open-source large language models on key information extraction and hallucination in medical texts |
large language model |
|
|
| 2 |
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese |
BrowseComp-ZH: builds a benchmark for Chinese web browsing ability, revealing LLM shortcomings in Chinese information retrieval and reasoning. |
large language model |
✅ |
|
| 3 |
Efficient Reasoning for LLMs through Speculative Chain-of-Thought |
Proposes SCoT to reduce the inference latency of large language models |
chain-of-thought |
✅ |
|
| 4 |
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? |
VIST-GPT: uses large multimodal models to usher in a new era of visual storytelling |
multimodal visual grounding |
|
|
| 5 |
Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting |
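Proposes Structured Dialogue Fine-Tuning (SDFT) to address catastrophic forgetting during knowledge injection into vision-language models |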
multimodal chain-of-thought |
|
|
| 6 |
Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing |
Proposes UniGuard, which uses multi-task learning and model fusion to guardrail language models efficiently. |
large language model |
|
|
| 7 |
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics |
ClimaEmpact: proposes domain-aligned small language models and datasets for extreme weather analysis |
large language model |
|
|
| 8 |
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation |
Proposes an end-to-end retrieval-augmented generation framework to improve speech-to-speech dialogue model performance. |
large language model |
|
|
| 9 |
AndroidGen: Building an Android Language Agent under Data Scarcity |
AndroidGen: a framework for building Android language agents under data scarcity |
large language model |
✅ |
|
| 10 |
WuNeng: Hybrid State with Attention |
WuNeng: combines RNN and attention mechanisms to improve the expressiveness and contextual coherence of large language models |
large language model |
|
|
| 11 |
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries |
Proposes the APE-Bench I benchmark for evaluating LLMs on file-level automated proof engineering in formal math libraries. |
large language model |
|
|