| 1 |
Evaluating Hallucinations in Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions |
Proposes the RePOPE-Spk benchmark for evaluating hallucination in multimodal LLMs under spoken queries. |
large language model multimodal chain-of-thought |
|
|
| 2 |
MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents |
Proposes MicroRCA-Agent, which uses large language model agents for microservice root cause analysis |
large language model multimodal |
✅ |
|
| 3 |
Evaluation of Causal Reasoning for Large Language Models in Contextualized Clinical Scenarios of Laboratory Test Interpretation |
Evaluates the causal reasoning ability of large language models in laboratory test interpretation scenarios |
large language model |
|
|
| 4 |
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey |
A comprehensive survey on improving the generalizability of LLM-based agents across diverse tasks and environments. |
large language model |
|
|
| 5 |
How Large Language Models are Designed to Hallucinate |
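Reveals the structural roots of hallucination in large language models and proposes a hallucination taxonomy and evaluation benchmark based on existential structures. |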
large language model |
|
|
| 6 |
EHR-MCP: Real-world Evaluation of Clinical Information Retrieval by Large Language Models via Model Context Protocol |
EHR-MCP: evaluates large language models for clinical information retrieval in a real hospital environment via the Model Context Protocol |
large language model |
|
|
| 7 |
The (Short-Term) Effects of Large Language Models on Unemployment and Earnings |
Analyzes the short-term effects of large language models on employment and earnings using synthetic difference-in-differences |
large language model |
|
|
| 8 |
GPO: Learning from Critical Steps to Improve LLM Reasoning |
GPO: improves the reasoning ability of large language models by learning from critical steps |
large language model |
|
|
| 9 |
LightCode: Compiling LLM Inference for Photonic-Electronic Systems |
LightCode: a compilation framework for LLM inference on photonic-electronic systems |
large language model |
|
|
| 10 |
SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations |
Proposes the SENSE-7 dataset and an empathy classifier for measuring how users perceive AI empathy in human-AI conversations. |
large language model |
|
|
| 11 |
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models |
Modulates the capabilities and safety of language models by shaping psychometric personality traits |
large language model |
|
|
| 12 |
Stress Testing Deliberative Alignment for Anti-Scheming Training |
Stress-tests deliberative alignment to assess its effectiveness in anti-scheming training |
chain-of-thought |
|
|