| 1 |
Exploring the Performance of Large Language Models on Subjective Span Identification Tasks |
Explores the performance of large language models on subjective text span identification tasks |
large language model, chain-of-thought |
|
|
| 2 |
Probabilistic Guarantees for Reducing Contextual Hallucinations in LLMs |
Proposes a probabilistic-guarantee framework for reducing contextual hallucinations of LLMs on deterministic tasks |
large language model |
|
|
| 3 |
Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence |
JourneyBench: an evaluation benchmark for customer-support LLM agents, focused on business adherence |
large language model |
|
|
| 4 |
InfoSynth: Information-Guided Benchmark Synthesis for LLMs |
InfoSynth: an information-theory-guided framework for automatic synthesis of LLM benchmarks |
large language model |
✅ |
|