| 1 | Error Reflection Prompting: Can Large Language Models Successfully Understand Errors? | Proposes Error Reflection Prompting to improve the reasoning ability of language models | large language model chain-of-thought | | |
| 2 | Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation | Proposes a multimodal framework for evaluating the cultural competence of vision-language models | multimodal | ✅ | |
| 3 | GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs | Proposes the GAICo framework to standardize the evaluation of generative AI outputs | multimodal | | |
| 4 | Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT | Proposes RetoMaton, built on a local weighted finite automaton, to address unstable LLM reasoning | large language model chain-of-thought | | |
| 5 | If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition | Proposes a presupposition-free question decomposition framework to make claim verification more robust | large language model | | |
| 6 | Leveraging Language Models and Machine Learning in Verbal Autopsy Analysis | Uses language models and machine learning to improve the accuracy of verbal autopsy analysis | multimodal | | |
| 7 | Compiling Prompts, Not Crafting Them: A Reproducible Workflow for AI-Assisted Evidence Synthesis | Proposes a structured framework to improve the reliability of systematic literature reviews | large language model | | |
| 8 | RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits | Proposes RAGAPHENE to address the evaluation of LLM conversations | large language model | | |
| 9 | How Good are LLM-based Rerankers? An Empirical Analysis of State-of-the-Art Reranking Models | Systematically evaluates LLM-based reranking models on information retrieval | large language model | ✅ | |