| 1 |
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models |
Proposes ODE, an open-set method for dynamically evaluating hallucinations in multimodal large language models |
large language model multimodal |
|
|
| 2 |
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web |
Proposes the IW-Bench benchmark for evaluating large multimodal models on image-to-Web conversion. |
multimodal chain-of-thought |
|
|
| 3 |
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM |
Proposes a two-stage prefix-enhanced multimodal LLM for generating event-oriented attribution for movies |
large language model multimodal |
|
|
| 4 |
Efficient Fine-Tuning of Large Language Models for Automated Medical Documentation |
MediGen: efficiently fine-tunes LLaMA3-8B to automate medical documentation and reduce physicians' workload |
large language model |
|
|
| 5 |
Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer |
Studies the limits of the Transformer's computational capability and examines how CoT compensates for the absence of recurrent computation. |
chain-of-thought |
|
|
| 6 |
ASR Error Correction using Large Language Models |
Uses large language models to improve ASR error correction without access to the underlying code or model weights. |
large language model |
|
|
| 7 |
Comparing Retrieval-Augmentation and Parameter-Efficient Fine-Tuning for Privacy-Preserving Personalization of Large Language Models |
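Compares retrieval augmentation and parameter-efficient fine-tuning for privacy-preserving personalization of large language models |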
large language model |
|
|
| 8 |
Uddessho: An Extensive Benchmark Dataset for Multimodal Author Intent Classification in Low-Resource Bangla Language |
Proposes the Uddessho dataset for multimodal author intent classification in low-resource Bangla. |
multimodal |
|
|
| 9 |
Enhancing LLM Problem Solving with REAP: Reflection, Explicit Problem Deconstruction, and Advanced Prompting |
The REAP method enhances LLM problem solving through reflection, problem deconstruction, and advanced prompting |
large language model |
|
|
| 10 |
Constructive Approach to Bidirectional Influence between Qualia Structure and Language Emergence |
Builds a model of the bidirectional influence between language emergence and qualia structure, exploring embodied cognition |
multimodal |
|
|
| 11 |
Thinking Before Speaking: A Role-playing Model with Mindset |
Proposes a mindset-based role-playing model that makes LLM role simulation more realistic |
large language model |
|
|
| 12 |
Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI |
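Proposes a human-centered framework for evaluating generative AI in automated annotation of social media text, emphasizing the importance of human validation. |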
large language model |
|
|
| 13 |
Measuring the Influence of Incorrect Code on Test Generation |
Studies how code correctness affects LLM test generation: correct code improves test accuracy by 57% |
large language model |
|
|
| 14 |
NovAScore: A New Automated Metric for Evaluating Document Level Novelty |
Proposes NovAScore, an automated metric for evaluating document-level novelty. |
large language model |
|
|