| 1 |
Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions |
Explores the application of large language models to multimodal sentiment analysis: challenges and benchmarks. |
large language model multimodal |
|
|
| 2 |
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark |
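Proposes a task-decomposed framework and distilled training to improve the performance of open-source MLLMs in automatic evaluation of text-to-image generation. |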
large language model chain-of-thought |
|
|
| 3 |
"All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations |
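Proposes an evaluation framework that addresses the reliability of model and human evaluations when annotations are of low quality. |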
large language model |
|
|
| 4 |
A Survey on LLM-as-a-Judge |
Surveys LLM-as-a-Judge: explores strategies and methods for building reliable LLM-based judging systems. |
large language model |
|
|
| 5 |
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain |
ChemSafetyBench: a safety benchmark for LLMs in the chemistry domain, used to evaluate and improve model safety. |
large language model |
✅ |
|
| 6 |
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations |
Evaluates unlearning in LLMs via data transformations, revealing the fragility of existing methods. |
large language model |
|
|
| 7 |
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts |
Proposes LiveEdit to address the challenge of lifelong knowledge editing for vision-language models. |
large language model |
|
|