| 1 |
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark |
Introduces MMMU-Pro, a more robust benchmark for multi-discipline multimodal understanding.
multimodal chain-of-thought |
|
|
| 2 |
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models |
Releases CMM-Math, a Chinese multimodal math dataset for evaluating and improving the mathematical reasoning of large multimodal models.
large language model multimodal |
|
|
| 3 |
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation |
Proposes PUB, a benchmark dataset for evaluating large language models on interpreting synthetic visual data.
large language model multimodal |
|
|
| 4 |
CLUE: Concept-Level Uncertainty Estimation for Large Language Models |
Proposes CLUE, a concept-level uncertainty estimation method that improves the interpretability and reliability of LLM outputs.
large language model |
|
|
| 5 |
Exploring Sentiment Dynamics and Predictive Behaviors in Cryptocurrency Discussions by Few-Shot Learning with Large Language Models |
Uses GPT-4o to analyze sentiment dynamics and predictive behaviors in cryptocurrency discussions.
large language model |
|
|
| 6 |
More is More: Addition Bias in Large Language Models |
Reveals an addition bias in large language models: a tendency to modify by adding rather than removing.
large language model |
|
|
| 7 |
Are Large Language Models Sensitive to Sentiment?
Evaluates the sentiment-perception ability of large language models, revealing their limitations in sentiment understanding.
large language model |
|
|
| 8 |
Detecting Calls to Action in Multimodal Content: Analysis of the 2021 German Federal Election Campaign on Instagram |
Uses BERT and GPT-4 to automatically detect calls to action on social media, analyzing the 2021 German federal election campaign on Instagram.
multimodal |
|
|
| 9 |
A Comparative Study on Large Language Models for Log Parsing |
A comparative study of large language models on log parsing, finding that open-source models can rival commercial ones.
large language model |
|
|
| 10 |
How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review |
Evaluates large language models' capabilities in privacy compliance and proposes a framework for privacy technical review.
large language model |
|
|
| 11 |
Prompt Baking |
Prompt Baking: bakes prompt information into LLM weights, improving zero-shot performance and updating model knowledge.
instruction following chain-of-thought |
|
|
| 12 |
Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts |
Uses argumentation-theory-driven prompts to unpack the implied misogynistic reasoning in large language models' analyses.
large language model chain-of-thought |
|
|
| 13 |
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs |
Proposes a deconfounded causality-aware adaptation (DCA) method to improve LLMs' reasoning in problem solving.
large language model |
|
|
| 14 |
ISO: Overlap of Computation and Communication within Sequence for LLM Inference
Proposes ISO, a sequence-level computation-communication overlap method that improves LLM inference efficiency.
large language model |
|
|
| 15 |
Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA) |
Proposes the Single-Turn Crescendo Attack (STCA), which bypasses LLM content moderation within a single interaction to elicit harmful responses.
large language model |
|
|
| 16 |
Historical German Text Normalization Using Type- and Token-Based Language Modeling |
Proposes a Transformer language model combining type- and token-based modeling for historical German text normalization.
large language model |
|
|
| 17 |
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? |
Studies effective pooling and attention designs for LLM-based embedding models and proposes a multi-layer trainable pooling method.
large language model |
|
|
| 18 |
DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels |
Proposes the DetectiveQA dataset for evaluating LLMs' long-context reasoning over detective novels.
large language model |
|
|
| 19 |
STAB: Speech Tokenizer Assessment Benchmark |
Proposes STAB, a speech tokenizer assessment benchmark for comprehensively evaluating and understanding the characteristics of speech tokenizers.
large language model |
|
|