| 1 |
Cross-Modal Consistency in Multimodal Large Language Models |
Proposes a cross-modal consistency evaluation framework and reveals inconsistencies between GPT-4V's vision and language modalities |
large language model multimodal |
|
|
| 2 |
Piecing It All Together: Verifying Multi-Hop Multimodal Claims |
Introduces the MMCV dataset for evaluating credibility verification of multi-hop multimodal information |
large language model multimodal |
|
|
| 3 |
Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide |
Surveys methods for fine-tuning large language models with limited data and provides a practical guide |
large language model |
|
|
| 4 |
Task-Aligned Tool Recommendation for Large Language Models |
Proposes PTR, a method that precisely recommends task-aligned toolsets for large language models |
large language model |
|
|
| 5 |
DROJ: A Prompt-Driven Attack against Large Language Models |
Proposes DROJ, a prompt attack that bypasses LLM safety mechanisms by optimizing embedding representations |
large language model |
✅ |
|
| 6 |
Evaluating Gender Bias in Large Language Models |
Evaluates gender bias in large language models based on pronoun choice in occupational contexts |
large language model |
|
|
| 7 |
Evaluating the Predictive Capacity of ChatGPT for Academic Peer Review Outcomes Across Multiple Platforms |
Evaluates ChatGPT's ability to predict academic peer review outcomes across multiple platforms |
large language model chain-of-thought |
|
|
| 8 |
The Moral Foundations Weibo Corpus |
Builds the Moral Foundations Weibo Corpus for Chinese moral sentiment analysis and model training |
large language model |
|
|
| 9 |
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs |
MM-Eval: a hierarchical benchmark for evaluating LLM proficiency in Modern Mongolian |
large language model |
✅ |
|
| 10 |
StreamAdapter: Efficient Test Time Adaptation from Contextual Streams |
Proposes StreamAdapter to address the inefficiency of test-time adaptation |
large language model |
|
|
| 11 |
Squeezed Attention: Accelerating Long Context Length LLM Inference |
Squeezed Attention: accelerates long-context LLM inference through offline clustering and sparse attention |
large language model |
✅ |
|
| 12 |
Adaptive Decoding via Latent Preference Optimization |
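Proposes an adaptive decoding method based on latent preference optimization that dynamically adjusts the language model's generation temperature to improve performance |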
instruction following |
|
|
| 13 |
BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency |
BabyLM Challenge: explores the effect of variation sets on language model training efficiency |
large language model |
|
|
| 14 |
DTELS: Towards Dynamic Granularity of Timeline Summarization |
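Proposes DTELS, a new paradigm for dynamic-granularity timeline summarization, and builds a corresponding benchmark dataset and evaluation framework |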
large language model |
|
|
| 15 |
P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs |
P-MMEval: a parallel multilingual multitask benchmark for consistent evaluation of LLMs |
large language model |
✅ |
|