| 1 |
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models |
Proposes MME-SCI to address key challenges in evaluating multimodal large language models |
large language model multimodal |
✅ |
|
| 2 |
CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation |
Proposes CyPortQA to address multimodal data integration for cyclone response in ports |
large language model multimodal |
|
|
| 3 |
Generics and Default Reasoning in Large Language Models |
Evaluates the performance and limitations of large language models in default reasoning |
large language model chain-of-thought |
|
|
| 4 |
Can Large Language Models (LLMs) Describe Pictures Like Children? A Comparative Corpus Study |
Compares the similarity between descriptions produced by large language models and by children |
large language model multimodal |
|
|
| 5 |
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns |
Examines how backdoor attacks affect attention patterns in large language models |
large language model |
|
|
| 6 |
A Review of Developmental Interpretability in Large Language Models |
Surveys research progress on developmental interpretability in large language models |
large language model |
|
|
| 7 |
ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? |
Proposes the ViExam benchmark to evaluate vision language models on Vietnamese multimodal exams |
multimodal |
|
|
| 8 |
Ask Good Questions for Large Language Models |
Proposes the Ask-Good-Question framework to address user confusion in dialogue systems |
large language model |
|
|
| 9 |
ALIGN: Word Association Learning for Cultural Alignment in Large Language Models |
Proposes the ALIGN method to address cultural bias in large language models |
large language model |
|
|
| 10 |
The Promise of Large Language Models in Digital Health: Evidence from Sentiment Analysis in Online Health Communities |
Uses large language models to address sentiment analysis challenges in digital health |
large language model |
|
|
| 11 |
MATA (māta): Mindful Assessment of the Telugu Abilities of Large Language Models |
Proposes the MATA evaluation dataset to assess the Telugu abilities of large language models |
large language model |
|
|
| 12 |
Scalable Scientific Interest Profiling Using Large Language Models |
Proposes a large language model based method for generating scientific interest profiles |
large language model |
|
|
| 13 |
Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation |
Proposes a LoRA fine-tuning approach to address dialectal Arabic generation |
large language model foundation model |
|
|
| 14 |
Prompt-Based One-Shot Exact Length-Controlled Generation with LLMs |
Proposes a prompt-based, one-shot method for exact length-controlled text generation with LLMs |
large language model instruction following |
|
|
| 15 |
MultiFuzz: A Dense Retrieval-based Multi-Agent System for Network Protocol Fuzzing |
Proposes MultiFuzz to address the limited effectiveness of traditional protocol fuzzing |
large language model chain-of-thought |
|
|
| 16 |
Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA |
Proposes Pressure-Tune to address sycophantic bias in scientific QA |
large language model chain-of-thought |
|
|
| 17 |
Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency |
Proposes Finch-Zk to detect and mitigate hallucinations in large language models |
large language model |
|
|
| 18 |
GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs |
Proposes the GRILE benchmark to address grammar reasoning and explanation in Romanian LLMs |
large language model |
|
|
| 19 |
Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text |
Proposes the DA-MTL framework to address detection and attribution of LLM-generated text |
large language model |
|
|
| 20 |
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation |
Proposes the PING method to address safety issues in large language models |
large language model |
|
|
| 21 |
DPad: Efficient Diffusion Language Models with Suffix Dropout |
Proposes DPad to address the computational efficiency of diffusion language models |
large language model |
✅ |
|
| 22 |
Prediction is not Explanation: Revisiting the Explanatory Capacity of Mapping Embeddings |
Challenges conventional assumptions and reveals the limits of word embeddings' explanatory capacity |
large language model |
|
|
| 23 |
Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams? |
Proposes Alvorada-Bench to evaluate language models on Brazilian university entrance exams |
chain-of-thought |
|
|
| 24 |
Measuring LLM Code Generation Stability via Structural Entropy |
Evaluates the stability of large language model code generation via structural entropy |
large language model |
|
|
| 25 |
Comparing energy consumption and accuracy in text classification inference |
Evaluates the trade-off between energy consumption and accuracy in text classification inference |
large language model |
|
|
| 26 |
ReviewGraph: A Knowledge Graph Embedding Based Framework for Review Rating Prediction with Sentiment Features |
Proposes the ReviewGraph framework to address rating prediction for hotel customer reviews |
large language model |
✅ |
|
| 27 |
MGT-Prism: Enhancing Domain Generalization for Machine-Generated Text Detection via Spectral Alignment |
Proposes MGT-Prism to address domain generalization in machine-generated text detection |
large language model |
|
|
| 28 |
CRISP: Persistent Concept Unlearning via Sparse Autoencoders |
Proposes CRISP to address knowledge removal (unlearning) in large language models |
large language model |
|
|