| 1 |
Evaluating Multimodal Large Language Models on Educational Textbook Question Answering |
Evaluates multimodal large language models on textbook question answering |
large language model multimodal |
|
|
| 2 |
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability |
Proposes the Ordered CommonGen benchmark to evaluate LLMs' compositional generalization and instruction-following abilities |
large language model instruction following |
|
|
| 3 |
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification |
Proposes SciVer to evaluate foundation models on multimodal scientific claim verification |
foundation model multimodal |
|
|
| 4 |
The Compositional Architecture of Regret in Large Language Models |
Proposes a new method to identify and analyze regret mechanisms in large language models |
large language model |
|
|
| 5 |
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts |
Proposes the WikiMixQA benchmark to address multimodal document understanding |
multimodal |
|
|
| 6 |
Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models |
Proposes GIFI to evaluate gender diversity in large language models |
large language model |
|
|
| 7 |
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction |
Proposes PredGen to address latency in real-time speech interaction with large language models |
large language model |
|
|
| 8 |
COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation |
Proposes COSMMIC to address multimodal summarization for Indian languages |
multimodal |
|
|
| 9 |
DeVisE: Behavioral Testing of Medical Large Language Models |
Proposes the DeVisE framework to evaluate the behavior of medical large language models |
large language model |
|
|
| 10 |
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding |
Proposes PaceLLM to address long-context understanding |
large language model |
|
|
| 11 |
A Comparative Study of Task Adaptation Techniques of Large Language Models for Identifying Sustainable Development Goals |
Compares task adaptation techniques for large language models in identifying Sustainable Development Goals |
large language model |
|
|
| 12 |
FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning |
Proposes the FinEval-KR framework to address LLM evaluation in the financial domain |
large language model |
|
|
| 13 |
A General Method for Detecting Information Generated by Large Language Models |
Proposes a general LLM detector to address the identification of generated information |
large language model |
|
|
| 14 |
Hybrid EEG-Driven Brain-Computer Interface: A Large Language Model Framework for Personalized Language Rehabilitation |
Proposes a hybrid EEG-driven brain-computer interface to address personalized language rehabilitation |
large language model |
|
|
| 15 |
Identifying economic narratives in large text corpora -- An integrated approach using Large Language Models |
Uses large language models to extract economic narratives for text analysis |
large language model |
|
|
| 16 |
Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning |
Proposes a single-utterance self-correction mechanism to improve language model reasoning |
large language model chain-of-thought |
|
|
| 17 |
Understanding GUI Agent Localization Biases through Logit Sharpness |
Proposes a fine-grained evaluation framework to address localization bias in GUI agents |
large language model multimodal |
|
|
| 18 |
Research on Graph-Retrieval Augmented Generation Based on Historical Text Knowledge Graphs |
Proposes a Graph RAG framework to address knowledge gaps in historical text analysis |
large language model chain-of-thought |
|
|
| 19 |
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs |
Proposes ProtoReasoning to address insufficient reasoning ability in large language models |
large language model chain-of-thought |
|
|
| 20 |
TokenShapley: Token Level Context Attribution with Shapley Value |
Proposes TokenShapley to address keyword attribution in LLM-generated responses |
large language model |
|
|
| 21 |
Veracity: An Open-Source AI Fact-Checking System |
Proposes Veracity to counter misinformation |
large language model |
|
|
| 22 |
Context-Informed Grounding Supervision |
Proposes context-informed grounding supervision to address grounding in generative models |
large language model |
|
|
| 23 |
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation |
Proposes the RE-IMAGINE framework to evaluate the reasoning ability of large language models |
large language model |
|
|
| 24 |
Rethinking LLM Training through Information Geometry and Quantum Metrics |
Rethinks large language model training through information geometry and quantum metrics |
large language model |
|
|
| 25 |
PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning |
Proposes PhantomHunter to detect text generated by unseen, privately tuned LLMs |
large language model |
|
|
| 26 |
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need |
Proposes AgentGroupChat-V2 to address complex-task challenges in multi-agent systems |
large language model |
✅ |
|
| 27 |
PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights |
Proposes PRAISE to address inaccurate e-commerce product descriptions |
large language model |
|
|
| 28 |
SANSKRITI: A Comprehensive Benchmark for Evaluating Language Models' Knowledge of Indian Culture |
Proposes the SANSKRITI benchmark to evaluate language models' understanding of Indian culture |
large language model |
|
|
| 29 |
Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants |
Proposes a new dataset to evaluate NLI performance on Basque and Spanish language variants |
large language model |
|
|
| 30 |
Mix-of-Language-Experts Architecture for Multilingual Programming |
Proposes the MoLE architecture to address efficiency and specialization in multilingual programming |
large language model |
|
|
| 31 |
Representation Consistency for Accurate and Coherent LLM Answer Aggregation |
Proposes a representation consistency method to improve the accuracy of LLM answer aggregation |
large language model |
|
|
| 32 |
Learning-Time Encoding Shapes Unlearning in LLMs |
Proposes learning-time encoding to address unlearning in large language models |
large language model |
|
|