| 1 |
Self-Reflection Makes Large Language Models Safer, Less Biased, and Ideologically Neutral |
Uses self-reflection to improve the safety, fairness, and ideological neutrality of large language models |
large language model chain-of-thought |
✅ |
|
| 2 |
DevBench: A multimodal developmental benchmark for language learning |
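DevBench: a multimodal developmental benchmark for language learning, intended to bridge the gap between models and children's language learning. |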
multimodal |
|
|
| 3 |
CliBench: A Multifaceted and Multigranular Evaluation of Large Language Models for Clinical Decision Making |
CliBench: a multifaceted, multigranular benchmark for evaluating large language models on clinical decision making |
large language model |
|
|
| 4 |
Evaluation of Large Language Models: STEM education and Gender Stereotypes |
Evaluates bias in large language models with respect to STEM education and gender stereotypes |
large language model |
|
|
| 5 |
RadEx: A Framework for Structured Information Extraction from Radiology Reports based on Large Language Models |
RadEx: a large-language-model-based framework for structured information extraction from radiology reports |
large language model |
|
|
| 6 |
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models |
Proposes CHiSafetyBench, a hierarchical benchmark for evaluating the safety of Chinese large language models |
large language model |
✅ |
|
| 7 |
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations |
Survey of large language models for medical applications, focusing on datasets, methodologies, and evaluations |
large language model |
|
|
| 8 |
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading |
SciEx: proposes an LLM benchmark built on university computer science exam questions, with both human and automatic grading. |
large language model |
|
|
| 9 |
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages |
SEACrowd: builds a multimodal data hub and benchmark suite for Southeast Asian languages |
multimodal |
|
|
| 10 |
On the Evaluation of Speech Foundation Models for Spoken Language Understanding |
Evaluates speech foundation models to improve performance on spoken language understanding tasks |
foundation model |
|
|
| 11 |
Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering |
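Proposes a conversational question answering method that integrates graph reasoning with large language models to improve complex reasoning. |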
large language model |
|
|
| 12 |
GEB-1.3B: Open Lightweight Large Language Model |
Proposes GEB-1.3B, a lightweight open-source large language model optimized for CPU inference efficiency. |
large language model |
|
|
| 13 |
Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models |
Proposes a dynamic knowledge infusion method that improves language model performance on knowledge-based visual question answering. |
large language model multimodal |
|
|
| 14 |
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation |
ChartMimic: evaluates the cross-modal reasoning capability of LMMs via chart-to-code generation |
multimodal |
|
|
| 15 |
A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention |
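Proposes HiP, a training-free, sub-quadratic-cost Transformer model serving framework that uses hierarchically pruned attention for efficient long-text processing. |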
large language model |
|
|
| 16 |
HIRO: Hierarchical Information Retrieval Optimization |
Proposes HIRO, which optimizes hierarchical information retrieval in RAG via depth-first search to improve performance. |
large language model |
|
|
| 17 |
Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments |
Proposes RAFTS: retrieval-augmented fact verification by synthesizing contrastive arguments |
large language model |
|
|
| 18 |
Domain-Specific Shorthand for Generation Based on Context-Free Grammar |
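Proposes a domain-specific shorthand based on context-free grammar that reduces the number of tokens needed to generate structured data in generative AI. |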
large language model |
|
|
| 19 |
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs |
Proposes Goldfish Loss, which reduces the memorization risk of generative LLMs to protect privacy and copyright. |
large language model |
|
|
| 20 |
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation |
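Studies how well machine evaluation of simultaneous interpretation quality correlates with human evaluation, and explores the potential of GPT models |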
large language model |
|
|
| 21 |
A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization |
Improves LLM performance in text generation evaluation by optimizing prompt structure |
large language model |
|
|
| 22 |
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages |
BLEnD: a benchmark for evaluating LLMs on everyday knowledge across diverse cultures and languages |
large language model |
✅ |
|
| 23 |
Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting |
Proposes a virtual agent dialogue system based on a rapport-building strategy to improve user experience at first interaction |
large language model |
|
|
| 24 |
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey |
Surveys LLM-driven synthetic data generation, curation, and evaluation, filling the framework gap in the field. |
large language model |
|
|
| 25 |
Detecting Response Generation Not Requiring Factual Judgment |
Proposes the DDFC dataset for detecting sentences in dialogue response generation that do not require factual judgment |
large language model |
|
|
| 26 |
FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation |
FreeCtrl: builds control centers from feedforward layers for learning-free controllable text generation |
large language model |
|
|