| 1 |
BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment |
BayLing 2: enhancing multilingual large language models through efficient language alignment
large language model foundation model |
|
|
| 2 |
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? |
Introduces the SOCRATES dataset to evaluate LLMs' latent multi-hop reasoning under shortcut-free conditions.
large language model chain-of-thought |
|
|
| 3 |
AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning |
AtomR: empowering large language models with atomic operators for heterogeneous knowledge reasoning
large language model chain-of-thought |
|
|
| 4 |
TransCompressor: LLM-Powered Multimodal Data Compression for Smart Transportation |
TransCompressor: LLM-powered compression and reconstruction of multimodal smart-transportation data
large language model multimodal |
|
|
| 5 |
Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring |
Evaluates LLM performance on multidimensional essay scoring, exploring new ways for AI to assist teachers.
large language model |
|
|
| 6 |
Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models |
Proposes a consensus framework across multiple large language models to improve answer reliability on complex questions.
large language model |
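The consensus idea is generic enough to sketch independently of the paper: collect answers from several models and keep one only if a sufficient fraction agree. A minimal illustration, not the paper's protocol; the model outputs below are stand-ins.

```python
from collections import Counter

def consensus_answer(answers, min_agreement=0.5):
    """Return the majority answer across model outputs, or None if no
    answer reaches the required agreement fraction."""
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    if votes / len(answers) >= min_agreement:
        return answer
    return None

# Hypothetical outputs from three different LLMs on the same question:
outputs = ["Paris", "paris", "Lyon"]
print(consensus_answer(outputs))  # -> paris (2/3 agreement)
```

Normalizing case and whitespace before counting keeps superficial formatting differences from splitting the vote; real systems would need stronger answer matching.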
|
|
| 7 |
EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code |
EnStack: an ensemble-stacking framework of large language models for source-code vulnerability detection
large language model |
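Stacking itself is a standard ensemble technique and can be sketched apart from the paper: each base model scores a code snippet, and a small meta-learner is trained on those scores. A pure-Python logistic-regression sketch with made-up base-model scores, not the paper's implementation.

```python
import math

def train_meta(base_scores, labels, lr=0.5, epochs=500):
    """Logistic-regression meta-learner over base-model scores.
    base_scores: one feature vector per snippet, one score per base model."""
    n = len(base_scores[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(base_scores, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted vulnerability prob.
            g = p - y                         # gradient of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Made-up scores from three base models on six snippets (label 1 = vulnerable):
X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.8, 0.9, 0.6],
     [0.1, 0.2, 0.2], [0.7, 0.6, 0.9], [0.3, 0.2, 0.1]]
y = [1, 0, 1, 0, 1, 0]
w, b = train_meta(X, y)
print(predict(w, b, [0.85, 0.8, 0.75]) > 0.5)  # high base scores -> vulnerable
```

The meta-learner lets the ensemble weight base models by how informative their scores actually are, rather than averaging them uniformly.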
|
|
| 8 |
DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings |
Proposes DoubleCCA, which uses random sentence embeddings to improve foundation models' robustness to group bias.
foundation model |
|
|
| 9 |
Lessons from Studying Two-Hop Latent Reasoning |
Shows that large language models have latent two-hop reasoning ability, though composing facts remains challenging.
large language model chain-of-thought |
|
|
| 10 |
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision |
Proposes a critique-model-based approach to enhancing LLM reasoning, improving performance on complex reasoning tasks.
large language model |
|
|
| 11 |
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions (Full Thesis) |
Proposes retrieval-augmented training datasets (RATD) and knowledge-fusion methods to improve smaller models' generalization on complex reasoning QA.
large language model |
|
|
| 12 |
What can LLM tell us about cities? |
Explores urban knowledge with large language models: a new data-driven paradigm for urban research.
large language model |
|
|
| 13 |
Parameter Efficient Instruction Tuning: An Empirical Study |
An empirical study of parameter-efficient instruction tuning, mapping the performance limits and applicable scenarios of LoRA and Adapters.
instruction following |
|
|
| 14 |
LLM Augmentations to support Analytical Reasoning over Multiple Documents |
Proposes dynamic evidence trees to strengthen analytical reasoning over multiple documents.
large language model |
|
|
| 15 |
Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings |
Proposes a bias-profiling method based on stereotype dimensions to assess gender bias in large language models.
large language model |
|
|
| 16 |
Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval |
Proposes analogy learning with computational-graph-based retrieval to improve LLMs' few-shot prompting on math word problems.
large language model |
|
|
| 17 |
FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web |
FineWeb-zhtw: building a large-scale, high-quality Traditional Chinese web text dataset
large language model |
|
|
| 18 |
Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines |
Proposes M²RAG, a multi-modal retrieval-augmented multi-modal generation framework, together with datasets, evaluation metrics, and baseline models.
foundation model |
|
|
| 19 |
NormXLogit: The Head-on-Top Never Lies |
Proposes NormXLogit, a model-agnostic LLM interpretability method that improves the faithfulness of token-importance estimates.
large language model |
|
|
| 20 |
MH-MoE: Multi-Head Mixture-of-Experts |
Proposes MH-MoE, which uses a multi-head mechanism to improve sparse MoE performance while keeping parameter count and compute unchanged.
large language model |
|
|
| 21 |
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text |
Proposes the SAGEval framework, which uses a critique agent to improve reference-free evaluation of open-ended text generation.
large language model |
|
|