| 1 |
Large Language Model Benchmarks in Medical Tasks |
Surveys evaluation benchmarks for large language models in the medical domain to promote LLM applications in clinical tasks. |
large language model multimodal |
|
|
| 2 |
TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text |
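Proposes TransformLLM, which adapts large language models using LLM-transformed reading comprehension text to improve their performance in specific domains. |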
large language model |
|
|
| 3 |
Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation |
SubgraphRAG: uses graph structure and lightweight models to improve knowledge-graph-based retrieval-augmented generation. |
large language model |
|
|
| 4 |
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense |
StingrayBench: reveals the limitations of multilingual large language models in cross-lingual word sense disambiguation. |
large language model |
|
|
| 5 |
Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups |
Proposes Group-SAE to address the training efficiency of sparse autoencoders for large language models. |
large language model |
|
|
| 6 |
Can Large Language Models Act as Symbolic Reasoners? |
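Investigates whether large language models have symbolic reasoning ability, along with the interpretability of that ability. |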
large language model |
|
|
| 7 |
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart |
Proposes the C$\text{T}^2$C-QA dataset and the AED multi-agent system for multimodal question answering over Chinese text, tables, and charts. |
multimodal |
|
|
| 8 |
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment |
LLMCBench: builds a benchmark for large language model compression to support efficient deployment. |
large language model |
✅ |
|
| 9 |
A Survey on Automatic Credibility Assessment Using Textual Credibility Signals in the Era of Large Language Models |
Surveys automatic credibility assessment methods based on textual credibility signals in the era of large language models. |
large language model |
|
|
| 10 |
An Actor-Critic Approach to Boosting Text-to-SQL Large Language Model |
Proposes an Actor-Critic framework to improve large language model performance on Text-to-SQL tasks. |
large language model |
|
|
| 11 |
CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models |
Proposes CRAT, a multi-agent framework that improves how LLMs handle context-dependent terms in machine translation. |
large language model |
|
|
| 12 |
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates |
NewTerm: builds an annually updated benchmark of real-time new terms for LLMs to address the knowledge cutoff problem. |
large language model |
✅ |
|
| 13 |
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training |
Proposes a text rephrasing method covering multiple languages and quality levels to improve large language model pre-training. |
large language model |
|
|
| 14 |
ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents |
Proposes ElectionSim, which runs large-scale election simulations with agents driven by large language models. |
large language model |
|
|
| 15 |
SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval |
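SandboxAQ explores multilingual multi-task information retrieval, focusing on performance differences of large language models on QA and NER tasks. |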
large language model chain-of-thought |
|
|
| 16 |
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression |
MultiTok: an efficient variable-length tokenization method based on LZW compression that speeds up LLM training. |
large language model |
|
|
| 17 |
Can Machines Think Like Humans? A Behavioral Evaluation of LLM Agents in Dictator Games |
Evaluates the prosocial behavior of large language models in dictator games. |
large language model |
|
|
| 18 |
SCULPT: Systematic Tuning of Long Prompts |
SCULPT: improves large language model performance by systematically tuning long prompts. |
large language model |
|
|
| 19 |
Graph-based Uncertainty Metrics for Long-form Language Model Outputs |
Proposes graph-based uncertainty metrics for LLMs to improve the factuality and informativeness of long-form generation. |
large language model |
|
|
| 20 |
Estimating Causal Effects of Text Interventions Leveraging LLMs |
CausalDANN: uses LLMs to estimate the causal effects of text interventions, addressing the challenges of high-dimensional text data. |
large language model |
|
|
| 21 |
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation |
EoRA: a fine-tuning-free eigenspace low-rank approximation method that compensates for the accuracy loss of compressed LLMs. |
large language model |
✅ |
|
| 22 |
Palisade -- Prompt Injection Detection Framework |
Palisade: a multi-layered defense framework for detecting prompt injection attacks. |
large language model |
|
|
| 23 |
FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval |
Proposes FACT, an iterative context rewriting method that addresses the "lost in the middle" problem in LLM multi-fact retrieval. |
large language model |
|
|
| 24 |
FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks |
Proposes FATH, a method built on hash-based authentication tags that defends LLM applications against indirect prompt injection attacks. |
large language model |
✅ |
|
| 25 |
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics |
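Shows that the arithmetic ability of large language models comes from a combination of heuristics rather than algorithms or memorization. |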
large language model |
|
|
| 26 |
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation |
Proposes M2RC-EVAL, a massively multilingual repository-level code completion evaluation benchmark. |
large language model |
|
|
| 27 |
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency |
Proposes a framework for autoformalizing mathematical statements based on symbolic equivalence and semantic consistency. |
large language model |
|
|
| 28 |
AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline |
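Proposes the AutoRAG framework, which automatically optimizes retrieval-augmented generation (RAG) pipelines to improve performance on a given dataset. |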
large language model |
✅ |
|
| 29 |
A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction |
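Proposes a corpus construction framework for Indonesian grammatical error correction and explores the feasibility of LLM-assisted annotation. |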
large language model |
✅ |
|
| 30 |
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation |
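Finds that large language models show no significant bias when acting as evaluators in retrieval-augmented generation, and that they prioritize factual accuracy. |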
large language model |
|
|
| 31 |
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation |
Finds that LLM judges are not robust to expressions of uncertainty: they show a negative bias against epistemic markers. |
large language model |
|
|