| 1 |
Faithful Logical Reasoning via Symbolic Chain-of-Thought |
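Proposes the SymbCoT framework, which integrates symbolic logical reasoning to strengthen the logical reasoning ability of large language models |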
large language model chain-of-thought |
✅ |
|
| 2 |
Understanding Intrinsic Socioeconomic Biases in Large Language Models |
Reveals intrinsic socioeconomic biases in large language models, with attention to intersectional effects |
large language model |
|
|
| 3 |
Exploring Context Window of Large Language Models via Decomposed Positional Vectors |
Explores the context window of large language models by decomposing positional vectors and proposes training-free extension methods |
large language model |
|
|
| 4 |
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization |
Proposes bi-directional preference optimization to produce controllable, personalized steering vectors for LLMs |
large language model |
|
|
| 5 |
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models |
Proposes the ConSiDERS framework for improving human evaluation of generative large language models |
large language model |
|
|
| 6 |
An Empirical Analysis on Large Language Models in Debate Evaluation |
Finds that large language models perform well at debate evaluation but exhibit several kinds of bias |
large language model |
|
|
| 7 |
Active Use of Latent Constituency Representation in both Humans and Large Language Models |
Uses a one-shot learning task to reveal latent constituency structure in both humans and large language models |
large language model |
|
|
| 8 |
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints |
Guides large language models with hints to identify and correct medical errors in clinical notes |
large language model |
|
|
| 9 |
Tool Learning with Large Language Models: A Survey |
Survey of tool learning with large language models for improving complex problem solving |
large language model |
✅ |
|
| 10 |
Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action |
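Proposes the Conv-CoA framework, which uses a conversational chain-of-action to improve large language models on open-domain question answering |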
large language model |
|
|
| 11 |
IAPT: Instruction-Aware Prompt Tuning for Large Language Models |
Proposes Instruction-Aware Prompt Tuning (IAPT), which achieves efficient LLM fine-tuning with only four soft tokens |
large language model |
|
|
| 12 |
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models |
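Proposes the TimeChara benchmark for evaluating point-in-time hallucination in role-playing large language models, and the Narrative-Experts method to mitigate it |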
large language model |
|
|
| 13 |
C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models |
Proposes C³Bench for comprehensively evaluating how well large language models understand classical Chinese |
large language model |
✅ |
|
| 14 |
Arithmetic Reasoning with LLM: Prolog Generation & Permutation |
Proposes an arithmetic reasoning method based on Prolog generation that improves LLM performance on math problems |
large language model chain-of-thought |
|
|
| 15 |
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning |
Proposes MMCTAgent, a multi-modal critical thinking agent framework for complex visual reasoning |
large language model |
|
|
| 16 |
Knowledge Circuits in Pretrained Transformers |
Reveals how knowledge is stored in Transformers by proposing knowledge circuits as a way to understand model behavior |
large language model |
✅ |
|
| 17 |
Don't Forget to Connect! Improving RAG with Graph-based Reranking |
Proposes G-RAG, a graph neural network-based reranking method for RAG that improves document connectivity and semantic understanding |
large language model |
|
|
| 18 |
PromptWizard: Task-Aware Prompt Optimization Framework |
PromptWizard: a task-aware, adaptive prompt optimization framework that improves LLM performance |
large language model |
|
|
| 19 |
The Impossibility of Fair LLMs |
Argues that fairness is inherently infeasible for general-purpose large language models (LLMs) |
large language model |
|
|
| 20 |
LLMs and Memorization: On Quality and Specificity of Copyright Compliance |
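Proposes a systematic analysis method for evaluating the copyright compliance of large language models and analyzes their avoidance behavior |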
large language model |
✅ |
|
| 21 |
The Battle of LLMs: A Comparative Study in Conversational QA Tasks |
Comparative study evaluating the performance of large language models on conversational question answering tasks |
large language model |
|
|
| 22 |
MockLLM: A Multi-Agent Behavior Collaboration Framework for Online Job Seeking and Recruiting |
Proposes MockLLM, which simulates interview interactions in online recruiting to improve person-job matching accuracy |
large language model |
|
|
| 23 |
Spanish and LLM Benchmarks: is MMLU Lost in Translation? |
Reveals translation pitfalls in the MMLU benchmark: challenges and improvements for evaluating LLM performance in Spanish |
large language model |
|
|
| 24 |
fMRI predictors based on language models of increasing complexity recover brain left lateralization |
Uses language models of varying complexity to predict fMRI signals, revealing left-hemisphere dominance in the brain |
large language model |
|
|
| 25 |
Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations |
Review of recent trends in personalized dialogue generation: datasets, methodologies, and evaluations |
large language model |
|
|
| 26 |
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference |
XL3M: a training-free framework for LLM length generalization based on segment-wise inference |
large language model |
|
|
| 27 |
Recent Advances of Foundation Language Models-based Continual Learning: A Survey |
Survey of research progress and method taxonomy in continual learning based on large language models |
large language model |
|
|
| 28 |
Decoding moral judgement from text: a pilot study |
Explores decoding moral judgement from text: a pilot brain-computer interface study |
large language model |
|
|
| 29 |
Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning |
Proposes a Thai Winograd Schema benchmark for evaluating commonsense reasoning in Thai |
large language model |
|
|
| 30 |
Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning |
Proposes a semantics-aware parameter-efficient fine-tuning method for knowledge learning |
large language model |
|
|
| 31 |
Aligning to Thousands of Preferences via System Message Generalization |
Proposes Janus, which aligns LLMs to individual user preferences through system message generalization |
large language model |
✅ |
|
| 32 |
More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs |
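Proposes ALoRA to address the challenge of integrating general capabilities into domain-specific LLMs, improving performance on domain tasks |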
large language model |
|
|
| 33 |
Detection-Correction Structure via General Language Model for Grammatical Error Correction |
Proposes DeCoGLM, a detection-correction structure based on the General Language Model (GLM), for grammatical error correction |
large language model |
|
|