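| 1 | How to Choose a Threshold for an Evaluation Metric for Large Language Models | Proposes a risk-management-based method for choosing thresholds for LLM evaluation metrics, to help ensure model reliability. | large language model | |
| 2 | Rethinking Emotion Annotations in the Era of Large Language Models | Uses large language models to assist emotion annotation, improving annotation quality and efficiency. | large language model | |
| 3 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Proposes an LLM-based zero-shot ATC coding method that addresses the manual-coding bottleneck in clinical assessments. | large language model | |
| 4 | Searching for Structure: Investigating Emergent Communication with Large Language Models | Uses large language models to explore structure in emergent communication and simulate language evolution. | large language model | |
| 5 | Generating Knowledge Graphs from Large Language Models: A Comparative Study of GPT-4, LLaMA 2, and BERT | Uses large language models to generate knowledge graphs that improve GraphRAG performance. | large language model | |
| 6 | CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models | Proposes CMT, a memory compression method for continual knowledge learning in large language models. | large language model | |
| 7 | SpecFuse: Ensembling Large Language Models via Next-Segment Prediction | SpecFuse ensembles large language models via next-segment prediction to improve generation quality. | large language model | |
| 8 | Towards Predictive Communication with Brain-Computer Interfaces integrating Large Language Models | A survey of predictive communication methods for brain-computer interfaces that integrate large language models. | large language model | |
| 9 | Automatic Item Generation for Personality Situational Judgment Tests with Large Language Models | Uses GPT-4 to automatically generate personality situational judgment tests, improving the efficiency and quality of psychological assessment. | large language model | |
| 10 | The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model | Proposes the "Babel Tower Hypothesis", which describes how capabilities evolve in multilingual code LLMs, and uses it to optimize pre-training corpora. | large language model | |
| 11 | Multimodal Sentiment Analysis Based on Causal Reasoning | Proposes an adversarial multimodal sentiment analysis framework based on causal reasoning that addresses bias in unimodal data. | multimodal | |
| 12 | PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips | PrisonBreak breaks the safety alignment of large language models with only a small number of targeted bit-flips. | large language model | |
| 13 | Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models | Proposes HARPE, a single-stage long-context modeling method that overcomes the context-length limits of LLMs. | large language model | |
| 14 | ChocoLlama: Lessons Learned From Teaching Llamas Dutch | ChocoLlama explores strategies for adapting Llama models to Dutch, a lower-resource language. | large language model, foundation model | |
| 15 | Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering | Identifies and manipulates personality traits in LLMs through activation engineering. | large language model | |
| 16 | HalluCana: Fixing LLM Hallucination with A Canary Lookahead | HalluCana uses a canary lookahead mechanism to fix factual hallucinations in large language models. | large language model | |
| 17 | Forking Paths in Neural Text Generation | Proposes a method for estimating uncertainty in LLM text generation without fine-tuning or access to model weights, and identifies critical forking tokens. | large language model | |
| 18 | Asking Again and Again: Exploring LLM Robustness to Repeated Questions | Studies how repeating a question affects LLM reading-comprehension performance and finds that repetition does not significantly improve it. | large language model | |
| 19 | Granite Guardian | Granite Guardian is an open-source LLM safeguard model covering risk detection across multiple dimensions. | large language model | ✅ |
| 20 | TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | TRIM achieves cost-effective language generation through token reduction and inference modeling. | large language model | |
| 21 | FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks | FlexLLM explores LLM customization as a moving target defense against jailbreak attacks on black-box LLMs. | large language model | |
| 22 | Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation | Proposes an active inference framework to address the limited adaptability of large language models. | large language model | |
| 23 | DRUM: Learning Demonstration Retriever for Large MUlti-modal Models | DRUM learns a demonstration retriever that improves in-context learning in large multimodal models. | large language model | |
| 24 | LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | Proposes the LLM-as-an-Interviewer framework for dynamic evaluation of large language models. | large language model | ✅ |
| 25 | CoPrUS: Consistency Preserving Utterance Synthesis towards more realistic benchmark dialogues | CoPrUS generates more realistic benchmark dialogue datasets through consistency-preserving utterance synthesis. | large language model | |
| 26 | Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Proposes bBSARD, which extends statutory article retrieval to Dutch, and benchmarks a range of retrieval models on it. | foundation model | |
| 27 | Optimizing Alignment with Less: Leveraging Data Augmentation for Personalized Evaluation | Uses data augmentation to optimize alignment, improving agreement between LLM judgments and human preferences in personalized evaluation. | large language model | |
| 28 | Algorithmic Phase Transitions in Language Models: A Mechanistic Case Study of Arithmetic | Reveals phase transitions in language-model arithmetic: Gemma-2-2b uses different strategies for additions with different numbers of digits. | large language model | |
| 29 | Na'vi or Knave: Jailbreaking Language Models via Metaphorical Avatars | Proposes the AVATAR framework, which attacks large language models through metaphorical avatars to jailbreak them and expose safety risks. | large language model | |
| 30 | My Words Imply Your Opinion: Reader Agent-based Propagation Enhancement for Personalized Implicit Emotion Analysis | Proposes the RAPPIE model, which uses reader agents to enhance personalized implicit emotion analysis. | large language model | |
| 31 | Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation | Proposes the Frame Representation Hypothesis for multi-token LLM interpretability and concept-guided text generation. | large language model | ✅ |
| 32 | Enhancing Relation Extraction via Supervised Rationale Verification and Feedback | Proposes a relation extraction framework based on supervised rationale verification and feedback that improves LLM performance on RE tasks. | large language model | |
| 33 | HARP: Hesitation-Aware Reframing in Transformer Inference Pass | HARP is a hesitation-aware reframing method applied during the Transformer inference pass that improves model performance. | large language model | |
| 34 | Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation | Proposes a label-confidence-aware uncertainty estimation method that improves the reliability of natural language generation models. | large language model | |
| 35 | KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context | Proposes KULTURE Bench, a benchmark for assessing how well language models understand the Korean cultural context. | large language model | |
| 36 | Multi-Response Preference Optimization with Augmented Ranking Dataset | Proposes multi-response preference optimization with an augmented ranking dataset to improve LLM performance. | large language model | |
| 37 | Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance | Explores LLM coding ability by measuring how parameters contribute to coding performance, and identifies a "coding spot". | large language model | |
| 38 | Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need | Proposes a method for predicting the emergent abilities of large language models using proxy tasks. | large language model | |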