| 1 |
RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning |
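Proposes RippleCOT, which amplifies the ripple effect of knowledge editing in language models through chain-of-thought in-context learning. |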
large language model chain-of-thought |
|
|
| 2 |
CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions |
Proposes CommonIT to address instruction tuning for large language models |
large language model instruction following |
✅ |
|
| 3 |
Zero-Shot Fact Verification via Natural Logic and Large Language Models |
Proposes a zero-shot fact verification method based on natural logic and large language models. |
large language model zero-shot transfer |
|
|
| 4 |
What do Large Language Models Need for Machine Translation Evaluation? |
Studies the information needs and prompting strategies of large language models in machine translation evaluation |
large language model chain-of-thought |
|
|
| 5 |
PersoBench: Benchmarking Personalized Response Generation in Large Language Models |
PersoBench: a new benchmark for evaluating personalized response generation in large language models |
large language model chain-of-thought |
✅ |
|
| 6 |
Using Prompts to Guide Large Language Models in Imitating a Real Person's Language Style |
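Uses prompt engineering to guide large language models in imitating an individual's language style, improving the personalization of conversational AI |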
large language model |
|
|
| 7 |
Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) |
Studies linguistically-aware and language-independent tokenization for LLMs, improving support for low-resource languages |
large language model |
|
|
| 8 |
Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis |
Proposes the ANGST benchmark to evaluate large language models on diagnosing comorbid depression and anxiety |
large language model |
|
|
| 9 |
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? |
Proposes SWE-bench Multimodal to evaluate how well AI systems generalize to visual software domains. |
multimodal |
|
|
| 10 |
Output Scouting: Auditing Large Language Models for Catastrophic Responses |
Proposes Output Scouting, a method for efficiently auditing large language models for catastrophic outputs |
large language model |
✅ |
|
| 11 |
One2set + Large Language Model: Best Partners for Keyphrase Generation |
Proposes a One2set + LLM framework that improves keyphrase generation through a generate-then-select strategy |
large language model |
|
|
| 12 |
Generating bilingual example sentences with large language models as lexicography assistants |
Uses large language models to generate bilingual example sentences in support of lexicography |
large language model |
|
|
| 13 |
PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models |
PersonalSum: a user-subjective guided personalized summarization dataset for evaluating large language models |
large language model |
|
|
| 14 |
A Large Language Model-based Framework for Semi-Structured Tender Document Retrieval-Augmented Generation |
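Proposes an LLM-based retrieval-augmented generation framework for tender documents, improving the quality of specialized document generation |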
large language model |
|
|
| 15 |
Steering Large Language Models between Code Execution and Textual Reasoning |
Proposes three methods for better steering large language models between code execution and textual reasoning |
large language model |
✅ |
|
| 16 |
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios |
Proposes CliMedBench, a large-scale Chinese benchmark for evaluating medical LLMs, focused on clinical scenarios |
large language model |
|
|
| 17 |
Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models |
Uses large language models to fuse context and the outputs of multiple systems, improving post-ASR emotion recognition accuracy |
large language model |
|
|
| 18 |
Consultation on Industrial Machine Faults with Large language Models |
Proposes a multi-round prompting method with large language models for industrial machine fault diagnosis |
large language model |
|
|
| 19 |
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas |
The first study to explore how visual personas affect the behavior of multimodal large language models |
large language model |
|
|
| 20 |
Autoregressive Large Language Models are Computationally Universal |
Proves that autoregressive large language models are computationally universal, without external intervention. |
large language model |
|
|
| 21 |
Self-Powered LLM Modality Expansion for Large Speech-Text Models |
Proposes a self-powered LLM modality expansion method to address speech anchor bias in large speech-text models |
large language model multimodal instruction following |
✅ |
|
| 22 |
Searching for Best Practices in Medical Transcription with Large Language Model |
Uses large language models to improve medical transcription accuracy, particularly for Indian accents |
large language model |
|
|
| 23 |
ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities |
ActPlan-1K: a benchmark for evaluating the procedural planning ability of visual language models in household activities |
embodied AI large language model |
|
|
| 24 |
SAG: Style-Aligned Article Generation via Model Collaboration |
Proposes SAG, a style-aligned article generation method based on model collaboration that significantly improves generation quality. |
large language model instruction following |
|
|
| 25 |
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity |
LLM-TOPLA: efficient ensembling of large language models by maximizing diversity |
large language model |
✅ |
|
| 26 |
Detecting Machine-Generated Long-Form Content with Latent-Space Variables |
Proposes a latent-space variable model for detecting machine-generated long-form text |
large language model |
|
|
| 27 |
RAFT: Realistic Attacks to Fool Text Detectors |
RAFT: proposes a realistic black-box attack on LLM detectors that improves the attack's stealth and effectiveness. |
large language model |
|
|
| 28 |
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation |
Proposes Auto-GDA, which improves the efficiency of grounding verification in RAG through automatic domain adaptation. |
large language model |
|
|
| 29 |
Mixture of Attentions For Speculative Decoding |
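Proposes a mixture-of-attentions mechanism for speculative decoding, improving decoding speed and accuracy in single-device and client-server settings. |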
large language model |
|
|
| 30 |
KidLM: Advancing Language Models for Children -- Early Insights and Future Directions |
KidLM: language models for children, with performance improved through tailored data and training objectives. |
large language model |
|
|
| 31 |
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues |
Proposes a scalable framework that uses large language models to construct sociocultural norm bases for socially-aware dialogues. |
large language model |
|
|
| 32 |
ORAssistant: A Custom RAG-based Conversational Assistant for OpenROAD |
Proposes ORAssistant, a custom RAG-based conversational assistant for OpenROAD |
large language model |
|
|
| 33 |
Re-examining Sexism and Misogyny Classification with Annotator Attitudes |
Re-examines sexism and misogyny classification by taking annotator attitudes into account |
large language model |
|
|
| 34 |
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores |
Quantifies uncertainty in LLM benchmark scores to improve the reproducibility of evaluation |
large language model |
|
|
| 35 |
Generating Equivalent Representations of Code By A Self-Reflection Approach |
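Proposes a self-reflection approach that uses large language models to automatically generate equivalent representations of code |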
large language model |
|
|
| 36 |
Beyond Film Subtitles: Is YouTube the Best Approximation of Spoken Vocabulary? |
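Builds a high-quality word-frequency resource from YouTube subtitles, improving performance on psycholinguistic and lexical complexity prediction tasks |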
large language model |
✅ |
|
| 37 |
Showing LLM-Generated Code Selectively Based on Confidence of LLMs |
HonestCoder: selectively shows LLM-generated code based on model confidence, improving developer efficiency and reducing security risk |
large language model |
|
|
| 38 |
Can Watermarked LLMs be Identified by Users via Crafted Prompts? |
Proposes the Water-Probe algorithm for identifying watermarked LLMs, addressing the question of watermark imperceptibility |
large language model |
|
|
| 39 |
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale |
X-ALMA: achieves high-quality translation at scale through plug-and-play modules and adaptive rejection optimization |
large language model |
|
|
| 40 |
UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective |
Proposes UNComp, which uses matrix entropy to guide KV cache compression in LLMs, improving long-context inference efficiency. |
large language model |
✅ |
|
| 41 |
Enhancing Short-Text Topic Modeling with LLM-Driven Context Expansion and Prefix-Tuned VAEs |
Proposes LLM-driven context expansion and prefix-tuned VAEs to improve short-text topic modeling. |
large language model |
|
|