| 1 |
Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation |
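Proposes Visual Retrieval-Augmented Generation (V-RAG), a framework for reducing hallucinations in medical multimodal large language models |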
large language model multimodal |
|
|
| 2 |
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics |
Survey: feedback-based multi-step reasoning for improving the mathematical ability of large language models |
large language model chain-of-thought |
|
|
| 3 |
Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models |
Proposes U-SafeBench, which evaluates large language models against user-specific safety standards |
large language model chain-of-thought |
✅ |
|
| 4 |
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO |
AlphaMaze: uses GRPO to improve the spatial intelligence of large language models in maze navigation |
large language model chain-of-thought |
|
|
| 5 |
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following |
Proposes StructFlowBench for evaluating how well LLMs understand structured flow in multi-turn instruction following |
large language model instruction following |
✅ |
|
| 6 |
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback |
Proposes the InterFeedback framework for evaluating the intelligence of large multimodal models when interacting with humans |
multimodal |
|
|
| 7 |
Harnessing PDF Data for Improving Japanese Large Multimodal Models |
Uses PDF data to enhance Japanese large multimodal models and improve their understanding of Japanese cultural knowledge |
multimodal |
|
|
| 8 |
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models |
Obliviate: an efficient unmemorization method for protecting intellectual property in large language models |
large language model |
|
|
| 9 |
Hallucination Detection in Large Language Models with Metamorphic Relations |
Proposes MetaQA, which uses metamorphic relations and prompt mutation to detect hallucinations in large language models |
large language model |
|
|
| 10 |
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators |
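TritonBench: the first comprehensive benchmark for LLM-generated Triton operators, revealing the shortcomings of existing models in high-performance code generation |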
large language model |
✅ |
|
| 11 |
Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models |
Reveals inherent limitations of large language model benchmarks and questions their reliability for assessing generalization |
large language model |
|
|
| 12 |
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models |
MedHallu: a comprehensive benchmark for detecting medical hallucinations in large language models |
large language model |
|
|
| 13 |
Fact or Guesswork? Evaluating Large Language Models' Medical Knowledge with Structured One-Hop Judgments |
Proposes the MKJ dataset for evaluating the accuracy and calibration of large language models' medical knowledge |
large language model |
|
|
| 14 |
Effects of Prompt Length on Domain-specific Tasks for Large Language Models |
Studies how prompt length affects large language model performance on domain-specific tasks |
large language model |
|
|
| 15 |
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease |
RareScale: combines expert systems with large language models to improve rare disease diagnostic accuracy |
large language model |
|
|
| 16 |
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models |
Proposes HippoRAG 2, which improves non-parametric continual learning for LLMs on factual, reasoning, and associative memory tasks |
large language model |
✅ |
|
| 17 |
Explanations of Large Language Models Explain Language Representations in the Brain |
Uses explainable AI to reveal links between large language models and language representations in the brain |
large language model |
|
|
| 18 |
Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models |
Soft token attacks cannot reliably audit unlearning in large language models |
large language model |
✅ |
|
| 19 |
CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models |
Proposes CORBA: a contagious recursive blocking attack on LLM-based multi-agent systems |
large language model |
✅ |
|
| 20 |
Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models |
Proposes an LLM-based context-aware chatbot that improves the user experience in smart environments |
large language model |
|
|
| 21 |
Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models |
Uses large language models to score corporate climate disclosures and identify greenwashing |
large language model |
|
|
| 22 |
Optimizing Singular Spectrum for Large Language Model Compression |
Proposes the SoCo framework, which compresses large language models efficiently by optimizing the singular spectrum |
large language model |
|
|
| 23 |
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps |
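Proposes the FUR framework, which measures the parametric faithfulness of CoT reasoning by unlearning information from reasoning steps |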
chain-of-thought |
|
|
| 24 |
SurveyX: Academic Survey Automation via Large Language Models |
SurveyX: automates academic surveys with large language models, markedly improving content and citation quality |
large language model |
|
|
| 25 |
Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models |
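Large language models struggle to describe the topics of large corpora without human intervention; evaluating topic models requires a human in the loop |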
large language model |
|
|
| 26 |
LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning |
LIFT: improves the long-context understanding of large language models through long input fine-tuning |
large language model |
|
|
| 27 |
Behavioral Analysis of Information Salience in Large Language Models |
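Proposes an explainable framework that analyzes the information salience preferences of large language models through their summarization behavior |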
large language model |
|
|
| 28 |
Optimal word order for non-causal text generation with Large Language Models: the Spanish case |
For Spanish, proposes a Viterbi-based maximum likelihood estimation method for optimizing the text generation order of non-causal language models |
large language model |
|
|
| 29 |
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models |
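Proposes token-level density-based uncertainty quantification methods for large language models, improving the truthfulness of generated output |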
large language model |
|
|
| 30 |
A Survey on Data Contamination for Large Language Models |
Surveys data contamination in large language models and analyzes detection and mitigation methods |
large language model |
|
|
| 31 |
SR-LLM: Rethinking the Structured Representation in Large Language Model |
SR-LLM: enhances the reasoning ability of large language models through structured representations |
large language model |
|
|
| 32 |
English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports |
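First evaluation of machine translation for multilingual bug reports, comparing large language models with traditional translation models |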
large language model |
✅ |
|
| 33 |
Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization |
Proposes Transfer-Prompting, which improves cross-task transfer in large language models through dual-stage prompt optimization |
large language model |
✅ |
|
| 34 |
QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification |
QUAD-LLM-MLTC: ensemble learning with large language models for multi-label classification of healthcare text |
large language model |
|
|
| 35 |
Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach |
Proposes EvoStealer, which uses differential evolution to steal prompt templates from text-to-image models |
large language model multimodal |
✅ |
|
| 36 |
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting |
Proposes the ReVision dataset and a baseline VLM for privacy-preserving visual instruction rewriting |
multimodal |
|
|
| 37 |
LUME: LLM Unlearning with Multitask Evaluations |
LUME: a multitask evaluation benchmark for LLM unlearning |
large language model |
|
|
| 38 |
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention |
LServe: accelerates long-sequence LLM serving with unified sparse attention |
large language model |
✅ |
|
| 39 |
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling |
Proposes FR-Spec, which accelerates large-vocabulary language models through frequency-ranked speculative sampling |
large language model |
✅ |
|
| 40 |
Rapid Word Learning Through Meta In-Context Learning |
Proposes Minnow, a meta-learning framework that improves language models' ability to learn new words rapidly from few examples |
large language model |
|
|
| 41 |
ExpertLens: Activation steering features are highly interpretable |
ExpertLens: uses activation steering to find highly interpretable concept representations in LLMs |
large language model |
|
|
| 42 |
Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection |
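Proposes a black-box hallucination detection method based on verification under uncertainty, improving efficiency while maintaining high performance |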
large language model |
|
|
| 43 |
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers |
LLM-Microscope reveals the hidden role of punctuation in the context of Transformers |
large language model |
|
|
| 44 |
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries |
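Quantifying AI over-refusal and emotional attachment boundaries: proposes an evaluation framework for how LLMs handle emotional boundaries |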
large language model |
|
|
| 45 |
Revealing and Mitigating Over-Attention in Knowledge Editing |
Proposes Selective Attention Drift Restriction (SADR), a method for mitigating over-attention in knowledge editing |
large language model |
|
|
| 46 |
eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables |
Proposes the eC-Tab2Text dataset for attribute-specific text generation from e-commerce product tables |
large language model |
|
|
| 47 |
Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis |
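Proposes the Tree-of-Debate framework, which uses multi-persona debate trees to support comparative analysis of scientific papers and critical thinking about them |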
large language model |
|
|
| 48 |
Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs |
Proposes the X-KDE framework for synchronized cross-lingual knowledge editing and updating in LLMs |
large language model |
|
|
| 49 |
PredictaBoard: Benchmarking LLM Score Predictability |
PredictaBoard: benchmarks the predictability of LLM scores, improving AI system safety |
large language model |
✅ |
|
| 50 |
SEA-HELM: Southeast Asian Holistic Evaluation of Language Models |
SEA-HELM: a Southeast Asian holistic evaluation benchmark for language models, filling a gap in multilingual and cultural evaluation |
large language model |
|
|
| 51 |
MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels |
MCQA-Eval: evaluates NLG confidence using gold-standard answers, improving evaluation efficiency and reliability |
large language model |
|
|
| 52 |
CoME: An Unlearning-based Approach to Conflict-free Model Editing |
CoME: an unlearning-based approach to conflict-free model editing |
large language model |
|
|
| 53 |
Using tournaments to calculate AUROC for zero-shot classification with LLMs |
Uses tournaments to compute AUROC, improving LLM zero-shot classification performance |
large language model |
|
|
| 54 |
Contextualizing Search Queries In-Context Learning for Conversational Rewriting with LLMs |
Proposes a prompt-guided in-context learning method for low-resource conversational query rewriting |
large language model |
|
|
| 55 |
CLIPPER: Compression enables long-context synthetic data generation |
CLIPPER: uses compression to improve long-context synthetic data generation, boosting narrative claim verification performance |
chain-of-thought |
|
|
| 56 |
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks |
Proposes the GATE framework to address the efficiency of tool construction across multiple tasks |
large language model |
✅ |
|
| 57 |
HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States |
HiddenDetect: detects jailbreak attacks against large vision-language models by monitoring hidden states |
multimodal |
✅ |
|
| 58 |
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines |
SuperGPQA: a large-scale LLM evaluation benchmark covering 285 graduate disciplines |
large language model |
|
|
| 59 |
I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search |
I-MCTS: enhances agentic AutoML through introspective Monte Carlo tree search |
large language model |
✅ |
|
| 60 |
Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup |
PAS-SQL improves Text-to-SQL performance on complex questions through abstract query patterns and contextual schema markup |
large language model |
|
|
| 61 |
Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs |
Uses general-purpose LLMs to predict citation intent through in-context learning and fine-tuning, without domain-specific pretraining |
large language model |
|
|
| 62 |
LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization |
Proposes LoRA-MGPO, which mitigates double descent in low-rank adaptation through momentum-guided perturbation optimization |
large language model |
|
|
| 63 |
Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases |
Uses information-theoretic analysis to evaluate how well large language models simulate L1-dependent biases in L2 English dialogue |
large language model |
|
|
| 64 |
Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression |
Proposes ESA: efficient selective attention through query-key compression, overcoming context length limits for long text |
large language model |
|
|
| 65 |
Unstructured Evidence Attribution for Long Context Query Focused Summarization |
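Proposes the SUnsET dataset and an unstructured evidence extraction method, improving the faithfulness of long-context query-focused summarization |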
large language model |
|
|
| 66 |
Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment |
Uses small LLMs for argument mining in education: argument component identification, classification, and assessment |
large language model |
|
|
| 67 |
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems |
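Proposes a communication-centric survey framework for LLM-based multi-agent systems, analyzing interaction mechanisms and outlining future directions |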
large language model |
|
|
| 68 |
ParallelComp: Parallel Long-Context Compressor for Length Extrapolation |
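Proposes ParallelComp, a parallel long-context compression method that addresses the memory bottleneck and attention decay in LLM length extrapolation |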
large language model |
✅ |
|
| 69 |
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension |
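The CAPTex benchmark reveals the limitations of mLLMs in understanding culturally specific procedural text, especially in low-resource languages |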
large language model |
|
|
| 70 |
Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection |
Proposes the ARG framework, which enables end-to-end training for knowledge graph reasoning through active self-reflection |
large language model |
|
|
| 71 |
PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant |
PaperHelper: a knowledge-based LLM question-answering assistant for reading papers, improving literature comprehension efficiency |
large language model |
|
|