| # | Title | Summary | Tags | ✅ |
|---|-------|---------|------|----|
| 1 | SafeMT: Multi-turn Safety for Multimodal Language Models | Proposes the SafeMT benchmark for evaluating the safety of multimodal large language models in multi-turn dialogue, and introduces a dialogue safety moderator. | large language model, multimodal | |
| 2 | Multi-stage Prompt Refinement for Mitigating Hallucinations in Large Language Models | Proposes a Multi-stage Prompt Refinement (MPR) framework to mitigate hallucinations in large language models. | large language model | |
| 3 | CPR: Mitigating Large Language Model Hallucinations with Curative Prompt Refinement | Proposes the CPR framework, which mitigates LLM hallucinations through prompt optimization. | large language model | |
| 4 | From Knowledge to Treatment: Large Language Model Assisted Biomedical Concept Representation for Drug Repurposing | LLaDR: uses large language models to assist biomedical concept representation for drug repurposing. | large language model | ✅ |
| 5 | Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models | Proposes the Credal Transformer, which mitigates LLM hallucinations through uncertainty modeling. | large language model | |
| 6 | A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness | Surveys collaboration between small and large language models for improving performance, reducing cost, and ensuring privacy and trustworthiness. | large language model | ✅ |
| 7 | Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models | Proposes PACE: a simple and effective metric for evaluating LLM creativity that avoids data contamination and correlates strongly with human judgments. | large language model | |
| 8 | Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory | Investigates political and demographic associations in large language models through Moral Foundations Theory. | large language model | |
| 9 | COSTAR-A: A prompting framework for enhancing Large Language Model performance on Point-of-View questions | The COSTAR-A framework improves small-model performance on point-of-view questions through prompt optimization. | large language model | |
| 10 | Community size rather than grammatical complexity better predicts Large Language Model accuracy in a novel Wug Test | A Wug Test reveals that language model accuracy is driven by speech-community size rather than grammatical complexity. | large language model | |
| 11 | Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation | Explores open-endedness of LLMs in social simulation: improving measurement, reducing bias, and strengthening methodological utility. | large language model | |
| 12 | Uncertainty Quantification for Hallucination Detection in Large Language Models: Foundations, Methodology, and Future Directions | Survey of uncertainty quantification methods for hallucination detection in large language models. | large language model | |
| 13 | Schema for In-Context Learning | Proposes the SA-ICL framework, which improves LLM in-context learning through explicit schema activation. | large language model, chain-of-thought | |
| 14 | Not in Sync: Unveiling Temporal Bias in Audio Chat Models | Reveals temporal bias in audio chat models and proposes the TBI metric to quantify it. | multimodal, TAMP | |
| 15 | A Survey on Parallel Reasoning | Surveys parallel reasoning, an emerging paradigm for improving the robustness of large language models. | large language model, chain-of-thought | ✅ |
| 16 | Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception | Proposes Omni-Captioner for fine-grained multimodal perception, with an accompanying dataset, models, and evaluation benchmark. | multimodal | |
| 17 | Toward LLM-Supported Automated Assessment of Critical Thinking Subskills | Uses large language models to automatically assess students' critical-thinking subskills. | large language model | |
| 18 | Dr.LLM: Dynamic Layer Routing in LLMs | Dr.LLM: improves LLM inference efficiency and accuracy through dynamic layer routing. | large language model | |
| 19 | Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences | Narrow finetuning leaves readable traces in LLM activation differences, which can be used to infer the finetuning domain. | large language model | |
| 20 | StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis | StyleDecipher: robust and explainable detection of LLM-generated text through stylistic analysis. | large language model | ✅ |
| 21 | When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection | Proposes a personalized-text detection benchmark to address the identification of machine-generated text. | large language model | |
| 22 | Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection | Proposes a taxonomy-driven jailbreak detection method to improve LLM safety. | large language model | |
| 23 | Tokenization Disparities as Infrastructure Bias: How Subword Systems Create Inequities in LLM Access and Efficiency | Reveals infrastructure bias in tokenization disparities: how subword systems create inequities in LLM access and efficiency. | large language model | |
| 24 | LLM-REVal: Can We Trust LLM Reviewers Yet? | LLM-REVal: evaluates the reliability of LLMs as reviewers, revealing their biases and potential risks. | large language model | |
| 25 | Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability | Analyzes moral bias in finetuned LLMs through mechanistic interpretability and proposes mitigation methods. | large language model | |
| 26 | An AI-Based Behavioral Health Safety Filter and Dataset for Identifying Mental Health Crises in Text-Based Conversations | Proposes an AI-based behavioral-health safety filter and dataset for identifying mental-health crises in text-based conversations. | large language model | |
| 27 | Interpreting the Latent Structure of Operator Precedence in Language Models | Studies how LLMs internally encode arithmetic operator precedence, revealing intermediate computation. | large language model | |
| 28 | LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization | Proposes the Prompt Duel Optimizer (PDO) for efficient LLM prompt optimization without labels. | large language model | |
| 29 | OPLoRA: Orthogonal Projection LoRA Prevents Catastrophic Forgetting during Parameter-Efficient Fine-Tuning | OPLoRA: orthogonal-projection LoRA prevents catastrophic forgetting during parameter-efficient fine-tuning. | large language model | |
| 30 | A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation | A large-scale multilingual study revealing the complex interplay between LLM safeguards, personalization, and disinformation. | large language model | |
| 31 | 3-Model Speculative Decoding | Proposes pyramid speculative decoding, which speeds up LLM inference by introducing an intermediate model. | large language model | |
| 32 | The Curious Case of Curiosity across Human Cultures and LLMs | Proposes the CUEST framework, revealing LLM biases in cross-cultural expressions of curiosity and proposing improvements. | large language model | |
| 33 | RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs | Proposes the RAID framework to address jailbreak attacks on large language models. | large language model | |
| 34 | Attribution Quality in AI-Generated Content: Benchmarking Style Embeddings and LLM Judges | Compares style embeddings and LLM judges for assessing attribution quality of AI-generated content, and builds a benchmark. | large language model | |
| 35 | Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation | Proposes the CLEAR framework, improving the faithfulness of RAG systems by probing latent knowledge conflict. | large language model | ✅ |
| 36 | Fine-grained Analysis of Brain-LLM Alignment through Input Attribution | Proposes a fine-grained input-attribution method for in-depth analysis of brain-LLM alignment. | large language model | |
| 37 | A large-scale, unsupervised pipeline for automatic corpus annotation using LLMs: variation and change in the English consider construction | Proposes a large-scale, unsupervised, LLM-based pipeline for automatic corpus annotation, accelerating corpus-linguistics research. | large language model | |
| 38 | The Harder The Better: Maintaining Supervised Fine-tuning Generalization with Less but Harder Data | Proposes the THTB framework, which maintains supervised fine-tuning generalization with less but harder data. | large language model | ✅ |
| 39 | Towards Inference-time Scaling for Continuous Space Reasoning | Explores the applications and challenges of inference-time scaling for continuous-space reasoning. | large language model | |
| 40 | Information Extraction from Conversation Transcripts: Neuro-Symbolic vs. LLM | Compares neuro-symbolic and LLM approaches for information extraction from agricultural conversations, evaluating performance and cost. | large language model | |