| 1 |
Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors |
Uses uncertainty elicited in chain-of-thought to mitigate bias in forecasting harmful user behaviors |
large language model, chain-of-thought |
|
|
| 2 |
Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models |
Proposes Shortcut Suite to evaluate shortcut learning in large language models |
large language model, chain-of-thought |
✅ |
|
| 3 |
Roadmap towards Superhuman Speech Understanding using Large Language Models |
Proposes an LLM-based roadmap toward superhuman speech understanding, together with the SAGI benchmark |
large language model, foundation model |
|
|
| 4 |
Learning Multimodal Cues of Children's Uncertainty |
Builds a dataset of multimodal cues of children's uncertainty and proposes a model that predicts it |
multimodal |
|
|
| 5 |
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors |
Shows how data aggregation artifacts in subjective tasks affect the posterior distributions of large language models |
large language model |
|
|
| 6 |
Semi-supervised Fine-tuning for Large Language Models |
Proposes the SemiEvol framework, which uses semi-supervised fine-tuning to improve LLM performance when labeled data is limited. |
large language model |
|
|
| 7 |
On the Role of Attention Heads in Large Language Model Safety |
Proposes the Safety Head ImPortant Score (Ships) and the Sahara algorithm for evaluating and attributing safety-related attention heads in LLMs. |
large language model |
|
|
| 8 |
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models |
Proposes UCFE, a user-centric financial expertise benchmark for evaluating large language models |
large language model |
|
|
| 9 |
RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs in Medicine |
RiTeK: a dataset for evaluating the complex reasoning ability of large language models over textual knowledge graphs in medicine |
large language model |
|
|
| 10 |
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models |
A whitepaper on ethical research into large language models, offering ethical guidance and practical norms for LLM research |
large language model |
|
|
| 11 |
De-mark: Watermark Removal in Large Language Models |
Proposes the De-mark framework, which effectively removes n-gram-based watermarks in large language models |
large language model |
|
|
| 12 |
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval |
Proposes a knowledge-aware query expansion framework that uses large language models to improve textual and relational retrieval |
large language model |
|
|
| 13 |
SynapticRAG: Enhancing Temporal Memory Retrieval in Large Language Models through Synaptic Mechanisms |
SynapticRAG: enhances temporal memory retrieval in large language models through synaptic mechanisms |
large language model |
|
|
| 14 |
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR |
Combines parameter-efficient fine-tuning with text adaptation to improve multilingual multimodal models for low-resource ASR |
multimodal |
|
|
| 15 |
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models |
Proposes the PTR framework, which improves LLM performance in open-ended scenarios through progressive thought refinement |
large language model |
|
|
| 16 |
BiasJailbreak:Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models |
BiasJailbreak: reveals and exploits ethical biases in large language models to mount adversarial attacks, and proposes a defense method. |
large language model |
|
|
| 17 |
Advancing Large Language Model Attribution through Self-Improving |
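Proposes the START framework, which uses self-improvement to iteratively strengthen the attribution ability of large language models |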
large language model |
|
|
| 18 |
Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis |
Proposes language confusion entropy to quantify language confusion in large language models, and analyzes its security implications. |
large language model |
|
|
| 19 |
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy |
Proposes the CBT-BENCH benchmark to evaluate how well large language models can assist with cognitive behavioral therapy |
large language model |
|
|
| 20 |
BQA: Body Language Question Answering Dataset for Video Large Language Models |
Proposes the BQA dataset for evaluating how well video large language models understand body language |
large language model |
|
|
| 21 |
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models |
Evaluates self-generated documents as a way to enhance retrieval-augmented generation with large language models |
large language model |
|
|
| 22 |
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing |
Proposes aiXcoder-7B, a lightweight and efficient code LLM that improves code completion accuracy and speed. |
large language model |
|
|
| 23 |
Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings |
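Measures the performance gap of large language models between English and low-resource languages, revealing challenges for cross-lingual use |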
large language model |
|
|
| 24 |
Data Defenses Against Large Language Models |
Proposes data defenses that use adversarial prompt injection to protect data from unwanted inference by large language models. |
large language model |
✅ |
|
| 25 |
Can MLLMs Understand the Deep Implication Behind Chinese Images? |
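Proposes the CII-Bench benchmark to evaluate how well multimodal large language models understand the deeper implications of Chinese images |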
large language model, multimodal |
✅ |
|
| 26 |
Retrospective Learning from Interactions |
ReSpect: uses implicit feedback from interaction history to improve the reasoning ability of multimodal LLMs |
large language model, multimodal |
|
|
| 27 |
Generating Signed Language Instructions in Large-Scale Dialogue Systems |
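Builds a signed-language instruction generation system within a large-scale dialogue system to improve the multimodal interaction experience. |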
large language model, multimodal |
✅ |
|
| 28 |
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs |
SimpleToM: exposes the gap between explicit theory-of-mind (ToM) inference and implicit ToM application in large language models |
large language model, chain-of-thought |
|
|
| 29 |
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models |
GeoCoder: solves geometry problems by generating modular code with vision-language models |
multimodal |
|
|
| 30 |
Reference-Based Post-OCR Processing with LLM for Precise Diacritic Text in Historical Document Recognition |
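Proposes an OCR post-processing method based on LLMs and reference books to improve text recognition accuracy for historical documents |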
large language model |
|
|
| 31 |
Detecting AI-Generated Texts in Cross-Domains |
Proposes the RoBERTa-Ranker model to address the drop in performance when detecting AI-generated text across domains |
large language model |
|
|
| 32 |
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems |
Proposes MIRAGE-Bench, a benchmark arena for automatically evaluating multilingual retrieval-augmented generation systems. |
large language model |
✅ |
|
| 33 |
Retrieval of Temporal Event Sequences from Textual Descriptions |
Proposes the TPP-Embedding model for retrieving temporal event sequences from textual descriptions, and builds the TESRBench benchmark. |
large language model |
|
|
| 34 |
Measuring and Modifying the Readability of English Texts with GPT-4 |
Uses GPT-4 to assess and modify the readability of English texts, significantly outperforming traditional methods. |
large language model |
|
|
| 35 |
LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education |
Reveals the biases of LLMs acting as "teachers" in personalized education, and proposes evaluation metrics. |
large language model |
|
|
| 36 |
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization |
Studies how LLMs hallucinate in multi-document summarization, with the aim of improving their performance |
large language model |
|
|
| 37 |
BenTo: Benchmark Task Reduction with In-Context Transferability |
BenTo: uses in-context transferability to reduce the number of tasks in LLM evaluation benchmarks |
large language model |
|
|
| 38 |
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions |
Improves LLMs' ability to ask clarifying questions by modeling future conversation turns |
large language model |
|
|
| 39 |
Unconstrained Model Merging for Enhanced LLM Reasoning |
Proposes an unconstrained model merging framework that improves LLM performance on reasoning tasks |
large language model |
|
|
| 40 |
ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization |
Proposes ORCHID, a Chinese debate corpus for target-independent stance detection and argumentative dialogue summarization. |
large language model |
|
|
| 41 |
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model |
A comparative study of the reasoning patterns of OpenAI's o1 model, revealing its strengths in math, coding, and commonsense reasoning |
large language model |
|
|
| 42 |
Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks ? |
Proposes an LLM self-debate framework to evaluate how robust model biases are under adversarial attacks |
large language model |
|
|
| 43 |
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards |
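Proposes RAG-DDR, which optimizes retrieval-augmented generation with differentiable data rewards and improves knowledge utilization in smaller models. |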
large language model |
✅ |
|
| 44 |
IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection |
IterSelectTune: an iterative training framework for efficiently selecting instruction-tuning data |
large language model |
|
|
| 45 |
From Citations to Criticality: Predicting Legal Decision Influence in the Multilingual Swiss Jurisprudence |
Proposes the Criticality Prediction dataset for predicting the influence of Swiss legal decisions, to help prioritize cases. |
large language model |
|
|
| 46 |
Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence |
Reveals a metacognitive limitation of large language models: they perform worse than humans on judgment-of-learning tasks |
large language model |
|
|
| 47 |
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights |
Proposes the LAR-ECHR dataset for evaluating the legal reasoning of LLMs on cases from the European Court of Human Rights |
large language model |
|
|
| 48 |
Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement |
Cerberus: achieves efficient LLM inference through adaptive parallel decoding and sequential knowledge enhancement |
large language model |
|
|
| 49 |
Learning to Route LLMs with Confidence Tokens |
Proposes Self-REF, which uses confidence tokens to improve the reliability and accuracy of LLMs on downstream tasks. |
large language model |
|
|
| 50 |
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning |
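Proposes MUNCH, an uncertainty-based method for multi-hop knowledge unlearning that addresses the weakness of existing methods on indirect reasoning |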
large language model |
|
|
| 51 |
Atomic Calibration of LLMs in Long-Form Generations |
Proposes atomic calibration to evaluate fine-grained hallucinations in LLM long-form generation. |
large language model |
|
|
| 52 |
SPIN: Self-Supervised Prompt INjection |
SPIN: self-supervised prompt injection for detecting and defending against adversarial attacks on large language models |
large language model |
|
|
| 53 |
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs |
FaithBench: a diverse benchmark for hallucinations in summaries produced by modern LLMs |
large language model |
✅ |
|
| 54 |
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces |
Investigates the geometry of numerical reasoning in LLMs: language models compare numeric properties in linear subspaces |
large language model |
|
|
| 55 |
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation |
SLM-Mod: small language models outperform large language models at content moderation |
large language model |
✅ |
|