| 1 |
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities |
Proposes the Whiteboard-of-Thought prompting method to improve the performance of multimodal large language models on visual reasoning tasks |
large language model multimodal chain-of-thought |
|
|
| 2 |
QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis |
QuST-LLM: integrates large language models for comprehensive spatial transcriptomics analysis |
large language model |
✅ |
|
| 3 |
Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination |
Reveals a "false negative" bias in large language models and mitigates input-conflicting hallucination |
large language model |
|
|
| 4 |
Relation Extraction with Fine-Tuned Large Language Models in Retrieval Augmented Generation Frameworks |
Proposes a RAG framework based on fine-tuned LLMs to improve implicit relation extraction performance |
large language model |
|
|
| 5 |
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate |
Proposes a multi-agent collaboration attack method to study adversarial attacks on LLM collaboration in debate settings |
large language model |
|
|
| 6 |
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models |
Proposes GraphReader, which structures long texts as graphs to enhance the long-context processing abilities of large language models |
large language model |
|
|
| 7 |
Evidence of a log scaling law for political persuasion with large language models |
Shows that the political persuasiveness of large language models follows a log scaling law with diminishing marginal returns |
large language model |
|
|
| 8 |
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing |
Proposes GETA, a method for evaluating the values of large language models based on generative evolving testing |
large language model |
|
|
| 9 |
Using Game Play to Investigate Multimodal and Conversational Grounding in Large Multimodal Models |
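Proposes a game-play-based evaluation method for large multimodal models that assesses visual representation and conversational grounding abilities |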
multimodal |
|
|
| 10 |
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective |
Proposes psychometric attack methods to evaluate implicit bias in large language models |
large language model |
✅ |
|
| 11 |
1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators? |
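Proposes a cross-lingual knowledge aggregation method to improve the multilingual consistency and performance of large language models |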
large language model |
|
|
| 12 |
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary |
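Proposes prompting large language models with a layperson summary to improve the accuracy and accessibility of radiology report summarization |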
large language model |
|
|
| 13 |
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning |
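Shows that neural language models with chain-of-thought reasoning can represent the family of string distributions representable by probabilistic Turing machines |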
chain-of-thought |
|
|
| 14 |
Aligning Large Language Models with Diverse Political Viewpoints |
Aligns large language models with diverse political viewpoints to improve accuracy and fairness in handling political information |
large language model |
|
|
| 15 |
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models |
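Proposes a three-stage framework for data selection in LLM fine-tuning, unifies evaluation criteria, and highlights the strengths and weaknesses of existing methods and future challenges |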
large language model |
|
|
| 16 |
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation |
Proposes Inference-Time Decontamination (ITD) to address inflated performance caused by benchmark leakage in large language models |
large language model |
|
|
| 17 |
AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought |
Proposes AutoCAP, which automates cross-lingual alignment planning for zero-shot chain-of-thought |
chain-of-thought |
|
|
| 18 |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data |
Proposes addressing the problem of monitoring knowledge in large language models through implicit reasoning |
large language model chain-of-thought |
|
|
| 19 |
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs |
Proposes the MR-Ben benchmark for evaluating System-2 thinking and meta-reasoning abilities in LLMs |
large language model chain-of-thought |
|
|
| 20 |
Exploring Design Choices for Building Language-Specific LLMs |
Explores design choices for building language-specific LLMs to improve performance on low-resource languages |
large language model |
|
|
| 21 |
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset |
Proposes the OpenDebateEvidence dataset for argument mining and summarization, supporting research in the debate domain |
large language model |
✅ |
|
| 22 |
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch |
Proposes a safety-aligned model merging method to address misalignment introduced when merging LLMs |
large language model |
|
|
| 23 |
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning |
Proposes user-level differential privacy methods for LLM fine-tuning to protect user privacy |
large language model |
|
|
| 24 |
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons |
Identifies safety neurons to explain the internal mechanisms of safety alignment in large language models |
large language model |
✅ |
|
| 25 |
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics |
Proposes TRAIT to evaluate the personality traits of large language models |
large language model |
|
|
| 26 |
Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction |
Uses LLM-based relation extraction to explore spatial representations in historical Lake District texts |
large language model |
|
|
| 27 |
Leveraging LLMs for Bangla Grammar Error Correction: Error Categorization, Synthetic Data, and Model Evaluation |
Uses LLMs to improve Bangla grammar error correction: error categorization, data synthesis, and model evaluation |
large language model |
|
|
| 28 |
Step-Back Profiling: Distilling User History for Personalized Scientific Writing |
Proposes Step-Back Profiling to address personalized scientific writing |
large language model |
✅ |
|
| 29 |
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics |
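Evaluates LLMs through self-play of conversational games, studying how parameter count, training method, and other factors affect performance |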
large language model |
|
|
| 30 |
Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task? |
Uses the SCALPEL method to dissect why LLMs fail at belief reasoning tasks |
large language model |
|
|
| 31 |
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell |
Reveals the mechanism behind LLM long-context failures: a study of the "know but don't tell" phenomenon in Transformer models |
large language model |
|
|
| 32 |
An Analysis of Multilingual FActScore |
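Analyzes FActScore in multilingual settings and proposes knowledge-source mitigation strategies to improve cross-lingual factuality evaluation |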
large language model |
|
|
| 33 |
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation |
Survey: unveils the full spectrum of data contamination in language models, from detection to remediation |
large language model |
|
|
| 34 |
Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas |
Models subjectivity in LLMs using explicit and implicit human factors |
large language model |
|
|
| 35 |
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer |
Proposes FaMSS, which improves the cross-lingual truthfulness of large language models through selective language synergy |
large language model |
|
|
| 36 |
Definition generation for lexical semantic change detection |
Proposes a word-sense representation based on LLM-generated definitions for detecting lexical semantic change over time |
large language model |
|
|
| 37 |
An Investigation of Prompt Variations for Zero-shot LLM-based Rankers |
Investigates how prompt variations affect the performance of zero-shot LLM-based rankers, highlighting the importance of prompt engineering |
large language model |
|
|
| 38 |
Prompt Injection Attacks in Defended Systems |
Studies black-box methods for prompt injection attacks in defended systems, revealing potential security risks |
large language model |
|
|
| 39 |
Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking |
Shows that open-source LLMs are more persuasive at generating rationales, particularly in pairwise argument ranking |
large language model |
|
|