| 1 |
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models |
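Proposes the TRON framework for risk control and assessment in multimodal large language models, improving reliability in open environments. |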
large language model multimodal |
|
|
| 2 |
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs |
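Proposes an evaluation framework that reveals conflicts between visual information and commonsense knowledge in multimodal large language models. |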
large language model multimodal |
|
|
| 3 |
What Makes Large Language Models Reason in (Multi-Turn) Code Generation? |
Studies how prompt engineering affects the reasoning ability of large language models in multi-turn code generation |
large language model chain-of-thought |
|
|
| 4 |
Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs) |
Thought2Text: uses large language models to generate text from EEG signals, achieving "thought-to-text" |
large language model multimodal |
|
|
| 5 |
Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models |
ECHOQA: investigates the dynamic interaction between parametric knowledge and contextual knowledge in large language models |
large language model |
✅ |
|
| 6 |
The Large Language Model GreekLegalRoBERTa |
Proposes GreekLegalRoBERTa, improving named entity recognition and topic classification on Greek legal text. |
large language model |
|
|
| 7 |
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models |
VibeCheck: automatically discovers and quantifies subtle stylistic differences in large language model outputs |
large language model |
|
|
| 8 |
A Closer Look at Machine Unlearning for Large Language Models |
Proposes a machine unlearning method to address privacy issues in large language models |
large language model |
✅ |
|
| 9 |
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering |
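Proposes the OKGQA benchmark to evaluate the trustworthiness of knowledge-graph-augmented large language models in open-domain question answering. |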
large language model |
✅ |
|
| 10 |
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models |
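Proposes a teaching-inspired integrated prompting framework that improves the arithmetic reasoning ability of large language models |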
large language model |
✅ |
|
| 11 |
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models |
Proposes Omni-MATH, an Olympiad-level mathematical reasoning benchmark for large language models |
large language model |
|
|
| 12 |
Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions |
Uses LLM-generated synthetic normalized mentions to improve disease entity recognition and normalization |
large language model |
|
|
| 13 |
NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models |
NusaMT-7B: uses large language models to improve machine translation for low-resource Indonesian languages |
large language model |
|
|
| 14 |
Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models |
Proposes the MAEC method, which gives large language models multilingual abilities without training. |
large language model |
✅ |
|
| 15 |
Uncovering Overfitting in Large Language Model Editing |
Reveals the overfitting problem in knowledge editing for large language models and proposes the LTI method to mitigate it. |
large language model |
|
|
| 16 |
GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps |
Proposes GameTraversalBenchmark to evaluate the planning abilities of large language models on 2D game maps. |
large language model |
✅ |
|
| 17 |
Detecting Training Data of Large Language Models via Expectation Maximization |
Proposes the EM-MIA method to address training data detection for large language models |
large language model |
|
|
| 18 |
AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models |
AI-Press: a multi-agent news generation and feedback simulation system based on large language models |
large language model |
|
|
| 19 |
OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting |
Proposes OneNet, a fine-tuning-free LLM prompting framework for few-shot entity linking. |
large language model |
|
|
| 20 |
Upcycling Large Language Models into Mixture of Experts |
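Proposes virtual group initialization and weight scaling methods to efficiently upcycle large language models into mixture-of-experts models. |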
large language model |
|
|
| 21 |
Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks |
In multi-agent debate frameworks, diversity of thought strengthens the reasoning ability of LLMs |
large language model chain-of-thought |
|
|
| 22 |
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning |
CLOVER: improves complex logical reasoning by decomposing the translation into first-order logic and verifying the result. |
large language model chain-of-thought |
|
|
| 23 |
Advancing biomolecular understanding and design following human instructions |
InstructBioMol: biomolecular understanding and design driven by natural-language instructions |
large language model multimodal |
✅ |
|
| 24 |
Dialectical Behavior Therapy Approach to LLM Prompting |
Proposes an LLM prompting method based on dialectical behavior therapy that improves the reasoning ability of small models |
large language model chain-of-thought |
|
|
| 25 |
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code |
MathCoder2: improves mathematical reasoning through continued pretraining on model-translated mathematical code |
large language model |
✅ |
|
| 26 |
Responsible AI in NLP: GUS-Net Span-Level Bias Detection Dataset and Benchmark for Generalizations, Unfairness, and Stereotypes |
Proposes the GUS-Net framework for fine-grained, interpretable detection and mitigation of bias in NLP. |
large language model |
|
|
| 27 |
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions |
DRAFT framework: dynamically refines tool documentation through self-driven interactions, improving LLM tool use |
large language model |
|
|
| 28 |
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act |
COMPL-AI framework: a technical interpretation of the EU AI Act and an LLM benchmarking suite |
large language model |
|
|
| 29 |
Benchmarking Agentic Workflow Generation |
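Proposes the WorfBench benchmark to evaluate LLMs on complex workflow generation, revealing a gap between their sequence planning and graph planning abilities. |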
large language model |
✅ |
|
| 30 |
Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting |
Proposes a semantic self-consistency method that enhances language model reasoning through semantic weighting |
large language model |
|
|
| 31 |
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories |
AgentBank: achieves generalized LLM agents by fine-tuning on 50,000+ interaction trajectories |
large language model |
|
|
| 32 |
Smart Audit System Empowered by LLM |
Proposes an LLM-based smart audit system that improves the efficiency and transparency of quality audits in manufacturing. |
large language model |
|
|
| 33 |
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users |
While improving performance, RAG can undermine the fairness of large language models, even for vigilant users. |
large language model |
|
|
| 34 |
RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? |
RealVul: the first LLM-based PHP vulnerability detection framework, improving vulnerability detection capability. |
large language model |
|
|
| 35 |
KRAG Framework for Enhancing LLMs in the Legal Domain |
Proposes the KRAG framework, which enhances LLM applications in the legal domain through knowledge representation |
large language model |
|
|
| 36 |
MKGL: Mastery of a Three-Word Language |
Proposes MKGL: teaches LLMs a three-word knowledge graph language, improving knowledge graph completion accuracy. |
large language model |
|
|
| 37 |
DemoShapley: Valuation of Demonstrations for In-Context Learning |
Proposes the DemoShapley method, which uses Shapley values to assess the contribution of demonstration examples in in-context learning. |
large language model |
|
|
| 38 |
Using LLMs to Discover Legal Factors |
Uses large language models to automatically discover key factors in the legal domain, supporting legal analysis |
large language model |
|
|
| 39 |
PublicHearingBR: A Brazilian Portuguese Dataset of Public Hearing Transcripts for Summarization of Long Documents |
Presents PublicHearingBR, a dataset of public hearing transcripts for long-document summarization in Brazilian Portuguese. |
large language model |
|
|