| 1 |
Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models |
RuleBench: evaluates the inferential rule-following ability of large language models and proposes IRFT to improve it |
large language model instruction following |
|
|
| 2 |
Fault Diagnosis in Power Grids with Large Language Model |
Proposes a prompt-engineering-based large language model method for power system fault diagnosis |
large language model chain-of-thought |
|
|
| 3 |
Uncertainty Estimation of Large Language Models in Medical Question Answering |
Proposes a Two-phase Verification method to improve uncertainty estimation of large language models in medical question answering |
large language model |
|
|
| 4 |
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation |
Uses jailbreak prompts to assess the adversarial robustness of large language models to bias elicitation |
large language model |
|
|
| 5 |
Evaluating Nuanced Bias in Large Language Model Free Response Answers |
Proposes a semi-automated pipeline for evaluating nuanced bias in free-response answers from large language models |
large language model |
|
|
| 6 |
A Taxonomy for Data Contamination in Large Language Models |
Proposes a taxonomy of LLM data contamination and analyzes how contamination types affect downstream task performance |
large language model |
|
|
| 7 |
GTA: A Benchmark for General Tool Agents |
Proposes the GTA benchmark to evaluate the tool-use ability of general tool agents in real-world scenarios |
large language model multimodal |
✅ |
|
| 8 |
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting |
Speculative RAG: enhances retrieval-augmented generation through a drafting mechanism, improving accuracy and reducing latency |
large language model |
|
|
| 9 |
NinjaLLM: Fast, Scalable and Cost-effective RAG using Amazon SageMaker and AWS Trainium and Inferentia2 |
NinjaLLM: fast, scalable, and cost-effective RAG using Amazon SageMaker and AWS Trainium/Inferentia2 |
large language model |
|
|
| 10 |
GPT-4 is judged more human than humans in displaced and inverted Turing tests |
In displaced and inverted Turing tests, GPT-4 is judged to be human more often than actual humans are |
large language model |
|
|
| 11 |
Brief state of the art in social information mining: Practical application in analysis of trends in French legislative 2024 |
Applies social media mining techniques to analyze trends in the 2024 French legislative elections |
large language model |
|
|
| 12 |
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency |
Critical analysis: large language models are not a complete reproduction of human linguistic ability |
large language model |
|
|
| 13 |
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist |
Proposes MathCheck, a mathematical reasoning evaluation framework that improves the generalization and robustness of LLM math evaluation |
large language model |
|
|
| 14 |
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion |
Proposes a Specialized Generalist AI (SGAI) framework that fuses System 1 and System 2, as a step toward AGI |
large language model |
|
|
| 15 |
Turn-Level Empathy Prediction Using Psychological Indicators |
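Proposes a turn-level empathy prediction method based on decomposition into psychological indicators, improving empathy detection performance |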
large language model |
|
|
| 16 |
On the Universal Truthfulness Hyperplane Inside LLMs |
Explores a universal truthfulness hyperplane inside LLMs to address hallucination |
large language model |
|
|
| 17 |
Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective |
Surveys construction methods for public fine-tuning datasets to support the training and development of large models |
large language model |
|
|
| 18 |
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks |
Proposes KVMerger, which adaptively merges the KV cache to improve LLM performance on long-context tasks |
large language model |
|
|
| 19 |
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL |
Proposes RB-SQL, a retrieval-based LLM framework for improving Text-to-SQL performance |
large language model |
|
|
| 20 |
Beyond Text: Leveraging Multi-Task Learning and Cognitive Appraisal Theory for Post-Purchase Intention Analysis |
Uses multi-task learning and cognitive appraisal theory to analyze post-purchase intention |
large language model |
|
|