| 1 |
NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions |
NutriBench: a dataset for evaluating large language models on estimating nutritional content from meal descriptions |
large language model chain-of-thought |
✅ |
|
| 2 |
Benchmarking Complex Instruction-Following with Multiple Constraints Composition |
Proposes ComplexBench to evaluate LLMs' ability to follow complex instructions composed of multiple constraints |
large language model instruction following |
|
|
| 3 |
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks |
M5: a multilingual, multicultural evaluation benchmark for large multimodal models |
large language model multimodal |
|
|
| 4 |
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models |
Uses large language models to understand dialogues and generate visual descriptors, improving image selection |
large language model multimodal |
|
|
| 5 |
metabench -- A Sparse Benchmark of Reasoning and Knowledge in Large Language Models |
metabench: a sparse benchmark of reasoning and knowledge abilities in large language models |
large language model |
|
|
| 6 |
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations |
A systematic review of large language model evaluation: challenges, limitations, and recommendations |
large language model |
|
|
| 7 |
Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing |
Proposes a compact CNN for software log classification in edge-deployed cellular network testing that significantly outperforms LLMs |
large language model |
|
|
| 8 |
Text2TimeSeries: Enhancing Financial Forecasting through Time Series Prediction Updates with Event-Driven Insights from Large Language Models |
Text2TimeSeries: enhances financial forecasting with event-driven insights from large language models |
large language model |
|
|
| 9 |
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models |
Proposes RelD, a robust LLM hallucination detector that improves answer reliability |
large language model |
|
|
| 10 |
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization |
Proposes MAPO, a model-adaptive prompt optimization method that improves large language model performance on downstream tasks |
large language model |
|
|
| 11 |
Deep Content Understanding Toward Entity and Aspect Target Sentiment Analysis on Foundation Models |
Proposes the EASTE task, using Transformer models for entity-aspect target sentiment analysis to achieve fine-grained sentiment understanding |
foundation model |
|
|
| 12 |
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production |
Proposes the MS2SL framework, which uses multimodal information to generate continuous sign language sequences |
multimodal |
|
|
| 13 |
Integrating Randomness in Large Language Models: A Linear Congruential Generator Approach for Generating Clinically Relevant Content |
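Uses a linear congruential generator to improve the diversity and quality of clinically relevant content generated by large language models |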
large language model |
|
|
| 14 |
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking |
LLMAEL: uses large language models to augment context for entity linking, significantly improving linking accuracy |
large language model |
|
|
| 15 |
TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models |
Proposes TongGu to address the challenges of understanding Classical Chinese |
large language model |
✅ |
|
| 16 |
Chain-of-Thought Augmentation with Logit Contrast for Enhanced Reasoning in Language Models |
Proposes a chain-of-thought augmentation method based on logit contrast to improve language model reasoning |
chain-of-thought |
|
|
| 17 |
Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion |
Proposes InfoSel, which ensembles black-box models through information fusion to improve textual and visual question answering |
large language model multimodal |
|
|
| 18 |
ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents |
ChatSOP: an SOP-guided MCTS planning framework for controllable LLM dialogue agents |
large language model chain-of-thought |
|
|
| 19 |
STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering |
Proposes STOC-TOT to address complex reasoning in multi-hop question answering |
large language model chain-of-thought |
|
|
| 20 |
Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM |
Proposes the AMRS^3 method to improve performance on syntactic simplification |
large language model instruction following |
|
|
| 21 |
Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems |
Proposes ZPS, a constraint-guided multi-agent system for solving zebra puzzles |
large language model chain-of-thought |
|
|
| 22 |
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs |
Future events as backdoor triggers: investigates temporal vulnerabilities in LLMs |
large language model |
|
|
| 23 |
Improving Self Consistency in LLMs through Probabilistic Tokenization |
Uses probabilistic tokenization to improve the self-consistency of large language models on reasoning tasks |
large language model |
|
|
| 24 |
Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms |
Evaluates the ability and biases of large language models in applying Wikipedia's neutrality norms |
large language model |
|
|
| 25 |
Unlocking the Potential of Model Merging for Low-Resource Languages |
Proposes a model merging approach to address LLMs' limited task ability in low-resource languages |
large language model |
|
|
| 26 |
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs |
LLM-jp: a cross-organizational project for researching and developing fully open Japanese LLMs |
large language model |
|
|
| 27 |
HYBRINFOX at CheckThat! 2024 -- Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation |
The HYBRINFOX team proposes enhancing language models with structured information to estimate the check-worthiness of news reports |
large language model |
|
|
| 28 |
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction |
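Proposes the SWiM evaluation framework and the Medoid Voting inference method to improve long-context language models' use of information in the middle of the context |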
large language model |
✅ |
|
| 29 |
HAF-RM: A Hybrid Alignment Framework for Reward Model Training |
Proposes HAF-RM, a hybrid alignment framework that improves reward model training and alignment |
large language model |
|
|
| 30 |
Defense Against Syntactic Textual Backdoor Attacks with Token Substitution |
Proposes an online defense algorithm based on token substitution that effectively counters textual backdoor attacks |
large language model |
|
|
| 31 |
Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring |
Unveils LLM scoring processes: dissects the differences between LLMs and human graders in automatic scoring |
large language model |
|
|
| 32 |
A Survey on Natural Language Counterfactual Generation |
Surveys natural language counterfactual generation techniques, with a focus on methods based on large language models |
large language model |
|
|
| 33 |
A framework for annotating and modelling intentions behind metaphor use |
Proposes a taxonomy of intentions behind metaphor use, builds a dataset, and evaluates LLMs on the task |
large language model |
|
|
| 34 |
Automated Progressive Red Teaming |
Proposes APRT, an automated progressive red teaming framework that effectively identifies potential risks in large language models |
large language model |
|
|
| 35 |
Anthropocentric bias in language model evaluation |
Identifies and mitigates anthropocentric bias in language model evaluation, improving the objectivity and accuracy of evaluation |
large language model |
|
|
| 36 |
GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels |
Comprehensively evaluates GPT-4's translation quality against human translators across languages, domains, and expertise levels |
large language model |
|
|
| 37 |
DSLR: Document Refinement with Sentence-Level Re-ranking and Reconstruction to Enhance Retrieval-Augmented Generation |
Proposes the DSLR framework, which refines retrieved documents in RAG systems through sentence-level re-ranking and reconstruction |
large language model |
|
|
| 38 |
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks |
Proposes Question-Analysis Prompting (QAP) to improve LLM performance on reasoning tasks |
chain-of-thought |
|
|
| 39 |
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model |
Proposes the Injectable Realignment Model (IRM), revealing links between internal neurons of the Llama 2 model and its alignment behavior |
large language model |
✅ |
|
| 40 |
Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction |
Builds a corpus of 19th-century Latin American Spanish newspapers and proposes an LLM-based OCR correction framework |
large language model |
|
|
| 41 |
Core: Robust Factual Precision with Informative Sub-Claim Identification |
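Proposes Core, which makes factual precision evaluation of large language models more robust through informative sub-claim identification |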
large language model |
|
|