| 1 | Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability | Meeseeks: a feedback-driven, iterative self-correction benchmark for evaluating LLM instruction-following capability | large language model, instruction following, chain-of-thought | ✅ | |
| 2 | On the Failure of Latent State Persistence in Large Language Models | Reveals the shortcomings of large language models in maintaining latent state persistence | large language model | | |
| 3 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Uses fine-tuned large language models to analyze literary motifs in ancient and medieval novels | large language model | | |
| 4 | Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? | Finds that prompt-based large language models recognize students' demographic information and introduce bias in essay scoring. | large language model | | |
| 5 | Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges | Proposes an LLM evaluation method based on Bayesian inference that addresses the confidence problem in small-sample evaluation | large language model | | |
| 6 | GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling | GDI-Bench: a general document intelligence benchmark that decouples vision and reasoning | large language model, multimodal | ✅ | |
| 7 | Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Proposes an Exaone 3.5-based framework for evaluating the fact consistency of text-to-SQL generation in business intelligence. | large language model | | |
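| 8 | Clustering Internet Memes Through Template Matching and Multi-Dimensional Similarity | Proposes an internet meme clustering method based on template matching and multi-dimensional similarity that needs no predefined database and improves clustering quality. | multimodal | | |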
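| 9 | Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications | Survey: understanding and "humanizing" large language models through psychological measurement tools, datasets, and human-agent applications | large language model | | |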
| 10 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs | Finds that LLMs are miscalibrated in reasoning length: they overthink easy problems and underthink hard ones. | large language model | | |
| 11 | Fine-Tuning LLMs for Low-Resource Dialect Translation: The Case of Lebanese | Proposes fine-tuning LLMs on cultural data for low-resource Lebanese dialect translation | large language model | | |
| 12 | RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations | Proposes an RDF-based framework for assessing the quality of multilingual LLMs under knowledge conflicts. | large language model | | |
| 13 | Memorization and Knowledge Injection in Gated LLMs | MEGa: memory embedding and knowledge injection in gated LLMs, addressing catastrophic forgetting in continual learning | large language model | | |
| 14 | AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | AdaptMI: adaptive skill-based in-context math instruction for small language models | large language model | | |
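| 15 | A Report on the llms evaluating the high school questions | Evaluates the performance of large language models on high school science problems and their potential for educational applications | large language model | | |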
| 16 | Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models | A spike-aware mixed-precision quantization strategy for LLaMA models that improves quantization performance. | large language model | | |
| 17 | Who Gets the Callback? Generative AI and Gender Bias | An audit of open-source LLMs reveals gender bias in hiring, with men favored especially for higher-paying positions. | large language model | | |