| 1 |
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback |
UICoder: fine-tuning large language models through automated feedback to generate user interface code |
large language model |
|
|
| 2 |
MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models |
MultiPragEval: the first multilingual pragmatic evaluation benchmark for LLMs, probing models' deeper language understanding |
large language model |
|
|
| 3 |
Using General Large Language Models to Classify Mathematical Documents |
Uses general-purpose large language models to classify mathematical documents, improving literature retrieval efficiency. |
large language model |
|
|
| 4 |
Test-Time Fairness and Robustness in Large Language Models |
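Proposes stratified invariance and a data augmentation strategy to improve the test-time fairness and robustness of large language models |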
large language model |
|
|
| 5 |
Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis |
Uses RAG and fine-tuned LLMs for mission-critical risk analysis, improving efficiency and uncovering potential risks |
large language model |
|
|
| 6 |
Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing |
Compares large language models and crowdsourcing on stance annotation of social media posts, revealing the limitations of LLMs. |
large language model |
|
|
| 7 |
Multimodal Belief Prediction |
Proposes a multimodal belief prediction framework that fuses text and speech information to improve belief recognition accuracy |
multimodal |
|
|
| 8 |
Markov Constraint as Large Language Model Surrogate |
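Proposes the NgramMarkov constraint, which uses a large language model to improve the efficiency and quality of text generation with constraint programming. |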
large language model |
|
|
| 9 |
Large Language Models are Limited in Out-of-Context Knowledge Reasoning |
Evaluates the limitations of large language models in out-of-context knowledge reasoning |
large language model |
|
|
| 10 |
REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy |
Proposes REAL sampling to address factuality and diversity problems in open-ended generation |
large language model |
|
|
| 11 |
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena |
Proposes Open-LLM-Leaderboard, which evaluates LLMs with open-style questions to address selection bias and random guessing. |
large language model |
✅ |
|
| 12 |
On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations |
Proposes an entity-variation robust training method to improve the generalization of document-level relation extraction models |
large language model |
|
|
| 13 |
VersiCode: Towards Version-controllable Code Generation |
VersiCode: a task, dataset, and evaluation metrics for version-controllable code generation |
large language model |
✅ |
|
| 14 |
BertaQA: How Much Do Language Models Know About Local Culture? |
BertaQA: evaluates how much language models know about local culture, revealing cross-lingual knowledge transfer. |
large language model |
✅ |
|