| 1 |
Rethinking the Potential of Multimodality in Collaborative Problem Solving Diagnosis with Large Language Models |
Explores the potential of multimodal data in collaborative problem solving diagnosis using large language models |
large language model multimodal |
|
|
| 2 |
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models |
EasyEdit2: an easy-to-use framework for steering large language model behavior |
large language model |
✅ |
|
| 3 |
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models |
Proposes the AutoNuggetizer framework, which uses large language models to automate fact extraction and evaluation for RAG systems. |
large language model |
|
|
| 4 |
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey |
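A comprehensive survey of evaluation methods and frameworks for retrieval-augmented generation (RAG) in the era of large language models. |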
large language model |
|
|
| 5 |
Natural Fingerprints of Large Language Models |
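Reveals the "natural fingerprints" of large language models: outputs remain distinguishable even when models are trained on the same dataset |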
large language model |
|
|
| 6 |
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators |
JETTS benchmark: evaluates the effectiveness of LLM judges in test-time compute scaling, revealing their strengths and weaknesses across tasks. |
large language model instruction following |
|
|
| 7 |
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation |
CRUST-Bench: a comprehensive benchmark for C-to-safe-Rust transpilation that supports the safe migration of legacy C code. |
large language model |
✅ |
|
| 8 |
Fully Bayesian Approaches to Topics over Time |
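Proposes a fully Bayesian Topics over Time model (WBToT) that improves the stability of modeling how topics change over time and the ability to capture events. |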
TAMP |
|
|
| 9 |
On Self-improving Token Embeddings |
Proposes a self-improving token embedding method for enhancing domain-specific text representations. |
large language model |
|
|
| 10 |
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges |
Compares human and LLM assessments of support in RAG systems, validating the reliability of GPT-4o as an evaluator. |
large language model |
|
|
| 11 |
Kuwain 1.5B: An Arabic SLM via Language Injection |
Proposes an Arabic SLM built via language injection that improves performance while preserving existing knowledge |
large language model |
|
|
| 12 |
Speculative Sampling via Exponential Races |
Proposes ERSD, a speculative sampling method based on exponential races, to accelerate large language model inference. |
large language model |
|
|
| 13 |
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection |
Builds definitions for zero-shot sexism detection through expert-LLM collaboration and improves detection performance |
large language model |
|
|
| 14 |
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety |
MrGuard: proposes a multilingual reasoning guardrail that improves the safety of general-purpose LLMs in multilingual settings. |
large language model |
|
|
| 15 |
EvalAgent: Discovering Implicit Evaluation Criteria from the Web |
EvalAgent: mines implicit evaluation criteria from the web to improve the quality of language model generations. |
large language model |
|
|
| 16 |
The Synthetic Imputation Approach: Generating Optimal Synthetic Texts For Underrepresented Categories In Supervised Classification Tasks |
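Proposes the synthetic imputation approach, which uses generative LLMs to produce optimal synthetic texts for underrepresented categories in supervised classification tasks. |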
large language model |
|
|
| 17 |
Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT |
Uses ChatGPT to assess machine translation quality: first experiments based on an error typology for LSP translation |
large language model |
|
|
| 18 |
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs |
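Uses large language models to automatically generate extended reading materials and course recommendations, supporting educational content creation |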
large language model |
|
|
| 19 |
Efficient Pretraining Length Scaling |
Proposes PHD-Transformer, which enables efficient length scaling during pretraining while preserving inference efficiency. |
large language model |
|
|
| 20 |
Evaluating LLMs on Chinese Topic Constructions: A Research Proposal Inspired by Tian et al. (2024) |
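Proposes an evaluation framework for examining large language models' grammatical knowledge of Chinese topic constructions and island constraints. |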
large language model |
|
|
| 21 |
CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs |
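CRAVE: proposes an explainable claim verification method based on conflicting reasoning, using large language models to improve the accuracy and transparency of complex claim verification. |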
large language model |
✅ |
|
| 22 |
Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation |
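Proposes a Context-Prior augmented citation generation task that improves the trustworthiness of internal and external knowledge utilization in LLMs. |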
large language model |
|
|