| 1 | Learning Dynamics in Continual Pre-Training for Large Language Models | Proposes a CPT scaling law that predicts how LLM performance evolves during continual pre-training. | large language model, foundation model | | |
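| 2 | Reassessing Large Language Model Boolean Query Generation for Systematic Reviews | Reassesses the ability of large language models to generate Boolean queries for systematic reviews, highlighting the critical role of prompt design and model selection. | large language model, chain-of-thought | | |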
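| 3 | EmoMeta: A Multimodal Dataset for Fine-grained Emotion Classification in Chinese Metaphors | EmoMeta: builds a multimodal dataset for fine-grained emotion classification in Chinese metaphors to advance research on emotional intelligence. | multimodal | ✅ | |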
| 4 | Large Language Models and Arabic Content: A Review | Survey: applications and challenges of large language models in processing Arabic content. | large language model | | |
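| 5 | Characterizing the Investigative Methods of Fictional Detectives with Large Language Models | Uses large language models to characterize the investigative methods of fictional detectives, providing a scalable character-analysis framework for computational narratology. | large language model | | |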
| 6 | On the Superimposed Noise Accumulation Problem in Sequential Knowledge Editing of Large Language Models | Proposes DeltaEdit to address noise accumulation in sequential knowledge editing of large language models. | large language model | | |
| 7 | SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models | Proposes the SAS-Bench benchmark for evaluating large language models on short answer scoring, with fine-grained analysis. | large language model | | |
| 8 | ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation | Proposes ViMRHP, a benchmark dataset for Vietnamese multimodal review helpfulness prediction built via human-AI collaborative annotation. | multimodal | ✅ | |
| 9 | One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models | Proposes the D-STT defense algorithm, which balances the safety and usability of large language models using a single trigger token. | large language model | | |
| 10 | Spoken Language Understanding on Unseen Tasks With In-Context Learning | Proposes a task-agnostic fine-tuning method based on randomized labels that improves speech-text LLM performance on unseen SLU tasks. | large language model | | |
| 11 | Re$^2$: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions | Proposes the Re^2 dataset to support research on full-stage peer review and multi-turn rebuttal discussions. | large language model | | |
| 12 | OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | OnPrem.LLM: a privacy-conscious toolkit for on-premises document intelligence. | large language model | | |
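| 13 | Semantic Retention and Extreme Compression in LLMs: Can We Have Both? | Proposes the SrCr metric and explores joint optimization of pruning and quantization to improve both compression ratio and semantic retention in LLMs. | large language model | | |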
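| 14 | Are LLMs complicated ethical dilemma analyzers? | Builds a benchmark for ethical dilemma analysis to assess the capabilities and limitations of large language models in moral reasoning. | large language model | | |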
| 15 | FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | Proposes the FalseReject dataset, which uses structured reasoning to mitigate over-refusal in LLMs and improve contextual safety. | large language model | | |
| 16 | Benchmarking Retrieval-Augmented Generation for Chemistry | Proposes ChemRAG-Bench, a RAG benchmark for chemistry, to improve LLM performance on chemistry tasks. | large language model | | |
| 17 | Concept-Level Explainability for Auditing & Steering LLM Responses | Proposes ConceptX, which audits and steers LLM responses through concept-level explainability. | large language model | | |
| 18 | ToolACE-DEV: Self-Improving Tool Learning via Decomposition and EVolution | ToolACE-DEV: self-improving tool learning via decomposition and evolution. | large language model | | |
| 19 | QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines | QUPID: an approach to improving relevance in Korean search engines by combining small models with diverse architectures. | large language model | | |
| 20 | Domain Regeneration: How well do LLMs match syntactic properties of text domains? | Domain regeneration: evaluates how well large language models match the syntactic properties of text domains. | large language model | | |
| 21 | JobHop: A Large-Scale Dataset of Career Trajectories | JobHop: releases a large-scale dataset of career trajectories to support labor market research and career path prediction. | large language model | | |
| 22 | Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | Proposes the SENATOR framework, which uses structural entropy to guide the detection and repair of knowledge deficiencies in LLMs. | large language model | ✅ | |
| 23 | HAMLET: Healthcare-focused Adaptive Multilingual Learning Embedding-based Topic Modeling | HAMLET: a healthcare-focused, adaptive, multilingual, embedding-based topic modeling method. | large language model | | |