| # | Title | Summary | Tags | Read |
|---|---|---|---|---|
| 1 | SEAL: Steerable Reasoning Calibration of Large Language Models for Free | SEAL: a training-free, steerable reasoning calibration method that improves LLM reasoning efficiency and accuracy | large language model; chain-of-thought | ✅ |
| 2 | Can Large Language Models Match Tutoring System Adaptivity? A Benchmarking Study | Benchmarks whether LLMs can match the adaptivity of intelligent tutoring systems | large language model; instruction following | |
| 3 | A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models | A survey of hypothesis-generation methods for scientific discovery in the era of large language models | large language model; multimodal | |
| 4 | The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning | Shows that CoT underperforms in pattern-based in-context learning, revealing its limitations | large language model; chain-of-thought | |
| 5 | Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning | Evaluates the generalization capabilities of LLMs on code reasoning | large language model | |
| 6 | Less but Better: Parameter-Efficient Fine-Tuning of Large Language Models for Personality Detection | PersLLM: parameter-efficient fine-tuning of LLMs for personality detection | large language model | |
| 7 | Leveraging Large Language Models for Cost-Effective, Multilingual Depression Detection and Severity Assessment | Uses LLMs for cost-effective, multilingual depression detection and severity assessment | large language model | |
| 8 | 'Neural howlround' in large language models: a self-reinforcing bias phenomenon, and a dynamic attenuation solution | Proposes a dynamic attenuation mechanism for "neural howlround", addressing reasoning failures caused by self-reinforcing bias in LLMs | large language model | |
| 9 | Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic | A rule-focused diagnostic showing that LLMs rely on pattern matching rather than genuine understanding in two-integer addition | large language model | ✅ |
| 10 | A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models | Proposes a domain-based taxonomy of LLM jailbreak vulnerabilities to deepen understanding of model safety risks | large language model | |
| 11 | Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | A systematic study of how quantization affects reasoning models, finding that model size, origin, and task difficulty are key factors | large language model; chain-of-thought | ✅ |
| 12 | COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values | COIG-P: a high-quality, large-scale Chinese preference dataset for alignment with human values | large language model; multimodal | ✅ |
| 13 | Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations | Uses causal attribution to mitigate reward hacking in model explanations | large language model; chain-of-thought | |
| 14 | TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context | Introduces the TathyaNyaya dataset and FactLegalLlama model for factual judgment prediction and explanation in the Indian legal context | large language model | |
| 15 | Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Proposes a TEE-based scheme for detecting model substitution in LLM APIs, protecting users' interests | large language model | ✅ |
| 16 | Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents | Proposes an LLM-powered XR assistant for industrial knowledge, addressing knowledge-transfer challenges in industry | large language model | |
| 17 | Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs | Proposes the ValueExploration framework to probe the neural mechanisms behind value-oriented behaviors in LLMs | large language model | |
| 18 | Pretraining Language Models for Diachronic Linguistic Change Discovery | Proposes an efficient domain-restricted pretraining method for discovering diachronic linguistic change | large language model | |
| 19 | Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation | Proposes a retrieval-augmented generation (RAG) framework for LLM-based short-answer grading, improving grading accuracy | large language model | |
| 20 | LLM-based Automated Grading with Human-in-the-Loop | Proposes the GradeHITL framework, using human-in-the-loop collaboration to improve the accuracy of LLM-based automated grading | large language model | |
| 21 | Enhancing NER Performance in Low-Resource Pakistani Languages using Cross-Lingual Data Augmentation | Proposes cross-lingual data augmentation to improve named entity recognition for low-resource Pakistani languages | large language model | |
| 22 | Not All Data Are Unlearned Equally | Shows that data frequency affects how easily data can be unlearned from LLMs, motivating better evaluation and methods | large language model | |
| 23 | Voices of Freelance Professional Writers on AI: Limitations, Expectations, and Fears | Surveys freelance professional writers on AI: its limitations, their expectations, and their fears | large language model | |
| 24 | Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations | Proposes Dialectic-RAG, which strengthens multilingual retrieval-augmented language models via dialectic reasoning, improving knowledge use and robustness | large language model | |
| 25 | Can LLMs Interpret and Leverage Structured Linguistic Representations? A Case Study with AMRs | Leverages structured AMR information to improve LLM performance on long-text tasks | large language model | |
| 26 | Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts | Proposes the Sequential-NIAH benchmark to evaluate LLMs' ability to extract sequential information from long contexts | large language model | |