| 1 |
DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models |
DPN-LE: precise personality control of large language models via dual personality neuron localization and editing |
large language model |
|
|
| 2 |
HealthBench Professional: Evaluating Large Language Models on Real Clinician Chats |
HealthBench Professional: evaluating large language models on real clinician conversations |
large language model |
|
|
| 3 |
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models |
Proposes ScaleBox to address accuracy and efficiency in large-scale code verification |
large language model |
|
|
| 4 |
Exploring Applications of Transfer-State Large Language Models: Cognitive Profiling and Socratic AI Tutoring |
Explores applications of transfer-state large language models: cognitive profiling and Socratic AI tutoring |
large language model |
|
|
| 5 |
Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception |
Finds that LLM agent personas in urban sentiment perception behave stably but show limited variation |
large language model multimodal |
|
|
| 6 |
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction |
MiniCPM-o 4.5: a lightweight large model for real-time full-duplex omni-modal interaction |
large language model multimodal |
|
|
| 7 |
Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior |
Builds a cognitive digital shadow dataset to evaluate LLM performance and bias in simulated societal debates |
large language model |
|
|
| 8 |
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding |
Proposes TeCoD, which uses template-constrained decoding to improve Text-to-SQL accuracy and efficiency in complex scenarios |
large language model |
|
|
| 9 |
Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future |
Survey: the application, evaluation, and future development of large language models in the peer review process |
large language model |
|
|
| 10 |
Instruction-Guided Poetry Generation in Arabic and Its Dialects |
Proposes InstructPoet-AR for instruction-guided, controllable poetry generation in Arabic and its dialects |
large language model |
✅ |
|
| 11 |
APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation |
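Builds APPSI-139, a high-quality parallel corpus of English privacy policy summarization and interpretation, and proposes the hybrid framework TCSI-pp-V2 |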
large language model |
✅ |
|
| 12 |
Debiasing Reward Models via Causally Motivated Inference-Time Intervention |
Proposes a causal-intervention method for debiasing reward models, improving large language model alignment |
large language model |
|
|
| 13 |
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation |
DriftBench: reveals constraint violations in multi-turn LLM iteration and proposes the knowledge-violation rate (KBV) metric |
large language model |
|
|
| 14 |
Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems |
Proposes an LLM method that reasons over object descriptions to improve coreference resolution in task-based dialogue systems |
large language model |
|
|
| 15 |
ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training |
ZipCCL: accelerating LLM training through lossless compression of communication collectives |
large language model |
|
|
| 16 |
Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments |
Uses LLMs to analyze language ideologies in Luxembourgish news comments, revealing identity construction in a multilingual society |
large language model |
|
|
| 17 |
RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems |
Proposes RoadMapper, a multi-agent system that improves LLM generation of research roadmaps and saves expert time |
large language model |
|
|
| 18 |
Entropy of Ukrainian |
First entropy measurement of Ukrainian, used to assess the language's complexity |
large language model |
|
|
| 19 |
Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO |
Skills-Coach: self-evolving optimization of LLM agent skills via training-free GRPO |
large language model |
|
|
| 20 |
A Reproducibility Study of LLM-Based Query Reformulation |
A reproducibility study of LLM-based query reformulation methods, revealing performance differences across retrieval paradigms |
large language model |
|
|
| 21 |
From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking |
Proposes an LLM-driven attribute graph method to improve entity search and ranking in e-commerce |
large language model |
|
|
| 22 |
Emotion-Aware Clickbait Attack in Social Media |
Proposes an emotion-aware clickbait attack framework that evades existing detection systems by optimizing emotional impact |
large language model |
|
|
| 23 |
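LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps |
Finds that LLMs mainly capture emotion labels rather than emotion uncertainty, and proposes a calibration method to narrow the human-LLM gap |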
large language model |
|
|
| 24 |
To Diff or Not to Diff? Structure-Aware and Adaptive Output Formats for Efficient LLM-based Code Editing |
Proposes AdaEdit, a structure-aware adaptive editing method that improves LLM code editing efficiency and lowers cost |
large language model |
|
|
| 25 |
What Don't You Understand? Using Large Language Models to Identify and Characterize Student Misconceptions About Challenging Topics |
Uses large language models to identify and analyze student misconceptions about challenging biomedical science topics |
large language model |
|
|
| 26 |
Exploring Applications of Transfer-State Large Language Models: Cognitive Profiling and Socratic AI Tutoring |
Explores applications of transfer-state large language models: cognitive profiling and Socratic AI tutoring |
large language model |
|
|
| 27 |
Retrieval-Augmented Reasoning for Chartered Accountancy |
CA-ThinkFlow: a retrieval-augmented reasoning framework for Indian chartered accountancy |
large language model chain-of-thought |
|
|
| 28 |
ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts |
Proposes ViLegalNLI, the first large-scale natural language inference dataset for Vietnamese legal texts, to advance legal text understanding |
large language model |
|
|
| 29 |
Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions |
Reveals why LLMs struggle in strategic play: broken links between observations, beliefs, and actions |
large language model |
|
|
| 30 |
How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses |
NDBench: evaluating how frontier LLMs adapt, and whether they change structurally, in neurodivergence contexts |
large language model |
|
|
| 31 |
Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory |
Proposes an item response theory-based method for evaluating LLM ability in automatic short answer grading |
large language model |
|
|
| 32 |
Confidence Estimation in Automatic Short Answer Grading with LLMs |
Proposes a hybrid confidence framework to improve LLM reliability in automatic short answer grading |
large language model |
|
|
| 33 |
RouteProfile: Elucidating the Design Space of LLM Profiles for Routing |
Proposes RouteProfile to optimize LLM routing performance |
large language model |
|
|
| 34 |
LLMs Capture Emotion Labels, Not Emotion Uncertainty: Distributional Analysis and Calibration of Human-LLM Judgment Gaps |
Finds that large language models are good at capturing emotion labels but struggle to model emotion uncertainty |
large language model |
|
|