| 1 |
Litespark Inference on Consumer CPUs: Custom SIMD Kernels for Ternary Neural Networks |
Litespark-Inference: custom SIMD inference acceleration for ternary neural networks on consumer CPUs |
large language model |
|
|
| 2 |
The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation |
Proposes the BAIR framework to address textual bias in multimodal generation |
large language model multimodal |
|
|
| 3 |
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity |
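Proposes the TableVista benchmark, revealing performance bottlenecks of multimodal large models in visually and structurally complex table reasoning |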
foundation model multimodal |
|
|
| 4 |
Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning |
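Proposes the BADIT framework: mitigates cross-task interference in multi-task instruction tuning through basic-ability decomposition and orthogonalized LoRA experts |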
large language model |
|
|
| 5 |
Reflections and New Directions for Human-Centered Large Language Models |
Proposes a human-centered large language model (HCLLM) framework for value alignment and responsible deployment across the full lifecycle |
large language model |
|
|
| 6 |
Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts |
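Analyzes the logical consistency of large language models and their susceptibility to false beliefs in emotional conversational contexts |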
large language model |
|
|
| 7 |
Uncovering Entity Identity Confusion in Multimodal Knowledge Editing |
Reveals entity identity confusion in multimodal knowledge editing and proposes an improved strategy based on I-E binding constraints |
multimodal |
|
|
| 8 |
BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models |
Proposes the BioTool dataset to enhance the tool-calling capabilities of large language models in the biomedical domain |
large language model |
✅ |
|
| 9 |
Negative Before Positive: Asymmetric Valence Processing in Large Language Models |
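Reveals asymmetric processing of emotional valence in large language models: an in-depth analysis based on activation patching and intervention |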
large language model |
|
|
| 10 |
IntentGrasp: A Comprehensive Benchmark for Intent Understanding |
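Proposes the IntentGrasp benchmark and the Intent Fine-Tuning (IFT) method, significantly improving the intent understanding of large language models |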
large language model |
|
|
| 11 |
Cognitive Agent Compilation for Explicit Problem Solver Modeling |
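Proposes the Cognitive Agent Compilation (CAC) framework, which uses explicit modeling to achieve interpretable and controllable problem solving in educational settings |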
large language model |
|
|
| 12 |
MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text |
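Proposes MELD, a multi-task equilibrated learning detector that improves the robustness of AI-generated text detection through auxiliary supervision and adversarial distillation |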
large language model |
|
|
| 13 |
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits |
Proposes PCNET, which dynamically intervenes on LLM hallucinations via probabilistic circuits to improve generation truthfulness |
large language model |
|
|
| 14 |
One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue |
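Proposes the TurnGate defense framework, which identifies hidden malicious intent in multi-turn dialogue through a response-aware mechanism |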
large language model |
✅ |
|
| 15 |
SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair |
Proposes the SmellBench evaluation framework to quantitatively assess the ability of LLM agents to repair architecture-level code smells |
large language model |
|
|
| 16 |
Can LLMs Take Retrieved Information with a Grain of Salt? |
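Proposes an interaction-design-based strategy for calibrating context certainty, significantly improving LLMs' ability to judge and respond to the confidence of retrieved information |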
large language model |
|
|
| 17 |
EMO: Pretraining Mixture of Experts for Emergent Modularity |
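Proposes the EMO pretraining framework, which achieves emergent modularity in Mixture-of-Experts (MoE) models through document-level constraints |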
large language model |
|
|
| 18 |
Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents |
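Proposes the first citation evaluation framework for LLM deep research agents, revealing a severe disconnect between citation quality and factual accuracy |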
large language model |
|
|
| 19 |
Algospeak, Hiding in the Open: The Trade-off Between Legible Meaning and Detection Avoidance |
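Proposes an Algospeak evaluation framework that quantifies the trade-off between content comprehensibility and detection evasion in linguistic evasion strategies |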
large language model |
|
|
| 20 |
Efficient Pre-Training with Token Superposition |
Proposes Token Superposition Training (TST), which significantly improves large-model pretraining efficiency through two-stage training |
large language model |
|
|
| 21 |
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? |
Proposes the STALE benchmark and the CUPMem framework to address memory invalidation and state updating for LLM agents in dynamic environments |
large language model |
|
|
| 22 |
SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following |
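Proposes the SEQUOR benchmark, revealing performance bottlenecks of large models in following complex constraints over long multi-turn dialogues |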
instruction following |
|
|
| 23 |
Quantifying the Statistical Effect of Rubric Modifications on Human-Autorater Agreement |
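Quantifies the statistical effect of rubric modifications on human-autorater agreement to improve the effectiveness of LLM-as-a-judge evaluation |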
instruction following |
|
|
| 24 |
UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification |
Proposes the UniPrefill framework, which achieves universal long-context prefill acceleration through block-wise dynamic sparsification |
large language model |
|
|
| 25 |
Navigating by Old Maps: The Pitfalls of Static Mechanistic Localization in LLM Post-Training |
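Reveals the limitations of static mechanistic localization in LLM post-training and proposes a circuit-evolution analysis framework to handle dynamic parameter updates |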
large language model |
|
|
| 26 |
From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence |
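Proposes the PrimeFacts dataset and extraction framework, which uses LLMs to rewrite fact-checking evidence in decontextualized form to improve automated verification performance |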
large language model |
|
|
| 27 |
Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM |
Proposes TextPro-SLM: narrows the modality gap of speech LLMs through an input-side alignment strategy |
large language model |
|
|
| 28 |
Evaluation Awareness in Language Models Has Limited Effect on Behaviour |
Empirical study shows that "evaluation awareness" in large reasoning models has very limited effect on model behavior |
chain-of-thought |
|
|
| 29 |
A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction |
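Proposes Olava Extract, a domain-specific small language model that achieves structured contract extraction surpassing frontier large models at low cost |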
large language model |
|
|