| 1 |
CIRF: Tokenizing Chain-of-Thoughts into Reusable Functional Units for Efficient Latent Reasoning in Large Language Models |
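CIRF decomposes chains of thought into reusable functional units, improving the efficiency of latent reasoning in large language models. |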
large language model chain-of-thought |
|
|
| 2 |
Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach |
Assesses argument quality using large language models combined with a Bradley-Terry model. |
large language model chain-of-thought |
|
|
| 3 |
SMILE-Next: Teaching Large Language Models to Detect, Classify, and Reason about Laughter |
Proposes SMILE-Next to address laughter understanding in real-world scenarios. |
large language model multimodal |
✅ |
|
| 4 |
KSAFE-MM: A Multimodal Safety Benchmark via Localized Contextualization for Korean Cultural Risks |
KSAFE-MM: builds a multimodal safety benchmark for Korean cultural risks through localized contextualization. |
large language model multimodal |
|
|
| 5 |
Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text |
Proposes Reverse Probing, a method for supervised token-level uncertainty quantification of large language models on clinical text. |
large language model |
|
|
| 6 |
MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems |
MemTrace: traces and attributes errors in large language model memory systems using an executable memory evolution graph. |
large language model |
✅ |
|
| 7 |
IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents |
Presents the IPO-Mine toolkit and dataset for structured analysis of long, multimodal IPO documents. |
multimodal |
|
|
| 8 |
Can Large Language Models Handle Discourse Particles? A Case Study of Colloquial Malay |
Proposes the MalayPrag benchmark to evaluate how well LLMs handle discourse particles in colloquial Malay. |
large language model |
|
|
| 9 |
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning |
Proposes AXPO, which uses explorative policy optimization to address the Thinking-Acting Gap in multimodal agentic reasoning. |
multimodal |
|
|
| 10 |
Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning |
Shows that anthropomorphic reflection markers in LLM reasoning are not necessary and can be suppressed without hurting performance. |
large language model |
|
|
| 11 |
IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following |
Proposes IFMTBench to address instruction following in multilingual translation. |
instruction following |
✅ |
|
| 12 |
Prompting Is All You Need: Multi-view Prompting Large Language Models for Aspect-Based Sentiment Analysis |
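Proposes LLM-MvP, which uses multi-view prompting to improve LLM performance on aspect-based sentiment analysis (ABSA) while reducing computational cost. |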
large language model |
|
|
| 13 |
Personality, Role, and Expressive Style in Large Language Models: An Interactionist Analysis |
A study of personality, role, and expressive style in large language models from an interactionist perspective. |
large language model |
|
|
| 14 |
MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models |
MemGuard: prevents memory contamination in long-term memory-augmented large language models through type-aware memory management. |
large language model |
|
|
| 15 |
ChildEval: When large language models meet children's personalities |
Proposes the ChildEval benchmark to evaluate LLM performance in personalized dialogue with children. |
large language model |
✅ |
|
| 16 |
VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading |
Finds that, during natural reading, vision-language models (VLMs) may not globally enhance human alignment compared with large language models (LLMs). |
large language model multimodal |
|
|
| 17 |
The Missing Piece in Pre-trained Model Evaluation: Reward-Guided Decoding Unlocks Task-Oriented Behavior Without Parameter Updates |
Proposes energy-guided decoding (EBD), which activates task-oriented behavior in pre-trained LLMs without any parameter updates. |
large language model instruction following |
|
|
| 18 |
Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation |
Proposes the Context-Preference Activation Steering (CAS) framework to mitigate object hallucination in MLLMs. |
large language model multimodal |
|
|
| 19 |
Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study |
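Studies the reliability of multilingual LLMs used as judges (evaluators) and explores optimization strategies under different resource settings. |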
large language model |
|
|
| 20 |
Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification |
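Proposes functional entropy to quantify uncertainty in LLM code generation and thereby predict the functional correctness of the generated code. |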
large language model |
|
|
| 21 |
Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization |
Proposes Cross-Annotator Preference Optimization (CAPO) to learn and reproduce annotator-specific explanation behavior. |
large language model |
|
|
| 22 |
On Compositional Learning Behaviours in Formal Mathematics |
Proposes the S2B-LM benchmark to study how compositional learning behaviours in formal mathematics affect theorem proving. |
chain-of-thought |
|
|
| 23 |
Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents |
Proposes the MUTATE benchmark and the ReDNA framework to improve divergent thinking in interactive LLM agents. |
large language model |
|
|
| 24 |
FABSVer: Faster Training and Better Self-Verification for LLM Mathematical Reasoning |
FABSVer: speeds up training for LLM mathematical reasoning and improves self-verification. |
large language model |
|
|
| 25 |
PrunePath: Towards Highly Structured Sparse Language Models |
PrunePath: an adaptive pruning framework for highly structured sparse language models. |
large language model |
|
|
| 26 |
Framing Matters: Addressing Framing Sensitivity in Decision-Making through Behaviorally-Grounded Value Alignment |
Proposes Valign, which addresses framing sensitivity in LLM decision-making through behaviorally grounded value alignment. |
large language model |
|
|
| 27 |
SuperValid: Capability-Aligned OOD Validation for Generalizable Downstream Scaling |
SuperValid: a capability-aligned out-of-distribution (OOD) validation method for generalizable downstream scaling. |
large language model |
|
|
| 28 |
DEPART: DEcomposing PARiTy across Multilingual LLMs |
DEPART: decomposes performance parity gaps across multilingual LLMs to reveal the root causes of those differences. |
large language model |
|
|
| 29 |
Risk-aware Selective Prompting for Hallucination Mitigation in Large Vision-Language Models |
Proposes a risk-aware selective prompting method to mitigate hallucination in large vision-language models. |
visual grounding |
|
|
| 30 |
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models |
Proposes the Meow2X and TRNE frameworks, which localize and suppress toxicity in language models without retraining. |
large language model |
|
|
| 31 |
An Evolutionary Approach for Designing Stable and Highly Expressible Low-Immunogenicity Therapeutic mRNA Sequences |
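Proposes an mRNA sequence optimization framework based on BERT and a genetic algorithm that improves stability and expression efficiency while reducing immunogenicity. |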
large language model |
|
|
| 32 |
Periodic RoPE for Infinite Context LLMs |
Proposes Periodic RoPE to address positional-encoding degradation in LLMs at infinite context lengths. |
large language model |
✅ |
|
| 33 |
AI Research Agents Narrow Scientific Exploration |
AI research agents tend toward local optimization and struggle to broaden the scope of scientific exploration. |
large language model |
|
|
| 34 |
GRADE: Generalizable Reasoning-Aware Dialogue Evaluation for AI Tutors |
GRADE: a generalizable, reasoning-aware dialogue evaluation framework for AI tutors. |
instruction following |
✅ |
|