| 1 |
Limited Linguistic Diversity in Embodied AI Datasets |
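Analyzes the linguistic diversity of embodied AI datasets, exposing the problem of repetitive instructions and proposing directions for improvement. |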
embodied AI vision-language-action VLA |
|
|
| 2 |
Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective |
Reveals the decoupled effects of chain-of-thought reasoning from a human label variation perspective |
chain-of-thought |
|
|
| 3 |
Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models |
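Proposes a pragmatic-inference-based method for enhancing moral sensitivity, improving the moral judgment and error-correction abilities of large language models |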
large language model |
|
|
| 4 |
NorwAI's Large Language Models: Technical Report |
NorwAI releases Norwegian large language models, advancing NLP capabilities for Scandinavian languages |
large language model |
|
|
| 5 |
MedDialogRubrics: A Comprehensive Benchmark and Evaluation Framework for Multi-turn Medical Consultations in Large Language Models |
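MedDialogRubrics: builds a comprehensive benchmark and evaluation framework for multi-turn medical consultations, improving the diagnostic capabilities of medical LLMs |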
large language model |
|
|
| 6 |
MMFormalizer: Multimodal Autoformalization in the Wild |
MMFormalizer: proposes a multimodal autoformalization method that addresses the challenges of mathematical reasoning in the physical world. |
multimodal |
|
|
| 7 |
Beyond the Black Box: Theory and Mechanism of Large Language Models |
Builds a theoretical framework for LLMs: a survey of theory and mechanisms from a lifecycle perspective |
large language model |
|
|
| 8 |
Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration |
Uses linear script representations in speech foundation models to achieve zero-shot transliteration |
foundation model |
|
|
| 9 |
The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture |
Compares how Chinese and U.S. large language models differ and perform in understanding Chinese culture |
large language model |
|
|
| 10 |
Punctuation-aware Hybrid Trainable Sparse Attention for Large Language Models |
Proposes Punctuation-aware Hybrid Sparse Attention (PHSA), improving the performance of sparse attention mechanisms in long-text modeling. |
large language model |
|
|
| 11 |
EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce |
EComStage: a comprehensive stage-wise, scenario-oriented benchmark for e-commerce large language models |
large language model |
|
|
| 12 |
Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration |
Proposes an iterative structured pruning method with multi-domain calibration for compressing large language models. |
large language model |
|
|
| 13 |
Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking |
FactArena: proposes an automated framework for comprehensive stage-wise evaluation of large language models on fact-checking |
large language model |
|
|
| 14 |
Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage |
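Proposes an early-stopping algorithm that mitigates the sequence-length increase caused by sparse attention in the long-decode stage |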
large language model |
|
|
| 15 |
Self-Verification is All You Need To Pass The Japanese Bar Examination |
Proposes a self-verification-based LLM that is the first to pass the Japanese bar examination |
large language model |
|
|
| 16 |
Mechanistic Knobs in LLMs: Retrieving and Steering High-Order Semantic Features via Sparse Autoencoders |
Proposes a sparse-autoencoder-based framework for retrieving and steering high-order semantic features in large language models. |
large language model |
|
|
| 17 |
LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark |
Proposes LongBench Pro, a more realistic and comprehensive bilingual long-context benchmark for evaluating LLMs' long-context understanding. |
large language model |
|
|
| 18 |
TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents |
TiMem: a temporal-hierarchical memory consolidation framework for long-horizon conversational agents |
large language model |
|
|
| 19 |
Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning |
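Proposes the PALU framework, which achieves efficient, low-damage targeted unlearning in large language models through local entropy maximization. |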
large language model |
|
|
| 20 |
Detecting Hallucinations in Retrieval-Augmented Generation via Semantic-level Internal Reasoning Graph |
Proposes a RAG hallucination detection method based on a semantic-level internal reasoning graph, improving factual consistency. |
large language model |
|
|
| 21 |
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners |
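Examines the multilingual latent reasoning abilities of large reasoning models: they are not fully multilingual and show an English-centric tendency |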
chain-of-thought |
|
|
| 22 |
Stable-RAG: Mitigating Retrieval-Permutation-Induced Hallucinations in Retrieval-Augmented Generation |
Proposes Stable-RAG to mitigate hallucinations in RAG caused by the ordering of retrieved documents |
large language model |
|
|
| 23 |
Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy |
Proposes a System-2 strategy that improves LLM accuracy on large-scale counting tasks |
large language model |
|
|
| 24 |
LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation |
Proposes an LLM-augmented changepoint detection framework that provides ensemble detection and automated explanation. |
large language model |
|
|
| 25 |
Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion |
Proposes the DeLP metric and the DELTA framework to address language preference caused by evaluation bias in multilingual RAG systems. |
large language model |
|
|
| 26 |
Revisiting Data Compression with Language Modeling |
Uses large language models to improve data compression, achieving SOTA on the enwik9 dataset |
large language model |
|
|
| 27 |
To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs |
Proposes inverse socio-demographic prompting to address the problem of cultural alignment in LLMs |
large language model |
|
|
| 28 |
SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation |
SYNAPSE: equips LLM agents with episodic-semantic memory via spreading activation, addressing disconnection in long-term memory. |
large language model |
|
|
| 29 |
EvoRoute: Experience-Driven Self-Routing LLM Agent Systems |
EvoRoute: proposes an experience-driven self-routing LLM agent system that addresses the agent-system trilemma |
large language model |
|
|