| # | Title | One-line summary | Tag |
|---|---|---|---|
| 1 | Controllable Value Alignment in Large Language Models through Neuron-Level Editing | NeVA: controllable value alignment in large language models via neuron-level editing. | large language model |
| 2 | Towards Robust Scaling Laws for Optimizers | Proposes robust scaling laws for optimizers, enabling fair comparison of different optimizers in LLM pre-training. | large language model |
| 3 | Rational Transductors | Proposes Rational Transductors, addressing Transformers' difficulty generalizing on sequential logic and state tracking. | chain-of-thought |
| 4 | Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization | Proposes the Astro framework to address the outlier problem in LLM post-training quantization. | large language model |
| 5 | Gaussian Match-and-Copy: A Minimalist Benchmark for Studying Transformer Induction | Proposes the Gaussian match-and-copy benchmark for studying the induction ability of Transformers (first sketch below). | large language model |
| 6 | Hyperparameter Transfer Laws for Non-Recurrent Multi-Path Neural Networks | Introduces a graph-based notion of effective depth, enabling zero-shot hyperparameter transfer for non-recurrent multi-path neural networks. | zero-shot transfer |
| 7 | On the Importance of a Multi-Scale Calibration for Quantization | Proposes MaCa, a multi-scale calibration method that improves LLM quantization accuracy on variable-length inputs (second sketch below). | large language model |
| 8 | Sign-Based Optimizers Are Effective Under Heavy-Tailed Noise | Shows that the SignSGD optimizer performs well under heavy-tailed noise, providing theoretical support for large-model training (third sketch below). | large language model |
| 9 | Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control | ShaPO: improves the robustness of LLM safety alignment via selective geometry control. | large language model |
| 10 | Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization | Proposes Parallel Track Transformers, which reduce synchronization operations to accelerate GPU inference. | large language model |
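
For paper 5, the benchmark's exact construction is not described in this digest; the sketch below is a hypothetical match-and-copy generator, assuming tokens are i.i.d. Gaussian vectors and the task is the classic induction pattern (find the earlier occurrence of the final query token and predict its successor). The function name `make_match_and_copy_batch` and every detail of the construction are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def make_match_and_copy_batch(batch=32, seq_len=16, dim=8, seed=0):
    """Hypothetical Gaussian match-and-copy task (not the paper's exact spec).

    Each sequence is seq_len i.i.d. Gaussian vectors. For every sequence we
    pick an earlier position as the "key", repeat that key vector as the
    final query token, and take the vector immediately after the key as the
    regression target. Solving the task requires matching the query to its
    earlier occurrence and copying the successor -- the induction pattern.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, seq_len, dim))
    key_pos = rng.integers(0, seq_len - 2, size=batch)  # leave room for a successor
    rows = np.arange(batch)
    x[rows, -1] = x[rows, key_pos]                      # final token repeats the key
    y = x[rows, key_pos + 1]                            # target: vector after the key
    return x, y
```

A Transformer trained on `(x, y)` pairs from a generator like this can only succeed by implementing some form of match-and-copy circuit, which is what makes the setup a minimalist probe of induction.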
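For paper 7, MaCa's actual procedure is likewise not given here. One plausible reading of "multi-scale calibration" is estimating quantization scales per sequence-length bucket rather than from a single fixed-length calibration set; the sketch below illustrates only that generic idea, not MaCa itself. `collect_activations` and the bucket lengths are hypothetical placeholders.

```python
import torch

def multiscale_int8_scales(collect_activations, lengths=(128, 512, 2048)):
    """Illustrative multi-scale calibration (not necessarily MaCa itself).

    Estimates one symmetric int8 scale per sequence-length bucket.
    Activation ranges in LLMs can vary with input length, so a scale
    calibrated at a single length may clip long inputs or waste
    resolution on short ones.
    """
    scales = {}
    for length in lengths:
        # collect_activations(length) -> iterable of activation tensors
        # observed on calibration inputs of that length (hypothetical hook).
        amax = max(a.abs().max().item() for a in collect_activations(length))
        scales[length] = amax / 127.0  # symmetric int8 step size
    return scales

def fake_quantize(x, scale):
    """Symmetric int8 fake-quantization using the chosen per-length scale."""
    return torch.clamp(torch.round(x / scale), -127, 127) * scale
```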
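For paper 8, the analysis itself is not reproduced in this digest, but the SignSGD update rule is standard: step by the sign of the gradient, discarding its magnitude. That bounded step is the intuition behind robustness to heavy-tailed noise. A minimal PyTorch sketch:

```python
import torch

@torch.no_grad()
def signsgd_step(params, lr=1e-3):
    """One SignSGD step: move each parameter by lr times its gradient's sign.

    Dropping the gradient magnitude bounds every coordinate's update by lr,
    so rare, extreme gradient samples (heavy-tailed noise) cannot produce
    arbitrarily large steps.
    """
    for p in params:
        if p.grad is not None:
            p.add_(torch.sign(p.grad), alpha=-lr)

# Usage: after loss.backward(), call signsgd_step(model.parameters()).
```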
|
|