| 1 |
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias |
A theoretical analysis of how Transformers learn regular language recognition, covering training dynamics and implicit bias.
large language model chain-of-thought |
|
|
| 2 |
When Dynamic Data Selection Meets Data Augmentation |
Proposes a unified framework combining dynamic data selection with data augmentation, improving training efficiency and generalization.
multimodal |
|
|
| 3 |
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free |
Proposes Grouped Sequency-arranged Rotation (GSR), a training-free method for optimizing rotation transformations in LLM quantization.
large language model |
|
|
| 4 |
Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling |
Proposes SDM-InstructGLM, which improves the scalability of InstructGLM on large-scale graphs via similarity- and degree-biased sampling.
large language model |
|
|