| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models | Revisits the generic Transformer, deconstructing a strong baseline for time series foundation models. | foundation model |
| 2 | On the Non-Identifiability of Steering Vectors in Large Language Models | Reveals the non-identifiability of steering vectors in LLMs, challenging existing interpretability methods. | large language model |
| 3 | Rare Event Analysis of Large Language Models | Proposes a rare-event analysis framework for LLMs to identify and analyze significant behaviors not yet observed during model deployment. | large language model |
| 4 | NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models | NanoQuant: the first post-training quantization method to achieve efficient sub-1-bit quantization of LLMs. | large language model |
| 5 | AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models | Proposes AsynDBT, an asynchronous distributed bilevel tuning algorithm that efficiently solves in-context learning with LLMs. | large language model |
| 6 | DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters | Proposes DiTS, a time series forecasting model based on multimodal diffusion Transformers that significantly improves forecasting accuracy. | multimodal |
| 7 | Live Knowledge Tracing: Real-Time Adaptation using Tabular Foundation Models | Proposes a real-time knowledge tracing method based on tabular foundation models, accelerating prediction and avoiding overfitting. | foundation model |
| 8 | On the Plasticity and Stability for Post-Training Large Language Models | Proposes a Probabilistic Conflict Resolution (PCR) framework that improves the stability and plasticity of post-trained LLMs. | large language model |
| 9 | Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used | Adaptive retrieval enhances LLM reasoning, but mostly by deciding not to retrieve: skipping retrieval outperforms using it. | large language model, chain-of-thought |
| 10 | XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference | XShare: collaborative in-batch expert sharing to accelerate MoE model inference. | large language model |
| 11 | tLoRA: Efficient Multi-LoRA Training with Elastic Shared Super-Models | tLoRA: efficient multi-LoRA training via elastic shared super-models. | large language model |
| 12 | SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding | SpecAttn: accelerates self-speculative decoding for long-context LLMs via self-verification-guided sparse attention. | large language model |
| 13 | Collaborative and Efficient Fine-tuning: Leveraging Task Similarity | Proposes CoLoRA, which leverages task similarity for collaborative and efficient fine-tuning of personalized large models. | foundation model |
| 14 | Discrete Adjoint Matching | Proposes the Discrete Adjoint Matching (DAM) algorithm for fine-tuning discrete generative models based on continuous-time Markov chains. | large language model |
| 15 | ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs | ScaleBITS: scalable bitwidth search for hardware-aligned mixed-precision LLMs. | large language model |
| 16 | T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility | T-STAR: a context-aware Transformer framework for short-term probabilistic demand forecasting in dock-based shared micro-mobility. | multimodal |
| 17 | Can LLM Safety Be Ensured by Constraining Parameter Regions? | Evaluates whether constraining parameter regions can ensure LLM safety, finding that existing methods struggle to reliably identify safe regions. | large language model |
| 18 | Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay | Proposes optimal learning-rate schedules by modeling loss dynamics under functional scaling laws. | large language model |
| 19 | EXACT: Explicit Attribute-Guided Decoding-Time Personalization | Proposes EXACT to address preference representation in personalized generation. | large language model |
| 20 | Confundo: Learning to Generate Robust Poison for Practical RAG Systems | Confundo: learns to generate robust poison for RAG systems, improving the effectiveness of practical attacks. | large language model |
| 21 | Evolutionary Generation of Multi-Agent Systems | EvoMAS: a framework for automatically generating multi-agent systems via evolutionary algorithms, improving performance on complex tasks. | large language model |
| 22 | Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning | Proposes an LLM reasoning method based on group causal counterfactual policy optimization, improving reasoning generalization. | large language model |
| 23 | Principle-Evolvable Scientific Discovery via Uncertainty Minimization | PiEvo: principle-evolvable scientific discovery via uncertainty minimization. | large language model |
| 24 | Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization | Establishes uniform spectral growth and convergence of Muon for optimizing LoRA-style matrix factorization. | large language model |