| 1 |
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model |
Brain-Semantoks learns semantic tokens of brain dynamics with a self-distilled foundation model.
distillation, foundation model
|
|
| 2 |
Learning to Extract Context for Context-Aware LLM Inference |
Proposes a reinforcement-learning-based context extraction framework that improves LLM reliability on safety tasks.
reinforcement learning, large language model, foundation model
|
|
| 3 |
Fully Inductive Node Representation Learning via Graph View Transformation |
Proposes Graph View Transformation (GVT) to enable fully inductive node representation learning across datasets.
representation learning, foundation model
|
|
| 4 |
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization |
Proposes Null-Space Constrained Policy Optimization (NSPO) to mitigate capability forgetting in LLM safety alignment.
reinforcement learning, large language model, instruction following
|
|
| 5 |
Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control |
Proposes a multi-objective reinforcement learning framework for large-scale mixed traffic control that improves fairness and safety.
reinforcement learning, penetration
|
|
| 6 |
Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits |
Proposes a symmetry-aware policy steering framework that improves the sample efficiency and stability of equivariant diffusion policies on symmetric tasks.
reinforcement learning, diffusion policy
|
|
| 7 |
Data Valuation for LLM Fine-Tuning: Efficient Shapley Value Approximation via Language Model Arithmetic |
Proposes an efficient Shapley value approximation based on language model arithmetic for data valuation in LLM fine-tuning.
DPO, direct preference optimization, large language model
|
|
| 8 |
DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning |
DAPO: design structure-aware pass ordering in high-level synthesis via graph contrastive learning and reinforcement learning.
reinforcement learning, contrastive learning
|
|
| 9 |
GraphPerf-RT: A Graph-Driven Performance Model for Hardware-Aware Scheduling of OpenMP Codes |
Proposes GraphPerf-RT, a graph-driven performance model for hardware-aware scheduling of OpenMP codes.
reinforcement learning, world model, model-based RL
|
|
| 10 |
Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective |
Proposes a unified measure-based framework for analyzing softmax attention in the large-prompt regime.
linear attention |
|
|
| 11 |
ReactorFold: Generative discovery of nuclear reactor cores via emergent physical reasoning |
ReactorFold: generative design of nuclear reactor cores via emergent physical reasoning.
DPO, direct preference optimization
|
|