| 1 |
Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models |
Sparse Mamba: improves structured state space models by introducing controllability, observability, and stability, with application to NLP. |
Mamba SSM state space model |
|
|
| 2 |
TSO: Self-Training with Scaled Preference Optimization |
TSO: self-training with scaled preference optimization to improve the alignment of LLMs with human preferences. |
preference learning DPO direct preference optimization |
|
|
| 3 |
Foundations of Multivariate Distributional Reinforcement Learning |
Proposes an oracle-free multivariate distributional reinforcement learning algorithm for problems such as multi-objective decision-making. |
reinforcement learning representation learning |
|
|
| 4 |
Robust off-policy Reinforcement Learning via Soft Constrained Adversary |
Proposes a robust off-policy reinforcement learning method based on an f-divergence-constrained adversary. |
reinforcement learning |
|
|