| # | Title | Summary | Keywords |  |
|---|-------|---------|----------|:--:|
| 1 | AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization | Alpha-DPO improves LLM alignment by adding an adaptive reward margin to direct preference optimization. | reinforcement learning, RLHF, DPO | ✅ |
| 2 | Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach | Proposes a Dreamer V3 variant regularized by the maximal Lyapunov exponent, improving the robustness of deep RL on continuous control tasks. | reinforcement learning, deep reinforcement learning, Dreamer |  |
| 3 | Continual Deep Reinforcement Learning to Prevent Catastrophic Forgetting in Jamming Mitigation | Proposes a PackNet-based continual deep RL method that addresses catastrophic forgetting in anti-jamming communication. | reinforcement learning, deep reinforcement learning, DRL |  |
| 4 | BrainGPT: Unleashing the Potential of EEG Generalist Foundation Model by Autoregressive Pre-training | Proposes BrainGPT, which unlocks the potential of a generalist EEG foundation model via autoregressive pre-training. | masked autoencoder, foundation model |  |
| 5 | LoLCATs: On Low-Rank Linearizing of Large Language Models | LoLCATs improves the efficiency and quality of large language models through low-rank linearization. | linear attention, large language model |  |
| 6 | Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning | Enhances JEPAs with spatial conditioning for more robust and efficient representation learning. | representation learning, masked autoencoder, MAE |  |
| 7 | HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning | Proposes HGAurban, a heterogeneous graph autoencoder that tackles noise and sparsity in urban spatial-temporal data. | masked autoencoder, spatial relationship, spatiotemporal |  |
| 8 | Mimetic Initialization Helps State Space Models Learn to Recall | Proposes a mimetic initialization scheme that improves state space models' ability to learn recall tasks. | Mamba, state space model |  |
| 9 | Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning | Proposes distributional RL methods to address performance issues in high-frequency decision making. | reinforcement learning, DRL |  |
| 10 | The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels | Reveals that the implicit bias of structured state space models makes them vulnerable to clean-label poisoning attacks. | SSM, state space model |  |
| 11 | Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models | Proposes FiFA, which automatically filters human feedback data to improve the alignment of text-to-image diffusion models. | DPO, direct preference optimization, large language model |  |
| 12 | StatioCL: Contrastive Learning for Time Series via Non-Stationary and Temporal Contrast | StatioCL improves time-series representations via non-stationary and temporal contrastive learning, mitigating the false-negative problem. | representation learning, contrastive learning |  |
| 13 | Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | Compares DCQN and DTQN on Atari games; finds DCQN superior in speed and on most games. | reinforcement learning |  |
| 14 | Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes | Proposes the RED framework for subtask-driven learning and risk-aware RL in average-reward MDPs. | reinforcement learning |  |
| 15 | Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective | Proposes lrGAE, a contrastive-learning-based graph autoencoder framework that establishes new benchmarks for graph representation learning. | contrastive learning | ✅ |
| 16 | Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism | Proposes a safe RL algorithm based on tighter cost-pessimistic and reward-optimistic estimates, improving the regret upper bound. | reinforcement learning |  |
| 17 | Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning | Proposes Stable Hadamard Memory, strengthening the memory of RL agents in partially observable environments. | reinforcement learning |  |
| 18 | Learning Linear Attention in Polynomial Time | Establishes a theoretical framework for the polynomial-time learnability of linear-attention Transformers and validates it on tasks such as finite automata. | linear attention |  |
| 19 | Lambda-Skip Connections: the architectural component that prevents Rank Collapse | Proposes Lambda-Skip connections, an architectural component that prevents rank collapse in sequence models. | SSM, state space model |  |