| 1 |
A Practical Introduction to Deep Reinforcement Learning |
A deep reinforcement learning tutorial that offers a practical introduction, using the PPO algorithm as its example |
reinforcement learning deep reinforcement learning DRL |
|
|
| 2 |
Block-Biased Mamba for Long-Range Sequence Processing |
Proposes Block-Biased Mamba (B2S6) to improve Mamba's performance on long-sequence tasks. |
Mamba SSM state space model |
|
|
| 3 |
InfoPO: On Mutual Information Maximization for Large Language Model Alignment |
Proposes InfoPO, which improves large language model alignment through mutual information maximization |
direct preference optimization large language model |
|
|
| 4 |
Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations |
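Proposes an inverse reinforcement learning algorithm that estimates cost functions in continuous spaces from a small number of observations. |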
reinforcement learning inverse reinforcement learning |
|
|
| 5 |
DyGSSM: Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update |
DyGSSM: a multi-view dynamic graph embedding method that incorporates state space model gradient updates |
SSM state space model representation learning |
|
|
| 6 |
DSADF: Thinking Fast and Slow for Decision Making |
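Proposes DSADF, a dual-system decision-making framework that improves the generalization of reinforcement learning agents in dynamic environments |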
reinforcement learning large language model foundation model |
|
|
| 7 |
Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments |
Proposes an unstructured pruning method for Mamba models, aimed at efficient deployment in resource-constrained environments |
Mamba SSM |
|
|
| 8 |
A Multi-scale Representation Learning Framework for Long-Term Time Series Forecasting |
MDMixer: a multi-scale representation learning framework for long-term time series forecasting |
representation learning MAE |
|
|
| 9 |
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL |
Proposes the FASP framework to address long-horizon safety and generalization in offline safe reinforcement learning |
reinforcement learning offline RL |
|
|
| 10 |
Continual Reinforcement Learning via Autoencoder-Driven Task and New Environment Recognition |
Proposes an autoencoder-driven method for recognizing tasks and new environments to address continual reinforcement learning |
reinforcement learning |
|
|
| 11 |
Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression |
Studies the performance differences between fine-tuning and distillation for LLM compression in edge AI deployment |
distillation |
|
|
| 12 |
Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning |
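Proposes an influence-scope-based multi-agent reinforcement learning method that addresses credit assignment and efficient exploration under sparse rewards. |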
reinforcement learning |
|
|
| 13 |
SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models |
SPAT: a sensitivity-based multi-head attention pruning method that improves the efficiency of time series forecasting models. |
Mamba MAE |
|
|
| 14 |
Low-Complexity Inference in Continual Learning via Compressed Knowledge Transfer |
Proposes a low-complexity inference framework to address computational cost in continual learning |
teacher-student distillation |
|
|