| 14 |
Offline Reinforcement Learning with Penalized Action Noise Injection |
Proposes PANI, which improves offline reinforcement learning performance through penalized action noise injection. |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 15 |
Uncertainty-aware Reward Design Process |
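Proposes URDP, an uncertainty-aware reward design process that improves the efficiency and quality of reward function design for reinforcement learning. |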
reinforcement learning reward design large language model |
|
|
| 16 |
A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control |
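Proposes the Forget and Grow algorithm, which addresses primacy bias in deep reinforcement learning by forgetting early experiences and dynamically expanding the network. |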
reinforcement learning deep reinforcement learning |
|
|
| 17 |
Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations |
Proposes a deep reinforcement learning-based method for optimizing DRAM equalizer parameters to improve signal integrity. |
reinforcement learning deep reinforcement learning |
|
|
| 18 |
Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms |
HIPPO: a hierarchical multi-label contrastive learning framework for cross-species protein-protein interaction prediction. |
contrastive learning zero-shot transfer |
|
|
| 19 |
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning |
ExPO: unlocks hard reasoning capabilities through self-explanation-guided reinforcement learning. |
reinforcement learning DPO |
✅ |
|
| 20 |
Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks |
Studies how data scientists use bricolage to construct target variables for predictive models when the concept being modeled is fuzzy. |
predictive model |
|
|
| 21 |
Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions |
Proposes a multi-agent reinforcement learning approach to dynamic pricing that optimizes supply chain strategies. |
reinforcement learning |
|
|
| 22 |
RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes |
Proposes RLHGNN, a reinforcement learning-driven heterogeneous graph neural network for next activity prediction in business processes. |
reinforcement learning |
✅ |
|
| 23 |
On Efficient Bayesian Exploration in Model-Based Reinforcement Learning |
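Proposes PTS-BE, a predictive trajectory sampling method based on Bayesian exploration, which improves the data efficiency of model-based reinforcement learning. |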
reinforcement learning |
|
|
| 24 |
Understanding and Improving Length Generalization in Recurrent Models |
Addresses the poor length generalization of recurrent models by proposing training interventions based on state coverage. |
state space model linear attention |
|
|