| 1 |
Reasoning Beyond Limits: Advances and Open Problems for LLMs |
Surveys advances and open challenges in LLM reasoning, focusing on multilingual, long-context, and unsupervised reasoning. |
reinforcement learning Mamba SSM |
|
|
| 2 |
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation |
MORAL: model-based offline reinforcement learning with adversarial data augmentation, improving policy robustness |
reinforcement learning policy learning offline RL |
|
|
| 3 |
Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks |
Proposes adaptive gradient-masked adversarial attacks to improve the robustness of deep reinforcement learning in robotics |
reinforcement learning deep reinforcement learning DRL |
|
|
| 4 |
State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning |
Proposes STAR, a state-aware perturbation optimization method that improves DRL robustness in adversarial environments |
reinforcement learning deep reinforcement learning DRL |
|
|
| 5 |
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping |
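Proposes the LLM-HFBF framework, which uses zero-shot LLMs for reward shaping in reinforcement learning and corrects biases in human feedback. |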
reinforcement learning reward shaping large language model |
|
|
| 6 |
World Model Agents with Change-Based Intrinsic Motivation |
World model agents driven by exploratory rewards, improving learning in sparse-reward environments |
reinforcement learning world model dreamer |
|
|
| 7 |
Innovative LSGTime Model for Crime Spatiotemporal Prediction Based on MindSpore Framework |
Proposes the LSGTime model, built on the MindSpore framework, for spatiotemporal crime prediction. |
MAE spatiotemporal |
|
|
| 8 |
Reinforcement Learning for Efficient Toxicity Detection in Competitive Online Video Games |
Proposes a reinforcement-learning-based contextual bandit algorithm for efficiently detecting toxic behavior in online games |
reinforcement learning |
|
|
| 9 |
Cyborg Data: Merging Human with AI Generated Training Data |
Proposes Cyborg Data: merging human and AI-generated data to improve the efficiency of automated scoring systems |
distillation large language model |
|
|
| 10 |
Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems |
Harmonia: a multi-agent reinforcement learning approach to data placement and migration in hybrid storage systems |
reinforcement learning |
|
|
| 11 |
Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets |
CRAFT: offline learning of effective representations in Ex-BMDP environments by comparing diverse datasets |
policy learning representation learning |
|
|