| 1 |
LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems |
LLMOrbit: a circular taxonomy of large language models that addresses scaling walls and moves toward agentic AI systems |
PPO, RLHF, DPO |
|
|
| 2 |
Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment |
Proposes an attention-based offline reinforcement learning and clustering method for interpretable sepsis treatment decision support. |
reinforcement learning, offline reinforcement learning, large language model |
|
|
| 3 |
Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression |
FireCastRL: a proactive wildfire suppression framework that combines spatiotemporal prediction with reinforcement learning |
reinforcement learning, spatiotemporal |
|
|
| 4 |
RL-BioAug: Label-Efficient Reinforcement Learning for Self-Supervised EEG Representation Learning |
Proposes RL-BioAug, which uses reinforcement learning to improve data augmentation for self-supervised EEG representation learning. |
reinforcement learning, representation learning, contrastive learning |
✅ |
|
| 5 |
KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning |
Proposes KAGE-Bench for fast evaluation of known-axis visual generalization in reinforcement learning. |
reinforcement learning, PPO, latent dynamics |
✅ |
|
| 6 |
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow |
Jet-RL: enables on-policy FP8 reinforcement learning through a unified training and rollout precision flow |
reinforcement learning, large language model |
|
|
| 7 |
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning |
InT: enables credit assignment in LLM reasoning through self-proposed interventions |
reinforcement learning, IMoS, large language model |
|
|
| 8 |
VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models |
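VJEPA: a probabilistic world model that enables robust, uncertainty-aware planning through variational joint embedding predictive architectures. |
world model, representation learning |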
|
|
| 9 |
Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery |
Proposes a differentiated pickup point offering strategy to reduce carbon emissions in last-mile delivery. |
reinforcement learning, DPO, spatial relationship |
|
|
| 10 |
Q-learning with Adjoint Matching |
Proposes Q-learning with Adjoint Matching (QAM) to efficiently optimize diffusion policies in continuous action spaces. |
reinforcement learning, diffusion policy, flow matching |
|
|
| 11 |
Report for NSF Workshop on AI for Electronic Design Automation |
Explores AI-enabled electronic design automation: current challenges and future opportunities |
reinforcement learning, large language model |
✅ |
|
| 12 |
Reinforcement Learning for Opportunistic Routing in Software-Defined LEO-Terrestrial Systems |
Proposes reinforcement-learning-based opportunistic routing to reduce data transmission latency in LEO networks. |
reinforcement learning |
|
|
| 13 |
Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning |
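Proposes average-reward Q-learning algorithms and analyzes their sample complexity, from the single-agent to the federated setting. |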
reinforcement learning |
|
|
| 14 |
GeoDynamics: A Geometric State-Space Neural Network for Understanding Brain Dynamics on Riemannian Manifolds |
GeoDynamics: a geometric state-space neural network for understanding brain dynamics on Riemannian manifolds |
SSM, spatiotemporal |
|
|