| 1 |
Hypercube Policy Regularization Framework for Offline Reinforcement Learning |
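Proposes a hypercube policy regularization framework that improves offline reinforcement learning performance on low-quality datasets |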
reinforcement learning TD3 offline reinforcement learning |
|
|
| 2 |
Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning |
Proposes constrained latent action policies to address the out-of-sample problem in offline reinforcement learning |
reinforcement learning policy learning offline reinforcement learning |
|
|
| 3 |
Pruning the Path to Optimal Care: Identifying Systematically Suboptimal Medical Decision-Making with Inverse Reinforcement Learning |
Uses inverse reinforcement learning to identify systematically suboptimal medical decisions in the ICU |
reinforcement learning inverse reinforcement learning |
|
|
| 4 |
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations |
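Proposes a reinforcement learning method based on hindsight regenerations that improves interactive dialogue agents in mental health support and charitable donation scenarios. |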
reinforcement learning offline reinforcement learning large language model |
|
|
| 5 |
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF |
Presents a sharp analysis of KL regularization for contextual bandits and reinforcement learning from human feedback (RLHF) |
reinforcement learning policy learning offline RL |
|
|
| 6 |
Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization |
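Proposes an improved preference optimization pipeline that improves LLM alignment through better data generation and budget-controlled regularization |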
DPO direct preference optimization large language model |
|
|
| 7 |
Scaling Laws for Pre-training Agents and World Models |
Reveals scaling laws for pre-training agents and world models, and optimizes the ratio of model size to data |
imitation learning world model |
|
|
| 8 |
Performative Reinforcement Learning with Linear Markov Decision Process |
For performative reinforcement learning in linear MDPs, proposes a repeated regularized optimization method and proves its convergence. |
reinforcement learning |
|
|
| 9 |
Watermarking Language Models through Language Models |
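Proposes a prompt-based watermarking framework for language models that enables provenance tracing and oversight without access to model internals. |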
distillation large language model |
|
|
| 10 |
Fed-LDR: Federated Local Data-infused Graph Creation with Node-centric Model Refinement |
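Fed-LDR: a graph neural network that incorporates local data within a federated learning framework, for urban spatiotemporal data analysis. |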
MAE spatial relationship |
|
|
| 11 |
Semantic-Aware Resource Management for C-V2X Platooning via Multi-Agent Reinforcement Learning |
Proposes a semantic-aware resource management method to address communication problems in C-V2X platooning |
reinforcement learning |
|
|
| 12 |
Comparing Fairness of Generative Mobility Models |
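Proposes a framework for evaluating the fairness of generative mobility models, revealing a trade-off between model accuracy and fairness. |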
predictive model spatiotemporal |
|
|
| 13 |
Solving Hidden Monotone Variational Inequalities with Surrogate Losses |
Proposes a surrogate-loss-based algorithm for solving hidden monotone variational inequality problems in deep learning |
reinforcement learning deep reinforcement learning |
|
|
| 14 |
Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks |
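Proposes a zero-shot temporal resolution domain adaptation method to address the performance degradation of SNNs on data with different temporal resolutions. |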
SSM state space model |
|
|