| 1 |
Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning |
Proposes ERFSL, which uses large language models to efficiently search for reward functions in custom-environment multi-objective reinforcement learning.
reinforcement learning large language model |
|
|
| 2 |
Building Math Agents with Multi-Turn Iterative Preference Learning |
Proposes a multi-turn iterative preference learning framework that improves math agents' tool-integrated reasoning.
preference learning DPO large language model |
|
|
| 3 |
Do We Trust What They Say or What They Do? A Multimodal User Embedding Provides Personalized Explanations |
Proposes the Contribution-Aware Multimodal User Embedding (CAMUE) framework for personalized, explainable predictions on social networks.
representation learning multimodal |
|
|
| 4 |
Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal |
Proposes Continual Diffuser (CoD), which tackles the challenge of continual learning in offline reinforcement learning.
reinforcement learning offline reinforcement learning |
|
|
| 5 |
An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning |
Introduces centralized-training, decentralized-execution methods for cooperative multi-agent reinforcement learning.
reinforcement learning |
|
|
| 6 |
Unifying Causal Representation Learning with the Invariance Principle |
Unifies causal representation learning with the invariance principle, improving causal inference on high-dimensional data.
representation learning |
|
|
| 7 |
Tractable Offline Learning of Regular Decision Processes |
Proposes a new method to overcome the limitations of learning Regular Decision Processes (RDPs) offline.
reinforcement learning offline reinforcement learning
|
|
| 8 |
Independence Constrained Disentangled Representation Learning from Epistemological Perspective |
Proposes a disentangled representation learning method based on an epistemological perspective and a two-level latent space, improving interpretability and controllable-generation quality.
representation learning |
|
|
| 9 |
Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation |
Proposes a discriminative-generative distillation method for learning privacy-preserving student networks.
distillation |
|
|