| 25 |
Transforming Multimodal Models into Action Models for Radiotherapy |
Proposes an action model based on few-shot reinforcement learning that applies multimodal models to radiotherapy planning. |
reinforcement learning foundation model multimodal |
|
|
| 26 |
CAST: Cross Attention based multimodal fusion of Structure and Text for materials property prediction |
Proposes CAST, a cross-attention-based multimodal fusion model of structure and text for materials property prediction. |
predictive model MAE multimodal |
|
|
| 27 |
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning |
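Proposes a behavioral-entropy-based dataset generation method for offline reinforcement learning, improving performance on complex continuous control tasks. |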
reinforcement learning offline RL offline reinforcement learning |
|
|
| 28 |
Illuminating Spaces: Deep Reinforcement Learning and Laser-Wall Partitioning for Architectural Layout Generation |
Proposes an architectural layout generation method based on deep reinforcement learning and laser-wall partitioning |
reinforcement learning deep reinforcement learning |
|
|
| 29 |
Training Language Models to Reason Efficiently |
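Proposes a reinforcement-learning-based method for optimizing reasoning efficiency, reducing the inference cost of large language models. |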
reinforcement learning large language model chain-of-thought |
|
|
| 30 |
PILAF: Optimal Human Preference Sampling for Reward Modeling |
Proposes PILAF, which improves reward model alignment by optimizing human preference sampling |
reinforcement learning preference learning RLHF |
|
|
| 31 |
Towards Cost-Effective Reward Guided Text Generation |
Proposes a new type of reward model to make text generation more efficient |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 32 |
Fairness Aware Reinforcement Learning via Proximal Policy Optimization |
Proposes Fair-PPO, a fair reinforcement learning method, to address fairness in multi-agent systems |
reinforcement learning PPO |
|
|
| 33 |
Revisiting Intermediate-Layer Matching in Knowledge Distillation: Layer-Selection Strategy Doesn't Matter (Much) |
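Studies how insensitive knowledge distillation is to the intermediate-layer matching strategy: the layer-selection strategy has little effect |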
distillation |
|
|
| 34 |
Provable Sample-Efficient Transfer Learning Conditional Diffusion Models via Representation Learning |
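Proposes a representation-learning-based theoretical framework for transfer learning of conditional diffusion models, improving sample efficiency. |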
representation learning |
|
|
| 35 |
Consistency of augmentation graph and network approximability in contrastive learning |
Analyzes the consistency of the data augmentation graph and network approximability in contrastive learning |
contrastive learning |
|
|
| 36 |
Orthogonal Representation Learning for Estimating Causal Quantities |
Proposes orthogonal representation learning to improve the efficiency of estimating causal quantities |
representation learning |
|
|
| 37 |
Autotelic Reinforcement Learning: Exploring Intrinsic Motivations for Skill Acquisition in Open-Ended Environments |
Proposes autotelic reinforcement learning, exploring intrinsically motivated skill acquisition in open-ended environments |
reinforcement learning |
|
|
| 38 |
Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning |
Proposes deep meta coordination graphs to address cooperative policy learning in multi-agent reinforcement learning |
reinforcement learning |
✅ |
|
| 39 |
CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning |
CleanSurvival: automated data preprocessing for survival analysis using reinforcement learning |
reinforcement learning |
✅ |
|
| 40 |
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks |
Proposes a framework based on reinforcement learning and graph neural networks for extrapolative reasoning on logic puzzles |
reinforcement learning |
|
|
| 41 |
Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning |
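Proposes a MARL-based online location planning framework that optimizes the joint tasks of order serving and spatio-temporal heterogeneous model fine-tuning for AI-defined vehicles. |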
reinforcement learning foundation model |
|
|
| 42 |
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks |
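Explores the effectiveness of curriculum learning on real-world software engineering tasks: a preliminary evaluation of the CodeT5 model |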
curriculum learning |
|
|
| 43 |
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning |
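Proposes Self-Improving Skill Learning (SISL) to address the instability of skill-based meta-reinforcement learning under noisy offline data. |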
reinforcement learning |
|
|
| 44 |
Learning Reward Machines from Partially Observed Policies |
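Proposes a method for learning reward machines based on prefix tree policies to solve the inverse reinforcement learning problem |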
reinforcement learning inverse reinforcement learning |
|
|