| 1 |
DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving |
DreamerAD: efficient reinforcement learning for autonomous driving based on a latent world model |
reinforcement learning world model world models |
|
|
| 2 |
Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning |
Proposes LLM-based incentive-aware reward design for cooperative multi-agent reinforcement learning |
reinforcement learning reward design large language model |
|
|
| 3 |
Can we generate portable representations for clinical time series data using LLMs? |
Uses LLMs to generate portable representations of clinical time series data, improving model generalization |
predictive model representation learning large language model |
|
|
| 4 |
HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation |
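Proposes HDPO, which uses privileged self-distillation to address the "cliff" problem of reinforcement learning in mathematical reasoning |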
reinforcement learning distillation large language model |
|
|
| 5 |
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience |
UI-Voyager: proposes a GUI agent that self-evolves from failed experience, improving mobile GUI automation performance |
distillation large language model multimodal |
|
|
| 6 |
TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models |
Proposes TuneShift-KD, which uses perplexity differences to distill domain knowledge from a fine-tuned model into a new model |
distillation foundation model |
|
|
| 7 |
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents |
CUA-Suite: a large-scale human-annotated video dataset supporting research on computer-use agents |
world model world models multimodal |
|
|
| 8 |
Marchuk: Efficient Global Weather Forecasting from Mid-Range to Sub-Seasonal Scales via Flow Matching |
Marchuk: an efficient flow-matching-based model for global medium- to long-range weather forecasting |
flow matching |
✅ |
|
| 9 |
CGRL: Causal-Guided Representation Learning for Graph Out-of-Distribution Generalization |
Proposes CGRL, which improves the OOD generalization of graph neural networks through causal-guided representation learning |
representation learning |
|
|
| 10 |
A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula |
Proposes a reinforcement learning method based on multi-turn synthetic data and curriculum learning to improve code generation |
reinforcement learning large language model |
|
|
| 11 |
Causality-Driven Disentangled Representation Learning in Multiplex Graphs |
Proposes the CaDeM framework, which disentangles shared and private representations in multiplex graphs via causal inference |
representation learning |
|
|
| 12 |
KCLNet: Electrically Equivalence-Oriented Graph Representation Learning for Analog Circuits |
Proposes KCLNet for electrical-equivalence-oriented graph representation learning on analog circuits |
representation learning |
|
|
| 13 |
Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization |
Proposes DGO, a dual-guidance optimization framework that improves how LLMs utilize and internalize experience during RLVR training |
reinforcement learning large language model |
|
|
| 14 |
ChargeFlow: Flow-Matching Refinement of Charge-Conditioned Electron Densities |
Proposes ChargeFlow, which refines charge-conditioned electron densities via flow matching to accelerate materials science computations |
flow matching |
|
|