| # | Title | Summary | Keywords |
| --- | --- | --- | --- |
| 1 | Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning | A simple recipe works: vision-language-action models are natural continual learners when trained with reinforcement learning. | reinforcement learning, vision-language-action, VLA |
| 2 | Statistical and structural identifiability in representation learning | Introduces notions of statistical and structural identifiability to improve the stability and interpretability of representation learning models. | representation learning, MAE, foundation model |
| 3 | ARROW: Augmented Replay for RObust World models | ARROW improves world-model robustness via augmented replay, addressing catastrophic forgetting in continual reinforcement learning. | reinforcement learning, world model, dreamer |
| 4 | FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning | Proposes FlexRec to address flexible needs in dynamic recommender systems. | reinforcement learning, instruction following |
| 5 | Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization | Proposes Hybrid Energy-Aware Reward Shaping (H-EARS) to improve the efficiency and safety of reinforcement learning in continuous control. | reinforcement learning, deep reinforcement learning, reward shaping |
| 6 | IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL | Proposes the IsoCompute Playbook, a strategy for optimally allocating sampling compute in LLM reinforcement learning. | reinforcement learning, large language model |
| 7 | AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling | Proposes AGMARL-DKS for dynamic Kubernetes scheduling, improving resource utilization and fault tolerance. | reinforcement learning |
| 8 | Causal Representation Learning with Optimal Compression under Complex Treatments | Proposes a causal representation learning method based on optimal compression for estimating individual treatment effects under complex treatments. | representation learning |
| 9 | Disentangled Representation Learning through Unsupervised Symmetry Group Discovery | Proposes a disentangled representation learning method based on unsupervised symmetry group discovery. | representation learning |
| 10 | Entropy-Preserving Reinforcement Learning | Proposes the REPO and ADAPO algorithms to counter the loss of exploration diversity during policy-gradient training. | reinforcement learning |
| 11 | Separable neural architectures as a primitive for unified predictive and generative intelligence | Proposes Separable Neural Architectures (SNA) to unify predictive and generative intelligence across domains such as physics, language, and perception. | reinforcement learning, spatiotemporal |
| 12 | Temporal Straightening for Latent Planning | Proposes temporal straightening to improve latent-space planning in world models. | world model, representation learning |
| 13 | Automatic Generation of High-Performance RL Environments | Proposes a general method for automatically generating high-performance reinforcement learning environments, substantially reducing development cost and time. | reinforcement learning, PPO |
| 14 | SpectralGuard: Detecting Memory Collapse Attacks in State Space Models | Proposes SpectralGuard for detecting memory collapse attacks in state space models. | Mamba, SSM, state space model |
| 15 | Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models | Proposes TreeKD, a knowledge distillation method that improves generalist large language models on molecular property prediction. | distillation, large language model |
| 16 | Thermodynamics of Reinforcement Learning Curricula | Uses non-equilibrium thermodynamics to build a geometric framework for reinforcement learning curricula and optimize task scheduling. | reinforcement learning, representation learning, curriculum learning |
| 17 | Spatial PDE-aware Selective State-space with Nested Memory for Mobile Traffic Grid Forecasting | Proposes NeST-S6, a selective state-space model with nested memory and spatial-PDE awareness for mobile traffic grid forecasting. | Mamba, SSM, MAE |
| 18 | Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching | Proposes curriculum sampling, a two-phase training strategy that improves the training efficiency and sample quality of flow matching models. | flow matching |
| 19 | Probing Length Generalization in Mamba via Image Reconstruction | Probes the length-generalization limitations of Mamba via image reconstruction. | Mamba |
|
|
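Entry 10 names entropy preservation but, as a digest line, does not describe REPO or ADAPO themselves. For background only, the sketch below shows the standard entropy-regularized policy-gradient loss that such methods relate to; the function name and the `ent_coef` default are illustrative assumptions, not the papers' algorithms.

```python
import torch
import torch.nn.functional as F

def pg_loss_with_entropy_bonus(logits, actions, advantages, ent_coef=0.01):
    """Entropy-regularized policy-gradient loss (generic background sketch).

    logits:     (batch, n_actions) raw policy outputs
    actions:    (batch,) sampled action indices (long)
    advantages: (batch,) advantage estimates
    """
    log_probs = F.log_softmax(logits, dim=-1)
    act_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # REINFORCE-style surrogate: maximize E[A * log pi(a|s)]
    pg = -(advantages * act_log_probs).mean()
    # Policy entropy; subtracting ent_coef * entropy from the loss
    # penalizes collapse toward a near-deterministic policy.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return pg - ent_coef * entropy
```

A fixed bonus like this only slows entropy decay rather than preserving it; methods aimed at entropy preservation presumably adapt or constrain this term, but the digest gives no details.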
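Likewise, entry 18's two-phase curriculum for flow matching is not spelled out here. The sketch below pairs the standard conditional flow-matching objective (linear path `x_t = (1 - t) * x0 + t * x1` with constant target velocity `x1 - x0`) with a hypothetical two-phase timestep schedule; `phase_frac` and the sqrt-biased second phase are assumptions for illustration, not the paper's schedule.

```python
import torch

def flow_matching_loss(model, x1, step, total_steps, phase_frac=0.5):
    """Conditional flow-matching loss with a hypothetical two-phase
    timestep curriculum (illustrative only; not the paper's schedule)."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    u = torch.rand(x1.size(0), device=x1.device)
    # Phase 1: t ~ Uniform(0, 1). Phase 2: t = sqrt(u), density 2t,
    # which biases samples toward t = 1 (near the data endpoint).
    t = u if step < phase_frac * total_steps else u.sqrt()
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcast over data dims
    xt = (1.0 - t_) * x0 + t_ * x1                 # linear probability path
    target_v = x1 - x0                             # constant target velocity
    pred_v = model(xt, t)                          # model predicts velocity
    return ((pred_v - target_v) ** 2).mean()
```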