| 1 |
Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking |
Proposes Probabilistic Chunk Masking (PCM) to accelerate VLA reinforcement learning and make gradient computation more efficient. |
reinforcement learning world model world models |
|
|
| 2 |
DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation |
Proposes DeltaPrompts, which improves multimodal distillation by actively mining high-discrepancy prompts. |
teacher-student distillation multimodal |
|
|
| 3 |
Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds |
Proposes Mind Dreamer to address the problem of imagination being tethered to history in model-based reinforcement learning. |
reinforcement learning world model world models |
|
|
| 4 |
Offline Reinforcement Learning with Universal Horizon Models |
Proposes universal horizon models to address long-horizon prediction in offline reinforcement learning. |
reinforcement learning offline RL offline reinforcement learning |
✅ |
|
| 5 |
Constrained latent state modeling: A unifying perspective on representation learning under competing constraints |
Proposes constrained latent state modeling, a unifying perspective on representation learning under competing constraints. |
representation learning multimodal |
|
|
| 6 |
AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs |
AstraFlow: a dataflow-oriented reinforcement learning system for agentic LLMs. |
reinforcement learning large language model |
|
|
| 7 |
Multi-Fidelity Flow Matching: Cascaded Refinement of PDE Solutions |
Proposes Multi-Fidelity Flow Matching (MFFM) for cascaded refinement of parametric PDE solutions. |
flow matching spatiotemporal |
|
|
| 8 |
parallelcbf: A composable safety-filter and auditability framework for tensor-parallel reinforcement learning |
ParallelCBF: a composable safety-filter and auditability framework for tensor-parallel reinforcement learning. |
reinforcement learning behavior cloning |
✅ |
|
| 9 |
BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control |
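Proposes BAPR, which combines Bayesian online change detection with robust ensemble reinforcement learning to handle non-stationary continuous control. |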
reinforcement learning SAC |
|
|
| 10 |
Looped SSMs: Depth-Recurrence and Input Reshaping for Time Series Classification |
Proposes looped state space models to improve time series classification performance. |
SSM state space model |
|
|
| 11 |
MIND: Decoupling Model-Induced Label Noise via Latent Manifold Disentanglement |
MIND: removes model-induced label noise by disentangling latent manifolds. |
distillation foundation model |
|
|
| 12 |
Dynamics-Level Watermarking of Flow Matching Models with Random Codes |
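Proposes a dynamics-level watermarking method for flow matching models based on random codes, aimed at protecting the copyright of generative models. |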
flow matching |
|
|
| 13 |
A Multi-Layer Cloud-IDS Pipeline with LLM and Adaptive Q-Learning Calibration |
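Proposes a multi-layer cloud IDS pipeline built on an LLM and adaptive Q-learning calibration to improve security in cloud environments. |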
reinforcement learning large language model |
|
|
| 14 |
Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning |
Proposes tighter regret bounds for contextual action-set reinforcement learning, improving algorithm performance. |
reinforcement learning |
|
|
| 15 |
Pessimistic Risk-Aware Policy Learning in Contextual Bandits |
Proposes a unified distributional framework for optimizing risk-aware offline policy learning. |
policy learning |
|
|
| 16 |
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making |
Ada-Diffuser: a latent-aware adaptive diffusion model for decision-making that explicitly models latent dynamics. |
policy learning latent dynamics |
|
|
| 17 |
VSPO: Vector-Steered Policy Optimization for Behavioral Control |
Proposes VSPO, which achieves behavioral control of language models through vector-steered policy optimization. |
distillation reward shaping |
|
|