| 1 |
C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache |
Proposes C$^3$ache to accelerate world action model inference |
world action model world action models vision-language-action |
|
|
| 2 |
Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search |
Proposes a world-model-inspired evaluator to optimize tensor program search |
world model world models latent dynamics |
|
|
| 3 |
From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model |
Proposes a Cox-supervised distillation method that transfers survival risk into a language model |
distillation large language model |
|
|
| 4 |
From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind with Reinforcement Learning |
Proposes Thinking-RFT to address the shortcut problem in ToM models |
reinforcement learning foundation model multimodal |
|
|
| 5 |
Stabilizing On-Policy Distillation for MLLM Reasoning with Global Normalization |
Proposes globally normalized distillation policy optimization to address gradient instability |
reinforcement learning distillation multimodal |
✅ |
|
| 6 |
Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families |
Proposes an on-policy distillation method across model families to overcome tokenizer constraints |
teacher-student distillation large language model |
|
|
| 7 |
PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment |
Proposes PBSD to address long-horizon credit assignment |
reinforcement learning policy learning distillation |
|
|
| 8 |
Rethinking the Divergence Regularization in LLM RL |
Proposes DRPO to address trust-region optimization in LLM RL |
reinforcement learning PPO large language model |
|
|
| 9 |
Addressing Market Regime Changes and Heavy-Tailed Returns in Portfolio Optimization via Bayesian VAR and Elliptical Black-Litterman |
Proposes the BAVAR-BLED algorithm to address market regime changes and heavy-tailed returns in portfolio optimization |
reinforcement learning deep reinforcement learning DRL |
|
|
| 10 |
A Unifying Lens on Reward Uncertainty in RLHF |
Proposes distributional reward models to mitigate reward hacking in RLHF |
reinforcement learning RLHF |
|
|
| 11 |
Escaping the KL Agreement Trap in On-Policy Distillation |
Proposes KAT to address the low-KL agreement trap in on-policy distillation |
distillation |
|
|
| 12 |
Graph Mamba Operator: A Latent Simulator for Interacting Particle Systems |
Proposes Graph Mamba Operator for modeling particle systems |
Mamba |
|
|
| 13 |
Distilling Safe LLM Systems via Soft Prompts for On Device Settings |
Proposes a soft-prompt distillation method for deploying safe LLMs on edge devices |
distillation large language model |
|
|
| 14 |
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short |
Proposes Reasoning Arena to address cases where verifiable rewards fall short |
reinforcement learning large language model |
|
|
| 15 |
Heterophily-Aware Adaptive Knowledge Distillation for Hypergraph Neural Networks |
Proposes HADES to address heterophily in hypergraph neural networks |
distillation |
|
|
| 16 |
Safe-RULE: Safe Reinforcement UnLEarning |
Proposes Safe-RULE to address data poisoning in offline safe reinforcement learning |
reinforcement learning policy learning |
|
|
| 17 |
Stage-1 Controls the Entropy Regime, Not the Outcome |
Studies how Stage-1 controls the entropy regime rather than the outcome |
reinforcement learning distillation |
|
|
| 18 |
Zero Touch Predictive Orchestration: Automating Time-Series Models for the Cloud-Edge Continuum |
Proposes an automated time-series forecasting architecture to address the cold-start problem in cloud-edge computing |
predictive model MAE |
|
|
| 19 |
Counterfactual Transport Flows for Offline Conservative Trajectory Refinement |
Proposes counterfactual transport flows to address trajectory optimization in offline reinforcement learning |
reinforcement learning offline reinforcement learning |
|
|