| 1 |
HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction |
Proposes HealthMamba, an uncertainty-aware spatiotemporal graph state space model for effective and reliable healthcare-facility visit prediction. |
Mamba state space model spatiotemporal |
|
|
| 2 |
Constrained Group Relative Policy Optimization |
Proposes Constrained GRPO, addressing critic-free policy optimization under constraints to improve performance on robotic tasks. |
policy learning embodied AI foundation model |
|
|
| 3 |
Path-Guided Flow Matching for Dataset Distillation |
Proposes path-guided Flow Matching for efficient dataset distillation, improving downstream generalization. |
flow matching distillation |
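Several entries in this list (3, 4, 17, 21) build on flow matching. For orientation, here is a minimal numpy sketch of the standard conditional flow-matching objective they share as a starting point, assuming the common linear interpolation path; `velocity_fn` stands in for a learned velocity model and is purely illustrative, not taken from any of the papers:

```python
import numpy as np

def cfm_loss(velocity_fn, x0, x1, t):
    """Conditional flow matching loss for the linear (rectified-flow) path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.

    velocity_fn(xt, t) is a stand-in for a learned model; x0 is noise,
    x1 is data, t holds one time in [0, 1] per sample.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # broadcast over feature dim
    xt = (1 - t) * x0 + t * x1                     # point on the probability path
    target = x1 - x0                               # target velocity of the linear path
    pred = velocity_fn(xt, t)
    return np.mean((pred - target) ** 2)           # regress predicted onto target velocity
```

A model that exactly outputs the path velocity drives this loss to zero; training a network on this regression target is the shared starting point that the papers above then specialize.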
|
|
| 4 |
Disentangled Representation Learning via Flow Matching |
Proposes a Flow Matching-based disentangled representation learning framework that improves semantic alignment and disentanglement. |
flow matching representation learning |
|
|
| 5 |
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities |
Proposes the ARM mechanism, which uses generative probabilities to improve RL exploration in LLM reasoning and increase diversity. |
reinforcement learning large language model |
|
|
| 6 |
Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning |
Proposes Meta-Autointerp, a method for data-centric interpretability analysis in LLM-based multi-agent reinforcement learning. |
reinforcement learning large language model |
|
|
| 7 |
On Computation and Reinforcement Learning |
Proposes a computation-constrained policy framework that improves the performance and generalization of RL policies. |
reinforcement learning offline RL |
|
|
| 8 |
Distributional Reinforcement Learning with Diffusion Bridge Critics |
Proposes DBC, a distributional RL method with diffusion bridge critics that improves performance on continuous-control tasks. |
reinforcement learning diffusion policy |
|
|
| 9 |
Rewards as Labels: Revisiting RLVR from a Classification Perspective |
Proposes the REAL framework, which treats verifiable rewards as labels to address gradient misallocation and gradient domination in reinforcement learning. |
reinforcement learning policy learning large language model |
|
|
| 10 |
Mode-Dependent Rectification for Stable PPO Training |
Proposes Mode-Dependent Rectification to stabilize PPO training in visual reinforcement learning. |
reinforcement learning PPO |
|
|
| 11 |
$f$-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment |
Proposes f-divergence-based LLM alignment algorithms that improve performance across general alignment tasks. |
reinforcement learning |
|
|
| 12 |
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences |
Proposes an adversarial sequence generation method to verify the implicit world model of a generative model in the chess domain. |
world model |
|
|
| 13 |
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations |
Proposes Dr. Kernel, which uses reinforcement learning to optimize Triton kernel generation, outperforming existing LLMs. |
reinforcement learning |
|
|
| 14 |
Cross-Domain Offline Policy Adaptation via Selective Transition Correction |
Proposes the Selective Transition Correction (STC) algorithm to address dynamics mismatch in cross-domain offline policy transfer. |
reinforcement learning policy learning offline RL |
|
|
| 15 |
Learning to Inject: Automated Prompt Injection via Reinforcement Learning |
Proposes AutoInject, which uses reinforcement learning to automatically generate prompt injection attacks with improved success rates and transferability. |
reinforcement learning |
|
|
| 16 |
CSRv2: Unlocking Ultra-Sparse Embeddings |
CSRv2: unlocking ultra-sparse embeddings for efficient, high-performance text and visual representations. |
representation learning foundation model |
|
|
| 17 |
Steering Large Reasoning Models towards Concise Reasoning via Flow Matching |
FlowSteer: steering large reasoning models toward more concise reasoning via flow matching. |
flow matching |
|
|
| 18 |
A Unified Framework for Rethinking Policy Divergence Measures in GRPO |
Proposes a unified clipping framework to optimize policy divergence measures in GRPO. |
reinforcement learning large language model |
|
|
| 19 |
When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL |
Shows that in offline goal-conditioned RL, hyperparameter sensitivity is not inevitable, with findings that inform objective-function design. |
reinforcement learning deep reinforcement learning representation learning |
|
|
| 20 |
A Decomposition-based State Space Model for Multivariate Time-Series Forecasting |
DecompSSM: a decomposition-based state space model for multivariate time-series forecasting. |
state space model |
|
|
| 21 |
Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective |
Proposes an accelerated sequential flow matching method based on Bayesian filtering, improving the efficiency of real-time sequence prediction. |
flow matching |
|
|
| 22 |
ZeroS: Zero-Sum Linear Attention for Efficient Transformers |
Proposes ZeroS, a zero-sum linear attention mechanism that improves Transformer efficiency and performance. |
linear attention |
|
|
| 23 |
Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates |
Proposes certifiably robust neural Lyapunov-barrier certificates to address dynamics uncertainty. |
reinforcement learning deep reinforcement learning |
|
|
| 24 |
DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training |
DFPO: scaling value modeling via distributional flow for robust and generalizable LLM post-training. |
reinforcement learning PPO |
|
|
| 25 |
Variance Reduction Based Experience Replay for Policy Optimization |
Proposes a variance-reduction-based experience replay method that improves the efficiency of policy optimization in reinforcement learning. |
reinforcement learning policy learning |
|
|