| 1 |
On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage |
Proposes a new framework to address the complexity of offline reinforcement learning.
reinforcement learning, offline RL, offline reinforcement learning
|
|
| 2 |
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL |
Proposes a metric-space-learning method for multimodal state estimation, improving RL robustness in noisy environments; a generic metric-learning sketch follows.
reinforcement learning, multimodal
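
How the paper constructs its metric space is its contribution; for orientation, here is the textbook metric-learning ingredient such methods typically build on, a triplet loss over state embeddings. Function names and shapes are illustrative assumptions, not the paper's API.

```python
import torch.nn.functional as F

def triplet_metric_loss(anchor, positive, negative, margin=1.0):
    # Pull embeddings of the same underlying state together and push
    # embeddings of different states apart by at least `margin`.
    # anchor, positive, negative: (batch, embed_dim) tensors.
    d_pos = F.pairwise_distance(anchor, positive)  # (batch,)
    d_neg = F.pairwise_distance(anchor, negative)  # (batch,)
    return F.relu(d_pos - d_neg + margin).mean()
```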
|
|
| 3 |
FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Clients
Proposes FedGRPO, which uses group-relative rewards to efficiently optimize privacy-preserving foundation models under federated learning.
reinforcement learning, foundation model
|
|
| 4 |
In-Context Function Learning in Large Language Models |
Analyzes the in-context function-learning ability of large language models from a Gaussian-process perspective; a generic GP-regression sketch follows.
reinforcement learning, large language model
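
To make the Gaussian-process lens concrete: in such analyses, the prompt's few-shot (x, y) pairs play the role of GP conditioning data, and the GP posterior is the Bayesian reference the LLM's in-context predictions are compared against. A minimal sketch of that baseline (kernel choice and names are assumptions, not the paper's code):

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2)).
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(x_ctx, y_ctx, x_query, noise=1e-2):
    # Posterior mean of a GP conditioned on the "in-context" pairs.
    K = rbf_kernel(x_ctx, x_ctx) + noise * np.eye(len(x_ctx))
    K_star = rbf_kernel(x_query, x_ctx)
    return K_star @ np.linalg.solve(K, y_ctx)

# The context pairs stand in for few-shot examples in the prompt.
x_ctx = np.array([-2.0, -1.0, 0.5, 2.0])
y_ctx = np.sin(x_ctx)
print(gp_posterior_mean(x_ctx, y_ctx, np.linspace(-3, 3, 7)))
```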
|
|
| 5 |
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels |
DICE: diffusion large language models excel at generating CUDA kernels, outperforming autoregressive models.
reinforcement learning, large language model
|
|
| 6 |
TS-Memory: Plug-and-Play Memory for Time Series Foundation Models |
Proposes TS-Memory to address the adaptation problem of time-series foundation models.
distillation, foundation model
|
|
| 7 |
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation |
Proposes G-OPD, a generalized on-policy distillation framework that uses reward extrapolation to lift student performance, even beyond the teacher; a sketch of the standard on-policy distillation loss follows.
reinforcement learning, teacher-student distillation
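
Reward extrapolation is the paper's new ingredient; the on-policy distillation objective it generalizes is standard, a token-level reverse KL between student and teacher evaluated on sequences sampled from the student. A minimal sketch (tensor shapes and names are illustrative):

```python
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits, mask):
    # Token-level reverse KL D_KL(student || teacher), computed on
    # student-sampled sequences (hence "on-policy").
    # student_logits, teacher_logits: (batch, seq, vocab); mask: (batch, seq).
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # KL(p_s || p_t) = sum_v p_s(v) * (log p_s(v) - log p_t(v))
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)
    return (kl * mask).sum() / mask.sum()
```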
|
|
| 8 |
Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards |
Proposes an online reinforcement-learning method with feedback from real supercomputers, improving LLMs' ability to generate high-performance HPC code.
reinforcement learning, large language model
|
|
| 9 |
From Path Signatures to Sequential Modeling: Incremental Signature Contributions for Offline RL |
Proposes an incremental-signature-contribution method for timing-sensitive control in offline reinforcement learning; a minimal path-signature sketch follows.
reinforcement learning, offline RL, offline reinforcement learning
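
Incremental signature contributions are the paper's construct; as background, here is a minimal NumPy sketch of the truncated (level-2) path signature that signature-based methods compute from discrete increments of a piecewise-linear path:

```python
import numpy as np

def signature_level2(path):
    # path: (T, d) samples of a d-dimensional path.
    # Level 1 is the total increment; level 2 collects the iterated
    # integrals S[i, j] = integral of dX_i dX_j over s < t.
    dX = np.diff(path, axis=0)       # (T-1, d) segment increments
    level1 = dX.sum(axis=0)          # (d,)
    d = path.shape[1]
    S2 = np.zeros((d, d))
    cum = np.zeros(d)                # increments accumulated so far
    for inc in dX:
        # Cross terms from earlier segments plus the within-segment term
        # (1/2) inc ⊗ inc for a linear segment.
        S2 += np.outer(cum, inc) + 0.5 * np.outer(inc, inc)
        cum += inc
    return level1, S2
```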
|
|
| 10 |
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning |
Proposes TAMPO, which treats temperature control as a meta-policy and adaptively improves LLM reinforcement learning; a hand-coded adaptive-temperature sketch follows.
reinforcement learning, large language model
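
TAMPO learns its temperature meta-policy; as a stand-in, here is a hand-coded sketch of the general idea, choosing a per-step sampling temperature from the model's predictive entropy. The entropy-to-temperature rule and bounds are assumptions, not the paper's method:

```python
import torch

def sample_with_adaptive_temperature(logits, t_low=0.3, t_high=1.2):
    # logits: (batch, vocab). Low predictive entropy -> sample sharply;
    # high entropy -> raise the temperature and explore more.
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)      # (batch,)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    # Interpolate between t_low and t_high by normalized entropy.
    temp = t_low + (t_high - t_low) * (entropy / max_entropy)
    scaled = torch.softmax(logits / temp.unsqueeze(-1), dim=-1)
    return torch.multinomial(scaled, num_samples=1)               # (batch, 1)
```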
|
|
| 11 |
Unifying Stable Optimization and Reference Regularization in RLHF |
Unifies stable optimization and reference regularization to improve RLHF alignment; both textbook ingredients are sketched below.
reinforcement learning, preference learning, RLHF
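
How the paper unifies the two ingredients is its contribution; the ingredients themselves are textbook: the PPO clipped surrogate for stable optimization, and a KL penalty toward a frozen reference policy for regularization. A minimal sketch showing both pieces side by side (names and the per-token KL estimate are standard choices, not the paper's formulation):

```python
import torch

def ppo_kl_ref_loss(logp_new, logp_old, logp_ref, advantages,
                    clip_eps=0.2, beta=0.1):
    # Stable optimization: PPO clipped surrogate on the importance ratio.
    ratio = (logp_new - logp_old).exp()
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    surrogate = -torch.min(ratio * advantages, clipped * advantages)
    # Reference regularization: per-token KL estimate to a frozen pi_ref.
    kl_to_ref = logp_new - logp_ref
    return (surrogate + beta * kl_to_ref).mean()
```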
|
|
| 12 |
Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data |
Proposes a Flow-Guided Neural Operator framework for self-supervised learning that improves time-series representations; a generic flow-matching loss is sketched below.
flow matching, representation learning, masked autoencoder
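
The operator architecture is the paper's contribution; the flow-matching objective such frameworks train with is standard. A minimal sketch of conditional flow matching with a linear interpolation path; that `model(x_t, t)` predicts the velocity field is an assumption about the interface:

```python
import torch

def flow_matching_loss(model, x1):
    # x1: a batch of data samples; x0 ~ N(0, I) is the source distribution.
    x0 = torch.randn_like(x1)
    # One random time per sample, broadcastable over the remaining dims.
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1 - t) * x0 + t * x1      # linear probability path
    v_target = x1 - x0               # target velocity along the path
    v_pred = model(x_t, t)
    return ((v_pred - v_target) ** 2).mean()
```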
|
|
| 13 |
Mitigating Mismatch within Reference-based Preference Optimization |
Proposes Hybrid-DPO to mitigate the mismatch problem in direct preference optimization; the standard DPO loss is sketched below for reference.
DPO, direct preference optimization, large language model
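
For reference, the standard DPO loss the paper modifies; the reference-model terms are where any policy/reference mismatch enters the objective. Inputs are per-response summed log-probabilities; names are illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    # Each margin is log p(chosen) - log p(rejected), summed over the
    # response tokens, under the policy and the frozen reference model.
    policy_margin = policy_chosen_lp - policy_rejected_lp
    ref_margin = ref_chosen_lp - ref_rejected_lp
    # -log sigmoid(beta * (policy margin - reference margin))
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```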
|
|
| 14 |
Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training |
Proposes an on-policy SFT framework based on distribution discriminant theory, improving LLM generalization.
reinforcement learning, offline RL, DPO
✅ |
|
| 15 |
The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics |
Proposes the PhyIP evaluation protocol, addressing how invasive adaptation corrupts the latent physics of world models.
world model |
|
|
| 16 |
How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics |
Studies how sampling strategies shape LLM alignment, revealing the stability properties and risks of iterative alignment.
direct preference optimization, large language model
|
|
| 17 |
KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite |
Proposes KAN-FIF, a lightweight physics-based framework for tropical cyclone estimation onboard meteorological satellites.
MAE, multimodal
✅ |
|
| 18 |
Improved state mixing in higher-order and block diagonal linear recurrent networks |
Proposes higher-order and block-diagonal linear recurrent networks, improving the efficiency and expressiveness of long-sequence modeling; a minimal block-diagonal recurrence is sketched below.
Mamba, SSM, state space model
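
Diagonal SSMs mix state channels only through the input and output projections; a block-diagonal transition matrix adds mixing within each block. A minimal sketch of such a recurrence in its unoptimized sequential form (shapes and names are illustrative, not the paper's implementation):

```python
import torch

def block_diag_linear_rnn(u, blocks, B, C):
    # Linear recurrence x_{t+1} = A x_t + B u_t, y_t = C x_t, where A is
    # block-diagonal: each block mixes only its own slice of the state.
    # u: (T, d_in); blocks: list of (b_i, b_i) tensors; B: (d_state, d_in);
    # C: (d_out, d_state), with d_state = sum of block sizes.
    x = torch.zeros(B.shape[0])
    offsets = [0]
    for blk in blocks:
        offsets.append(offsets[-1] + blk.shape[0])
    ys = []
    for u_t in u:
        x_new = B @ u_t
        for blk, lo, hi in zip(blocks, offsets[:-1], offsets[1:]):
            x_new[lo:hi] = x_new[lo:hi] + blk @ x[lo:hi]
        x = x_new
        ys.append(C @ x)
    return torch.stack(ys)
```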
|
|
| 19 |
RAM-Net: Expressive Linear Attention with Selectively Addressable Memory |
Proposes RAM-Net, which strengthens the expressiveness of linear attention with selectively addressable explicit memory; vanilla causal linear attention is sketched below for context.
linear attention |
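
RAM-Net's selectively addressable memory is its contribution; for context, here is a minimal sketch of the vanilla causal linear attention it builds on, where a running outer-product state replaces the softmax attention matrix, giving O(T) time and O(1) memory per step:

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (T, d). Feature map phi(x) = elu(x) + 1 is one common choice.
    phi = lambda x: F.elu(x) + 1
    q, k = phi(q), phi(k)
    T, d = q.shape
    S = torch.zeros(d, v.shape[-1])   # running sum of phi(k_i) v_i^T
    z = torch.zeros(d)                # running sum of phi(k_i), for normalization
    out = []
    for t in range(T):
        S = S + torch.outer(k[t], v[t])
        z = z + k[t]
        out.append((q[t] @ S) / (q[t] @ z + eps))
    return torch.stack(out)
```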
|
|
| 20 |
Latent-Variable Learning of SPDEs via Wiener Chaos |
Proposes a Wiener-chaos-based latent-variable method that learns stochastic partial differential equations without requiring noise data.
latent dynamics, spatiotemporal
|
|