| 1 |
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning |
Proposes Bootstrapped Flow Q-Learning to address computational complexity in offline reinforcement learning |
reinforcement learning policy learning offline reinforcement learning |
|
|
| 2 |
When to Align, When to Predict: A Phase Diagram for Multimodal Learning |
Proposes a unified framework for optimizing alignment and prediction in multimodal learning |
representation learning multimodal |
✅ |
|
| 3 |
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning |
Proposes the QGF algorithm to address policy optimization in reinforcement learning |
reinforcement learning policy learning offline RL |
|
|
| 4 |
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning |
Proposes the TRACE framework to address budget allocation in multi-turn reinforcement learning |
reinforcement learning large language model |
|
|
| 5 |
AuRA: Internalizing Audio Understanding into LLMs as LoRA |
Proposes AuRA to address the efficiency of combining audio understanding with large language models |
distillation large language model multimodal |
|
|
| 6 |
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models |
Proposes Flow-DPPO to address policy optimization for flow matching models |
reinforcement learning PPO flow matching |
✅ |
|
| 7 |
Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations |
Proposes FPQC-SAC to address bias in low-SNR financial reinforcement learning |
reinforcement learning deep reinforcement learning SAC |
✅ |
|
| 8 |
Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation |
Proposes AR-OPD to address the inference mismatch between teacher and student models |
teacher-student distillation privileged information |
|
|
| 9 |
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning |
Proposes CPPO to address the trust region problem in LLM reinforcement learning |
reinforcement learning PPO |
|
|
| 10 |
Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication |
Proposes a deep reinforcement learning framework to optimize long-horizon control in semiconductor fabrication |
reinforcement learning deep reinforcement learning |
|
|
| 11 |
One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data |
Proposes a multi-scale residual-aware representation learning pipeline to improve time series forecasting |
representation learning MAE |
|
|
| 12 |
Machine Learning Methods for Studying Latent Neural Activity Dynamics |
Surveys machine learning methods for studying latent neural activity dynamics |
latent dynamics contrastive learning foundation model |
|
|
| 13 |
On-sky demonstration of reinforcement learning for adaptive optics control |
Proposes PO4AO to address real-time optimization in adaptive optics control |
reinforcement learning |
|
|
| 14 |
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting |
Proposes geometry-aware reinforcement learning to address the 2D irregular nesting problem |
reinforcement learning |
|
|
| 15 |
How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs |
Proposes FlowTracer to address credit assignment in reinforcement learning for large language models |
reinforcement learning large language model |
|
|
| 16 |
Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing |
Proposes SPGCL to address the underuse of positive samples in graph contrastive learning |
contrastive learning |
|
|
| 17 |
Flexible Flows for Biological Sequence Design |
Proposes a flexible flow model to optimize biological sequence design |
flow matching classifier-free guidance |
|
|
| 18 |
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output |
Proposes representation-based advantage estimation to improve reinforcement learning from human feedback |
reinforcement learning RLHF |
|
|