| 1 |
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle |
Shuffle-R1: improving the efficiency of reinforcement learning for multimodal large language models via data-centric dynamic shuffling |
reinforcement learning large language model multimodal |
|
|
| 2 |
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies |
Proposes MLES, which uses multimodal-LLM-assisted evolutionary search to generate interpretable programmatic control policies |
reinforcement learning deep reinforcement learning PPO |
|
|
| 3 |
Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning |
Analyzes how multimodal perception affects sample complexity and the optimization landscape in imitation learning |
imitation learning multimodal |
|
|
| 4 |
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models |
SPaRFT: an efficient learning framework for large language models based on self-paced reinforcement fine-tuning |
reinforcement learning curriculum learning large language model |
|
|
| 5 |
RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders |
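Proposes a reinforcement-learning-based LLM fine-tuning method that uses implicit user feedback to optimize conversational recommender systems |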
reinforcement learning PPO RLHF |
|
|
| 6 |
Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling |
Proposes EGPO, a reinforcement learning framework built on exploratory reasoning, to improve the function-calling ability of LLMs |
reinforcement learning large language model chain-of-thought |
|
|
| 7 |
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification |
Proposes Dynamic Fine-Tuning (DFT), which improves the generalization of SFT by rectifying its implicit reward structure |
reinforcement learning offline RL large language model |
✅ |
|
| 8 |
Advanced Hybrid Transformer LSTM Technique with Attention and TS Mixer for Drilling Rate of Penetration Prediction |
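Proposes a hybrid Transformer-LSTM model that combines an attention mechanism with TS-Mixer to improve the accuracy of drilling rate of penetration (ROP) prediction |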
representation learning penetration |
|
|
| 9 |
FlowState: Sampling Rate Invariant Time Series Forecasting |
FlowState: a sampling-rate-invariant time series forecasting framework that improves generalization and efficiency |
SSM state space model foundation model |
|
|
| 10 |
Domain-driven Metrics for Reinforcement Learning: A Case Study on Epidemic Control using Agent-based Simulation |
Proposes domain-driven evaluation metrics for reinforcement learning, applied to agent-based epidemic control simulation |
reinforcement learning |
|
|
| 11 |
R-Zero: Self-Evolving Reasoning LLM from Zero Data |
R-Zero: a framework for a reasoning large language model that self-evolves from zero data |
reinforcement learning large language model |
|
|
| 12 |
Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas |
Proposes an anti-jamming wireless sensing method based on distributed reconfigurable intelligent metasurface antennas |
reinforcement learning deep reinforcement learning DRL |
|
|
| 13 |
R-Zero: Self-Evolving Reasoning LLM from Zero Data |
Proposes R-Zero to remove the dependence of self-evolving reasoning models on pre-existing training data |
reinforcement learning large language model |
|
|