| 1 |
Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning |
Proposes the RoleJudge framework, which uses audio large language models to evaluate character consistency in spoken role-playing.
reinforcement learning large language model multimodal |
|
|
| 2 |
From Alignment to Prediction: A Study of Self-Supervised Learning and Predictive Representation Learning |
Proposes the Predictive Representation Learning (PRL) paradigm, extending self-supervised learning toward predicting data distributions.
JEPA joint-embedding predictive architecture
|
|
| 3 |
Drowsiness-Aware Adaptive Autonomous Braking System based on Deep Reinforcement Learning for Enhanced Road Safety |
Proposes a deep-reinforcement-learning-based automatic braking system that adapts to the driver's physiological state, improving road safety.
reinforcement learning deep reinforcement learning |
|
|
| 4 |
Beyond State Consistency: Behavior Consistency in Text-Based World Models |
Proposes the Behavior Consistency Reward (BehR) training paradigm, improving the functional consistency between text-based world models and the real environment.
world model world models |
|
|
| 5 |
Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning |
Proposes evidence-aware, self-correcting reinforcement learning to improve the clinical consistency of generated radiology reports.
reinforcement learning preference learning |
|
|
| 6 |
FAST: A Synergistic Framework of Attention and State-space Models for Spatiotemporal Traffic Prediction |
Proposes the FAST framework, which combines attention mechanisms with state-space models for spatiotemporal traffic prediction.
Mamba MAE spatiotemporal |
|
|
| 7 |
A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models |
Proposes a KL-divergence-based, forward-only sensitivity analysis method that speeds up quantization and deployment of mixed-precision SSM-Transformer models.
SSM state space model large language model |
✅ |
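The core idea of forward-only, KL-based sensitivity is standard enough to sketch: quantize one layer at a time, run only forward passes, and score each layer by the KL divergence between the full-precision and quantized output distributions. The toy network, 4-bit symmetric quantizer, and layer names below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def quantize(w, bits):
    # Symmetric uniform quantization to the given bit width.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def kl(p, q, eps=1e-12):
    # Mean KL divergence between two batches of categorical distributions.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

# Toy two-layer network standing in for the model being quantized.
W1, W2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 8))
x = rng.normal(size=(64, 16))

def forward(w1, w2):
    return softmax(np.tanh(x @ w1) @ w2)

p_full = forward(W1, W2)

# Forward-only sensitivity: quantize one layer at a time, measure output KL.
sensitivity = {
    "layer1": kl(p_full, forward(quantize(W1, 4), W2)),
    "layer2": kl(p_full, forward(W1, quantize(W2, 4))),
}
print(sensitivity)
```

Layers with a large KL score would then be kept at higher precision when assigning a mixed-precision configuration.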
|
| 8 |
Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO |
Proposes an objective-decoupling architecture that addresses surrogate hacking in multi-timescale PPO.
reinforcement learning PPO representation learning |
|
|
| 9 |
$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data |
Proposes $π$-Play, which enables multi-agent self-play via privileged self-distillation without external data, improving the training efficiency of search agents.
distillation privileged information |
|
|
| 10 |
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space |
Proposes PreRL and DSRL, which improve LLM reasoning via reinforcement learning in the pre-training space.
reinforcement learning |
|
|
| 11 |
ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation |
Proposes MVCrec, which improves sequential recommendation via ID-view and graph-view contrastive learning with multi-view attention fusion.
contrastive learning |
✅ |
|
| 12 |
TIP: Token Importance in On-Policy Distillation |
Proposes TIP, a token-importance-based on-policy distillation method that improves training efficiency and reduces memory usage.
distillation |
✅ |
|
| 13 |
A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing |
Compares the performance and trade-offs of dynamic programming and reinforcement learning in finite-horizon dynamic pricing.
reinforcement learning |
|
|
| 14 |
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off |
DiPO: disentangled perplexity policy optimization for a fine-grained exploration-exploitation trade-off, improving LLM reasoning.
reinforcement learning large language model |
|
|
| 15 |
Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces |
Proposes Soft Q(λ), an eligibility-trace-based, multi-step off-policy method for entropy-regularized reinforcement learning.
reinforcement learning |
|
|
| 16 |
EMGFlow: Robust and Efficient Surface Electromyography Synthesis via Flow Matching |
EMGFlow: a flow-matching-based method for synthesizing surface electromyography signals, improving data-augmentation quality and efficiency.
flow matching |
✅ |
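The generic conditional flow matching recipe the method presumably builds on is easy to sketch: interpolate a noise sample toward a data sample along a straight line and regress a network onto the constant velocity between them. This is the standard recipe, not EMGFlow itself, and the batch shape is an arbitrary stand-in for EMG windows.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x1):
    """Build one conditional flow matching training example per batch row."""
    x0 = rng.normal(size=x1.shape)           # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))   # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1               # straight-line interpolant
    v = x1 - x0                              # target velocity field at (xt, t)
    return xt, t, v

x1 = rng.normal(size=(4, 8))  # stand-in for a batch of EMG signal windows
xt, t, v = cfm_pair(x1)
# A velocity network v_theta(xt, t) would be trained with MSE against v;
# sampling then integrates dx/dt = v_theta(x, t) from noise at t=0 to t=1.
```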
|
| 17 |
Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization |
Proposes a joint representation learning and clustering framework based on gradient-based manifold optimization, addressing the difficulty of clustering high-dimensional data.
representation learning |
|
|
| 18 |
Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus |
Proposes a latent-consensus-based multi-agent Transformer (CMAT) that bridges MARL to SARL, improving multi-agent cooperation.
reinforcement learning PPO |
✅ |
|