| 1 |
MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration |
MPM-LLM4DSE: Pareto-frontier optimization for HLS using multimodal learning and LLM-driven exploration |
predictive model large language model multimodal |
✅ |
|
| 2 |
Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following |
High-precision rewards beat diversity: improving the robustness and generalization of instruction following |
reinforcement learning instruction following |
|
|
| 3 |
Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead |
Proposes Nightmare Dreamer, which performs safe reinforcement learning by predicting unsafe states. |
reinforcement learning world model dreamer |
|
|
| 4 |
On the Hidden Objective Biases of Group-based Reinforcement Learning |
Reveals hidden objective biases in group-based reinforcement learning and offers guidance for future algorithm design |
reinforcement learning large language model |
|
|
| 5 |
FedKDX: Federated Learning with Negative Knowledge Distillation for Enhanced Healthcare AI Systems |
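FedKDX: a federated learning framework based on negative knowledge distillation that improves the performance of healthcare AI systems. |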
contrastive learning distillation |
✅ |
|
| 6 |
TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation |
Proposes TSSR, a two-stage, swap-reward-driven reinforcement learning method for character-level SMILES generation. |
reinforcement learning PPO |
|
|
| 7 |
Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art |
Surveys the progress and open challenges of safe continual reinforcement learning methods for nonstationary environments. |
reinforcement learning |
|
|
| 8 |
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights |
DeepWeightFlow: a method for generating neural network weights based on re-basined flow matching |
flow matching |
|
|
| 9 |
AgentOCR: Reimagining Agent History via Optical Self-Compression |
AgentOCR: reconstructing agent history through optical self-compression to improve efficiency |
reinforcement learning large language model |
|
|
| 10 |
Improving Semi-Supervised Contrastive Learning via Entropy-Weighted Confidence Integration of Anchor-Positive Pairs |
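Proposes a semi-supervised contrastive learning method based on entropy-weighted confidence integration, improving classification accuracy when labeled data is scarce. |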
contrastive learning |
|
|
| 11 |
Not All Steps are Informative: On the Linearity of LLMs' RLVR Training |
Reveals the linearity of LLMs' RLVR training and proposes weight/logit extrapolation to accelerate training. |
reinforcement learning large language model |
|
|