| 1 |
MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration |
MPM-LLM4DSE: reaches the Pareto frontier of the HLS design space using multimodal learning and LLM-driven exploration |
predictive model, large language model, multimodal |
✅ |
|
| 2 |
Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following |
High-precision rewards beat diversity: improving the robustness and generalization of instruction following |
reinforcement learning, instruction following |
|
|
| 3 |
Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead |
Proposes Nightmare Dreamer, which performs safe reinforcement learning by predicting unsafe states. |
reinforcement learning, world model, dreamer |
|
|
| 4 |
On the Hidden Objective Biases of Group-based Reinforcement Learning |
Reveals hidden objective biases in group-based reinforcement learning and offers guidance for future designs |
reinforcement learning, large language model |
|
|
| 5 |
FedKDX: Federated Learning with Negative Knowledge Distillation for Enhanced Healthcare AI Systems |
FedKDX: a federated learning framework based on negative knowledge distillation that improves healthcare AI system performance. |
contrastive learning, distillation |
✅ |
|
| 6 |
TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation |
Proposes TSSR, a two-stage swap-reward-driven reinforcement learning method for character-level SMILES generation. |
reinforcement learning, PPO |
|
|
| 7 |
Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art |
Surveys research progress on safe continual reinforcement learning methods for nonstationary environments. |
reinforcement learning |
|
|
| 8 |
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights |
DeepWeightFlow: a re-basined flow matching method for generating neural network weights |
flow matching |
|
|
| 9 |
AgentOCR: Reimagining Agent History via Optical Self-Compression |
AgentOCR: reconstructs agent history through optical self-compression to improve token efficiency |
reinforcement learning, large language model |
|
|
| 10 |
Improving Semi-Supervised Contrastive Learning via Entropy-Weighted Confidence Integration of Anchor-Positive Pairs |
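Proposes a semi-supervised contrastive learning method based on entropy-weighted confidence integration, improving classification accuracy when labeled data is scarce. |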
contrastive learning |
|
|
| 11 |
Not All Steps are Informative: On the Linearity of LLMs' RLVR Training |
Reveals the linearity of LLM RLVR training and proposes weight/logit extrapolation to accelerate training. |
reinforcement learning, large language model |
|
|