| 1 |
Understanding protein function with a multimodal retrieval-augmented foundation model |
Proposes PoET-2 to address the challenges of protein function prediction |
representation learning foundation model multimodal |
|
|
| 2 |
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models |
Proposes AdaPO to address the self-evaluation problem in large multimodal models |
reinforcement learning foundation model multimodal |
|
|
| 3 |
Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies |
Presents data, network, and training budget strategies for improving decision making in deep reinforcement learning |
reinforcement learning deep reinforcement learning DRL |
|
|
| 4 |
PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning |
Proposes PAC-EIG to address reliability in active inverse reinforcement learning |
reinforcement learning inverse reinforcement learning |
|
|
| 5 |
Reinforcement Learning for Target Zone Blood Glucose Control |
Proposes a reinforcement learning framework for blood glucose control in type 1 diabetes |
reinforcement learning policy learning PULSE |
|
|
| 6 |
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning |
Proposes a reinforcement learning method to address multi-turn interaction in software engineering |
reinforcement learning large language model |
|
|
| 7 |
Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach |
Proposes a minimal predictive sufficiency model to optimize selectivity in state space models |
Mamba SSM state space model |
|
|
| 8 |
Cross-Model Semantics in Representation Learning |
Proposes structural constraints to improve the cross-model compatibility of deep network representations |
representation learning distillation |
|
|
| 9 |
HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation |
Proposes the HiTeC framework for contrastive learning on text-attributed hypergraphs |
representation learning contrastive learning |
|
|
| 10 |
Reinforcement Learning in MDPs with Information-Ordered Policies |
Proposes a reinforcement learning algorithm based on information-ordered policies to optimize MDPs |
reinforcement learning |
|
|
| 11 |
Self-Questioning Language Models |
Proposes self-questioning, self-answering language models to improve reasoning ability |
reinforcement learning large language model |
|
|
| 12 |
SLA-MORL: SLA-Aware Multi-Objective Reinforcement Learning for HPC Resource Optimization |
Proposes SLA-MORL to address resource optimization in cloud environments |
reinforcement learning |
|
|
| 13 |
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems |
Proposes physics-constrained fine-tuning of flow-matching models to solve inverse problems |
flow matching |
|
|
| 14 |
Pseudo-label Induced Subspace Representation Learning for Robust Out-of-Distribution Detection |
Proposes pseudo-label induced subspace representation learning to address OOD detection |
representation learning |
|
|
| 15 |
VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision |
Proposes VRPO to address reinforcement learning training under noisy supervision |
reinforcement learning PPO RLHF |
|
|
| 16 |
ORVIT: Near-Optimal Online Distributionally Robust Reinforcement Learning |
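Proposes an online distributionally robust reinforcement learning method to handle mismatch between training and deployment environments |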
reinforcement learning |
|
|
| 17 |
Increasing Interaction Fidelity: Training Routines for Biomechanical Models in HCI |
Proposes improved training routines to increase the interaction fidelity of biomechanical models in HCI |
reinforcement learning curriculum learning |
|
|