| 17 | Multimodal Latent Reasoning via Predictive Embeddings | Proposes Pearl, which performs multimodal latent-space reasoning via predictive embedding alignment, without explicit tool calls. | JEPA depth estimation multimodal |
| 18 | Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning | Proposes VGM$^2$P, which uses value-guided MeanFlow to improve policy-learning efficiency in offline multi-agent reinforcement learning. | reinforcement learning policy learning behavior cloning |
| 19 | CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics | Proposes a plug-in CausalVAE module that makes world models' counterfactual dynamics predictions more reliable. | world model world models |
| 20 | Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization | MolReAct: reinforcement learning for drug lead optimization, guided by an LLM and constrained by reaction templates. | reinforcement learning large language model |
| 21 | Less Approximates More: Harmonizing Performance and Confidence Faithfulness via Hybrid Post-Training for High-Stakes Tasks | Proposes the HyTuning framework, which uses hybrid post-training to improve the confidence faithfulness of large models on high-stakes tasks. | reinforcement learning distillation large language model |
| 22 | TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis | Proposes TTVS, which boosts self-exploring reinforcement learning via test-time variational synthesis, addressing the scarcity of supervised data in specialized domains. | reinforcement learning |
| 23 | QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training--Inference Mismatch | QaRL: proposes rollout-aligned quantization-aware reinforcement learning that speeds up LLM training and improves its stability. | reinforcement learning large language model |
| 24 | Structured Distillation of Web Agent Capabilities Enables Generalization | Proposes the Agent-as-Annotators framework, which uses structured distillation to improve web agents' generalization in complex environments. | distillation |
| 25 | MIPT-SSM: Scaling Language Models with $O(1)$ Inference Cache via Phase Transitions | Proposes MIPT-SSM, which improves language-model inference efficiency by scaling with an $O(1)$ inference cache via phase transitions. | SSM |
| 26 | An Imperfect Verifier is Good Enough: Learning with Noisy Rewards | Shows that reinforcement learning with noisy rewards is robust in LLM training. | reinforcement learning large language model |
| 27 | Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning | Proposes a multi-agent actor-critic reinforcement learning method for alleviating community fear during disasters. | reinforcement learning |
| 28 | Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning | Proposes the CLOVER framework, which uses wireless communication graphs to enhance value decomposition in multi-agent reinforcement learning. | reinforcement learning |
| 29 | StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning | Proposes the StructRL framework, which recovers dynamic-programming structure from the learning dynamics of distributional reinforcement learning. | reinforcement learning |
| 30 | From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity | Proposes FEAT, a federated geometry-aware correction method that makes exemplar replay work better in federated continual learning under dynamic heterogeneity. | distillation geometric consistency |