| 1 |
Local-Global Multimodal Contrastive Learning for Molecular Property Prediction |
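Proposes the LGM-CL framework, which improves molecular property prediction accuracy through local-global multimodal contrastive learning. |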
representation learning contrastive learning multimodal |
|
|
| 2 |
Clipping-Free Policy Optimization for Large Language Models |
Proposes clipping-free policy optimization to address training instability in large language models |
reinforcement learning large language model instruction following |
|
|
| 3 |
Continual Policy Distillation from Distributed Reinforcement Learning Teachers |
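Proposes a continual policy distillation framework based on distributed reinforcement learning teacher models, addressing catastrophic forgetting in lifelong learning agents. |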
reinforcement learning teacher-student distillation |
|
|
| 4 |
From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning |
Proposes the RLRR framework, which uses relative rewards to address reward sparsity and instability in group-based reinforcement learning |
reinforcement learning reward shaping large language model |
|
|
| 5 |
Agile Reinforcement Learning through Separable Neural Architecture |
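Proposes SPAN, an agile reinforcement learning method based on a separable neural architecture that improves sample efficiency and policy learning. |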
reinforcement learning deep reinforcement learning policy learning |
|
|
| 6 |
Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning |
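Proposes an automatic constraint policy optimization algorithm based on continuous constraint interpolation, improving offline reinforcement learning performance |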
reinforcement learning offline reinforcement learning behavior cloning |
|
|
| 7 |
Elastic Spectral State Space Models for Budgeted Inference |
Proposes elastic spectral state space models that are trained once and support runtime inference at any scale. |
SSM state space model distillation |
|
|
| 8 |
Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology |
Reveals a latent learning ability arising from unrewarded exploration in large language models, drawing on psychological theory. |
reinforcement learning large language model |
|
|
| 9 |
Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment |
Proposes SCIQL, which achieves high-quality offline reinforcement learning through robust style alignment |
reinforcement learning offline reinforcement learning |
✅ |
|
| 10 |
DRL-Enabled Trajectory Planing for UAV-Assisted VLC: Optimal Altitude and Reward Design |
Proposes a DRL-based trajectory planning method for UAV-assisted VLC that optimizes flight altitude and reward function design |
DRL reward design |
|
|
| 11 |
RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning |
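Proposes an on-policy reinforcement learning method based on discretized categorical actors and regularized networks, improving performance on continuous control tasks. |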
reinforcement learning deep reinforcement learning |
|
|
| 12 |
Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation |
Stabilizes consistency training through flow map analysis and self-distillation, improving generative model performance |
policy learning distillation |
|
|
| 13 |
On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care |
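Proposes reinforcement learning policies that balance sedation and analgesia against patient survival, improving medication safety in the ICU |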
reinforcement learning deep reinforcement learning |
|
|
| 14 |
CATTO: Balancing Preferences and Confidence in Language Models |
Proposes CATTO to address confidence calibration in language models |
DPO direct preference optimization large language model |
|
|
| 15 |
SplineFlow: Flow Matching for Dynamical Systems with B-Spline Interpolants |
SplineFlow: proposes a flow matching method based on B-spline interpolants for modeling dynamical systems. |
flow matching |
✅ |
|
| 16 |
OptiMAG: Structure-Semantic Alignment via Unbalanced Optimal Transport |
Proposes OptiMAG to address the mismatch between structure and semantics in multimodal graphs |
representation learning multimodal |
|
|
| 17 |
Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features |
Proposes a cascaded flow matching model for generating heterogeneous tabular data with mixed-type features |
flow matching |
|
|
| 18 |
MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning |
MC-GRPO: median-centered group relative policy optimization for small-rollout reinforcement learning |
reinforcement learning |
✅ |
|
| 19 |
Gradual Fine-Tuning for Flow Matching Models |
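Proposes a gradual fine-tuning (GFT) framework that improves the adaptability and inference efficiency of flow matching models under distribution shift. |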
flow matching |
|
|
| 20 |
HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning |
Proposes HeaPA, which improves the efficiency of LLM reinforcement learning through heap sampling and on-policy query augmentation. |
reinforcement learning |
✅ |
|