| 1 |
Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice |
Proposes a two-stage adapter that ensures the economic validity of tabular foundation models in discrete choice prediction. |
distillation foundation model |
|
|
| 2 |
When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control |
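The RLScale-Bench benchmark shows that calibrated rule-based controllers outperform mainstream deep reinforcement learning algorithms in adaptive resource control. |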
reinforcement learning deep reinforcement learning DRL |
|
|
| 3 |
Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization |
Proposes a multi-task SAC reinforcement learning framework for robust open quantum system control with temporal optimization. |
reinforcement learning SAC PULSE |
|
|
| 4 |
Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning |
GraphGPO: a graph-based credit assignment method that improves the efficiency of agentic reinforcement learning. |
reinforcement learning large language model |
|
|
| 5 |
Recursive Flow Matching |
Proposes Recursive Flow Matching (RecFM) to accelerate high-accuracy modeling and prediction of spatiotemporal dynamical systems. |
flow matching spatiotemporal |
|
|
| 6 |
BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning |
BASIS: batchwise advantage estimation through single-rollout information sharing, improving LLM reasoning. |
reinforcement learning policy learning large language model |
|
|
| 7 |
Causal Representation Learning for Generalisable Recommendation |
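Proposes a recommendation method based on causal representation learning that improves the generalization of recommender systems under distribution shift. |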
predictive model representation learning |
|
|
| 8 |
Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation |
Proposes Teachability-Aware OPD, which improves on-policy distillation by selecting learnable token signals. |
teacher-student distillation |
|
|
| 9 |
WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization |
WINDQuant: weight-informed neural decision-making for global mixed-precision LLM quantization. |
reinforcement learning PPO large language model |
|
|
| 10 |
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders |
SAERL: uses model internals from sparse autoencoders to guide LLM post-training data engineering. |
reinforcement learning large language model |
|
|
| 11 |
Learning Dynamic Graph Representations through Timespan View Contrasts |
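Proposes the CLDG and CLDG++ frameworks, which learn dynamic graph representations by contrasting timespan views, for node classification and anomaly detection. |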
contrastive learning TAMP |
|
|
| 12 |
Less is More: Early Stopping Rollout for On-Policy Distillation |
Proposes an early-stopping rollout distillation method that addresses teacher model degradation in on-policy distillation. |
distillation |
|
|
| 13 |
SQARL: A Size-Agnostic Reinforcement Learning approach for Circuit Allocation in Distributed Quantum Architectures |
Proposes SQARL, a size-agnostic reinforcement learning approach for circuit allocation in distributed quantum architectures. |
reinforcement learning |
|
|
| 14 |
SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings |
SPHERE-JEPA: spherical prediction with homogeneous embeddings, improving the quality of self-supervised representations. |
JEPA |
|
|
| 15 |
Generalist Graph Anomaly Detection via Prototype-Based Distillation |
Proposes ProMoS, an unsupervised framework for generalist graph anomaly detection based on prototype distillation. |
distillation |
|
|
| 16 |
Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts |
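Proposes the R2E-IG model, which uses a mixture-of-experts network to improve generalization on vehicle routing problems under distribution shift. |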
reinforcement learning deep reinforcement learning DRL |
|
|
| 17 |
Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training |
Proposes the Pilot-Commit framework, which accelerates group-based RL post-training through budget-aware rollout allocation. |
reinforcement learning large language model |
|
|
| 18 |
Geometry-Aware Contrastive Learning for Few-Shot Automatic Modulation Recognition |
Proposes the DyCo-CL framework to address the shortcomings of SSL methods in few-shot automatic modulation recognition. |
contrastive learning |
|
|
| 19 |
Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards |
Proposes Focal Reward, which addresses training imbalance in reinforcement learning for LLMs under rubric-based rewards. |
reinforcement learning |
|
|
| 20 |
PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design |
PRISM: a position-encoded regressive inverse spectral model for multilayer thin-film design. |
MAE spatial relationship |
|
|
| 21 |
Trust Region Q Adjoint Matching |
Proposes Trust Region Q-Adjoint Matching for stably optimizing pretrained flow policies in offline reinforcement learning. |
reinforcement learning offline RL |
|
|
| 22 |
Ratio-Variance Regularized Policy Optimization |
Proposes R²VPO, which achieves stable and efficient policy optimization by regularizing the variance of the policy ratio. |
reinforcement learning PPO |
|
|
| 23 |
Adversarial Training for Robust Coverage Network under Worst-case Facility Losses |
Proposes a two-agent deep reinforcement learning framework for the maximal covering location interdiction problem. |
reinforcement learning deep reinforcement learning |
|
|