| 1 |
HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression |
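Proposes HMPO, which compresses chain-of-thought (CoT) reasoning via hybrid median-length policy optimization to reduce inference overhead. |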
reinforcement learning large language model instruction following |
|
|
| 2 |
Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design |
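Reviews recent advances in generative models, multimodal learning, and closed-loop workflows for the inverse design of crystalline materials. |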
reinforcement learning latent optimization multimodal |
|
|
| 3 |
Policy and World Modeling Co-Training for Language Agents |
Proposes the PaW framework, which improves language-agent performance by co-training the policy with world modeling. |
reinforcement learning world model world models |
|
|
| 4 |
IMWM: Intuition Models Complement World Models for Latent Planning |
IMWM: combines intuition models with world models for latent-space planning, improving performance on pixel-level control tasks. |
world model world models |
|
|
| 5 |
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents |
OpenWebRL: explores online multi-turn reinforcement learning for visual web agents and sets a new open-source SOTA. |
reinforcement learning multimodal |
|
|
| 6 |
Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints |
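Proposes an uncertainty-aware graph neural network that reconstructs urban temperature fields from sparse sensors while accounting for deployment constraints. |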
MAE sparse sensors |
|
|
| 7 |
TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks |
TabPrep: closes the feature engineering gap in tabular benchmarks, improving model performance. |
world model world models foundation model |
✅ |
|
| 8 |
Task-Induced Representational Invariances Depend on Learning Objective in Deep RL |
Task-induced representational invariances in deep reinforcement learning depend on the learning objective. |
reinforcement learning PPO OMOMO |
|
|
| 9 |
Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation |
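Explains the "copying" behavior of DMD student models as a result of limited geometric degrees of freedom in high-dimensional distillation. |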
distillation |
|
|
| 10 |
On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching |
Proposes a sensitivity-conditioned Bernoulli flow matching method that improves model generalization in topology optimization. |
flow matching |
✅ |
|
| 11 |
A Theoretical Framework for Self-Play Theorem Proving Algorithms |
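Proposes a theoretical framework for self-play theorem proving algorithms, addressing the problem of generating complex theorems. |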
contrastive learning large language model |
|
|
| 12 |
Quantifying the Energy Floor: Direct Measurement and Replay Buffer Bias in SAC-Based HVAC Control on sbsim |
Quantifying the energy floor: direct measurement and replay buffer bias analysis for SAC-based HVAC control on sbsim. |
SAC |
|
|
| 13 |
FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment |
FedMTFI: optimized multi-teacher knowledge distillation based on feature importance for heterogeneous federated learning. |
distillation |
|
|
| 14 |
Flexible Online Representation Learning Based on Similarity Matching |
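Proposes a flexible online representation learning algorithm based on similarity matching, applicable to clustering, manifold tiling, and sparse coding. |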
representation learning |
|
|
| 15 |
VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting |
Proposes VLBM to address OOD robustness in multivariate time series forecasting. |
MAE PULSE |
✅ |
|