| 1 |
An Enhanced Dual Transformer Contrastive Network for Multimodal Sentiment Analysis |
Proposes DTCN, a dual-Transformer contrastive network, to improve multimodal sentiment analysis performance. |
representation learning contrastive learning multimodal |
|
|
| 2 |
Inference-Time Compute Scaling For Flow Matching |
Proposes an inference-time compute scaling method for Flow Matching that preserves linear interpolation and improves generation quality. |
flow matching large language model |
|
|
| 3 |
UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts |
UniRL-Zero: proposes a unified reinforcement learning framework that combines language model and diffusion model experts. |
reinforcement learning multimodal |
✅ |
|
| 4 |
Plasma Shape Control via Zero-shot Generative Reinforcement Learning |
Proposes a plasma shape control method based on zero-shot generative reinforcement learning. |
reinforcement learning imitation learning representation learning |
|
|
| 5 |
Diffusion Models as Dataset Distillation Priors |
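Proposes DAP, which uses diffusion model priors to improve the representativeness of dataset distillation without additional training. |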
distillation foundation model |
|
|
| 6 |
Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning |
Proposes COMPASS to address the difficulty of estimating rewards for LLMs in unsupervised test-time reinforcement learning. |
reinforcement learning large language model |
|
|
| 7 |
Fine-tuning Flow Matching Generative Models with Intermediate Feedback |
Proposes the AC-Flow framework, which fine-tunes Flow Matching generative models with intermediate feedback to improve text-image alignment. |
flow matching reward shaping |
|
|
| 8 |
TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model |
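TrajMamba: an efficient, semantically rich pre-training model for vehicle trajectories that addresses the difficulty of exploiting trajectory data. |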
Mamba distillation |
|
|
| 9 |
Provably Optimal Reinforcement Learning under Safety Filtering |
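Proposes a provably optimal reinforcement learning method under safety filtering, addressing the performance degradation caused by safety constraints. |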
reinforcement learning |
|
|
| 10 |
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning |
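Studies the performance and stability of Lagrangian methods in safe reinforcement learning, revealing the challenges of automatic multiplier updates and directions for improvement. |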
reinforcement learning |
✅ |
|
| 11 |
Demystifying Transition Matching: When and Why It Can Beat Flow Matching |
Shows when Transition Matching has the advantage: it outperforms Flow Matching for target distributions with well-separated modes and non-zero variance. |
flow matching |
|
|
| 12 |
Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods |
Builds a large-scale open experimental dataset from batch distillation for developing machine learning anomaly detection methods. |
distillation |
|
|
| 13 |
R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning |
Proposes R2L, a reliable reinforcement learning method that guarantees return and optimizes policies under uncertainty. |
reinforcement learning |
|
|
| 14 |
Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning |
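Proposes efficient algorithms for reinforcement learning under uncertainty that optimize risk measures and improve policy performance. |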
reinforcement learning |
|
|
| 15 |
Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs |
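Proposes a certified self-consistency framework that provides statistical guarantees and a test-time training method for LLM reasoning. |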
reinforcement learning large language model |
|
|
| 16 |
TabR1: Taming GRPO for tabular reasoning LLMs |
TabR1: proposes a GRPO-based LLM for tabular reasoning, improving zero-shot and few-shot learning. |
reinforcement learning large language model |
|
|
| 17 |
Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks |
Uses physics-informed neural networks to assist reinforcement learning in optimizing smart grid energy management. |
reinforcement learning |
|
|
| 18 |
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning |
EvoSyn: a generalizable evolutionary data synthesis framework for verifiable learning. |
reinforcement learning distillation |
|
|