| 22 |
CTR-Driven Advertising Image Generation with Multimodal Large Language Models |
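Proposes an advertising image generation method based on multimodal large language models and CTR optimization, improving e-commerce ad performance. |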
reinforcement learning, large language model, multimodal
✅ |
|
| 23 |
Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms |
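Presents a unified framework that reveals the connection between DPO and RL algorithms, offering insight into the intrinsic relationships among RLHF algorithms. |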
reinforcement learning, PPO, RLHF
|
|
| 24 |
Interactive Symbolic Regression through Offline Reinforcement Learning: A Co-Design Framework |
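Proposes Sym-Q, an interactive symbolic regression framework based on offline reinforcement learning that tackles the difficulty of expression search. |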
reinforcement learning, offline reinforcement learning, IMoS
✅ |
|
| 25 |
Double Distillation Network for Multi-Agent Reinforcement Learning |
Proposes the Double Distillation Network (DDN) to improve cooperative policies in multi-agent reinforcement learning. |
reinforcement learning, distillation
|
|
| 26 |
Contrastive Learning for Cold Start Recommendation with Adaptive Feature Fusion |
Proposes a cold-start recommendation model that incorporates contrastive learning to address sparse interaction data. |
contrastive learning, multimodal
|
|
| 27 |
RLOMM: An Efficient and Robust Online Map Matching Framework with Reinforcement Learning |
Proposes RLOMM, which uses reinforcement learning to achieve efficient and robust online map matching. |
reinforcement learning, representation learning, contrastive learning
|
|
| 28 |
Teaching Language Models to Critique via Reinforcement Learning |
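Proposes the CTRL framework, which uses reinforcement learning to train critique models for code generation, improving the code generation ability of LLMs. |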
reinforcement learning, large language model
|
|
| 29 |
TopoCL: Topological Contrastive Learning for Time Series |
TopoCL: proposes a topological contrastive learning method for time series data that improves general-purpose representations. |
representation learning, contrastive learning
|
|
| 30 |
MobiCLR: Mobility Time Series Contrastive Learning for Urban Region Representations |
MobiCLR: proposes a contrastive-learning-based model for urban region representations that mines urban mobility time series data. |
representation learning, contrastive learning
|
|
| 31 |
Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds |
Proposes AnyMDP and decoupled policy distillation, enabling meta-training for large-scale in-context reinforcement learning. |
reinforcement learning, distillation
|
|
| 32 |
Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks |
Proposes Task-Aware Virtual Training (TAVT) to improve the generalization of meta-reinforcement learning on out-of-distribution tasks. |
reinforcement learning, representation learning
✅ |
|
| 33 |
Elucidating the Preconditioning in Consistency Distillation |
Proposes Analytic-Precond, which analytically optimizes the preconditioning to accelerate consistency distillation training. |
distillation |
|
|
| 34 |
A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems |
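Proposes the UKDSL framework, which combines knowledge distillation and semi-supervised learning to improve the performance of industrial ad delivery systems. |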
distillation |
|
|
| 35 |
Calibrated Unsupervised Anomaly Detection in Multivariate Time-series using Reinforcement Learning |
Proposes a calibrated unsupervised anomaly detection method based on reinforcement learning for multivariate time series analysis. |
reinforcement learning |
|
|
| 36 |
Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning |
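Proposes an optimistic ε-greedy exploration algorithm to address convergence to suboptimal policies in cooperative multi-agent reinforcement learning. |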
reinforcement learning |
|
|
| 37 |
Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning |
Proposes the Wolfpack adversarial attack and the WALL framework to improve the robustness of multi-agent reinforcement learning. |
reinforcement learning |
✅ |
|
| 38 |
DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning |
DeepCell: a self-supervised multiview fusion framework for circuit representation learning. |
representation learning |
|
|