| 1 |
AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models |
Proposes AttriLens-Mol to address reasoning efficiency in molecular property prediction |
reinforcement learning large language model chain-of-thought |
✅ |
|
| 2 |
SVGen: Interpretable Vector Graphics Generation with Large Language Models |
Proposes SVGen to address the challenges of generating SVG graphics from natural language |
reinforcement learning curriculum learning large language model |
|
|
| 3 |
Are Large Language Models Dynamic Treatment Planners? An In Silico Study from a Prior Knowledge Injection Angle |
Uses large language models to optimize dynamic treatment regimes and improve clinical decision-making |
reinforcement learning large language model chain-of-thought |
|
|
| 4 |
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success |
Proposes VL-DAC to address shortcomings in existing vision-language model training |
reinforcement learning PPO multimodal |
|
|
| 5 |
Emergent time-keeping mechanisms in a deep reinforcement learning agent performing an interval timing task |
Presents time-keeping mechanisms of a deep reinforcement learning agent to address temporal processing |
reinforcement learning deep reinforcement learning DRL |
|
|
| 6 |
FeDaL: Federated Dataset Learning for Time Series Foundation Models |
Proposes FeDaL to address dataset heterogeneity in time series foundation models |
representation learning foundation model |
|
|
| 7 |
Dynamic User-controllable Privacy-preserving Few-shot Sensing Framework |
Proposes the PrivCLIP framework to address user control over privacy |
contrastive learning motion generation multimodal |
|
|
| 8 |
MambaITD: An Efficient Cross-Modal Mamba Network for Insider Threat Detection |
Proposes MambaITD to address multimodal fusion in insider threat detection |
Mamba state space model |
|
|
| 9 |
Symmetric Behavior Regularized Policy Optimization |
Proposes symmetric behavior regularized policy optimization to address distribution shift in offline reinforcement learning |
reinforcement learning offline RL offline reinforcement learning |
|
|
| 10 |
COPO: Consistency-Aware Policy Optimization |
Proposes consistency-aware policy optimization to address vanishing gradients in reinforcement learning |
reinforcement learning reward design large language model |
✅ |
|
| 11 |
Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment |
Proposes Agnostics to address post-training for low-resource programming languages |
reinforcement learning large language model |
|
|
| 12 |
Unified Flow Matching for Long Horizon Event Forecasting |
Proposes a unified flow matching framework to address long-horizon event forecasting |
flow matching |
|
|
| 13 |
Automatic LLM Red Teaming |
Proposes an MDP-based red-teaming strategy to improve LLM safety |
reinforcement learning large language model |
|
|
| 14 |
Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation |
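Proposes a differentially private over-the-air federated distillation framework to improve communication efficiency and privacy protection |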
distillation |
|
|
| 15 |
WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification |
Proposes WSS-CL to address efficient machine unlearning |
contrastive learning |
|
|
| 16 |
T3Time: Tri-Modal Time Series Forecasting via Adaptive Multi-Head Alignment and Residual Fusion |
Proposes T3Time to address insufficient adaptability in multivariate time series forecasting |
MAE large language model |
✅ |
|
| 17 |
Decoupled Contrastive Learning for Federated Learning |
Proposes decoupled contrastive learning to address data heterogeneity in federated learning |
contrastive learning |
|
|