| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|----|
| 1 | FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions | FlowLLM combines an LLM base distribution with flow matching for crystal-material generation, substantially improving the efficiency of stable-material discovery. | flow matching, large language model | |
| 2 | Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation | Proposes a downlink link-adaptation method based on offline reinforcement learning and sequence modeling. | reinforcement learning, offline RL | |
| 3 | Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning | Proposes the Return Augmented Decision Transformer for offline off-dynamics reinforcement learning. | reinforcement learning, policy learning, decision transformer | |
| 4 | Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback | Proposes ONI: generating intrinsic rewards for decision-making agents online from large language model feedback. | reinforcement learning, large language model | ✅ |
| 5 | Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | Proposes the LeReT framework to improve the information-retrieval ability of LLMs. | reinforcement learning, large language model | |
| 6 | Offline Behavior Distillation | Proposes offline behavior distillation to improve the training efficiency of reinforcement learning. | reinforcement learning, policy learning, distillation | |
| 7 | DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach | DECRL: a temporal knowledge graph representation learning approach that jointly performs deep evolutionary clustering. | representation learning, TAMP | |
| 8 | Resource Governance in Networked Systems via Integrated Variational Autoencoders and Reinforcement Learning | Proposes a resource-governance framework integrating VAEs and reinforcement learning, dynamically adjusting network structure to optimize system performance. | reinforcement learning, deep reinforcement learning | |
| 9 | VPO: Leveraging the Number of Votes in Preference Optimization | VPO leverages vote counts in preference optimization to improve language-model generation quality. | reinforcement learning, RLHF, DPO | |
| 10 | Contrastive Learning and Adversarial Disentanglement for Privacy-Aware Task-Oriented Semantic Communication | Proposes CLAD, achieving privacy-aware task-oriented semantic communication through contrastive learning and adversarial disentanglement. | contrastive learning | |
| 11 | Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm | Proposes an optimistic no-regret algorithm for kernel-based average-reward reinforcement learning. | reinforcement learning | |
| 12 | Mechanistic Interpretability of Reinforcement Learning Agents | Dissects the internal mechanisms of reinforcement learning agents to reveal their decision processes and potential biases. | reinforcement learning | |
| 13 | Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation | Proposes the LoRa-PI algorithm for low-rank reinforcement learning. | reinforcement learning | |
| 14 | Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode | Proposes shadow-mode reinforcement learning to address the difficulty and damage risk of training on physical systems. | reinforcement learning | |
| 15 | Adaptive Network Intervention for Complex Systems: A Hierarchical Graph Reinforcement Learning Approach | Proposes HGRL, a hierarchical graph reinforcement learning framework for dynamic-network-based intervention and governance in complex multi-agent systems. | reinforcement learning | |
| 16 | Sequential Order-Robust Mamba for Time Series Forecasting | Proposes SOR-Mamba, making the Mamba model robust to channel order in time series forecasting. | Mamba | ✅ |
| 17 | Higher-order Cross-structural Embedding Model for Time Series Analysis | Proposes High-TS, performing time series analysis via higher-order cross-structural embeddings. | contrastive learning, TAMP | |
| 18 | Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation | IsCiL: efficient continual task adaptation via incremental learning of retrievable skills. | imitation learning, foundation model | |
| 19 | COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences | COMAL: a convergent meta-algorithm for aligning LLMs with general preferences. | reinforcement learning, RLHF | |