| # | Title | Summary | Keywords |  |
|---|-------|---------|----------|---|
| 1 | CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents | CORE: a code-based inverse self-training framework with graph expansion to improve the behavioral diversity of virtual agents | reinforcement learning, behavior cloning, reward design |  |
| 2 | MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics | MDAgent2: a large language model for code generation and knowledge Q&A in molecular dynamics | reinforcement learning, large language model | ✅ |
| 3 | Distorted Distributional Policy Evaluation for Offline Reinforcement Learning | Proposes distorted distributional policy evaluation to address over-conservatism in offline reinforcement learning | reinforcement learning, DRL, offline reinforcement learning |  |
| 4 | SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines | Proposes SRAS, a lightweight reinforcement-learning document selector for edge-native RAG pipelines | reinforcement learning, PPO |  |
| 5 | ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense | ACDZero: an automated cyber-defense method based on graph-embedding tree search | reinforcement learning, deep reinforcement learning, distillation |  |
| 6 | Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning | Evaluates the impact of feature-dependent noise in preference-based reinforcement learning, revealing the limitations of existing noise-robust methods | reinforcement learning |  |
| 7 | Moments Matter: Stabilizing Policy Optimization using Return Distributions | Uses moments of the return distribution to stabilize policy optimization, improving robustness on continuous-control tasks | reinforcement learning, deep reinforcement learning, PPO |  |
| 8 | UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk | UnPII: proposes a PII unlearning method with quantifiable exposure risk, addressing the deletion of private data from LLMs | direct preference optimization, large language model |  |
| 9 | Latent Space Element Method | Proposes the Latent Space Element Method (LSEM) for building scalable surrogate solvers for partial differential equations | latent dynamics, foundation model |  |