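| # | Title | Summary | Keywords | |
|---|---|---|---|---|
| 1 | DeepSeek-Inspired Exploration of RL-based LLMs and Synergy with Wireless Networks: A Survey | Explores the applications of DeepSeek-inspired RL-based LLMs in wireless networks and the synergy between the two, with the aim of improving network optimization and model deployment. | reinforcement learning, embodied AI, large language model | |
| 2 | Out-of-Context Reasoning in Large Language Models | Studies how well LLMs reason about relations learned as axioms during training, and proposes a lightweight representation learning method. | representation learning, large language model | |
| 3 | Enhance Exploration in Safe Reinforcement Learning with Contrastive Representation Learning | Proposes a method based on contrastive representation learning for enhancing exploration in safe reinforcement learning. | reinforcement learning, representation learning, contrastive learning | |
| 4 | PIMRL: Physics-Informed Multi-Scale Recurrent Learning for Burst-Sampled Spatiotemporal Dynamics | PIMRL: physics-informed multi-scale recurrent learning for burst-sampled spatiotemporal dynamics. | latent dynamics, spatiotemporal | |
| 5 | PluralLLM: Pluralistic Alignment in LLMs via Federated Learning | PluralLLM: pluralistic alignment in LLMs via federated learning. | reinforcement learning, preference learning, RLHF | |
| 6 | From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL | Proposes the SySLLM framework, which uses large language models to produce abstractive textual summaries of reinforcement learning policies. | reinforcement learning, spatiotemporal, large language model | |
| 7 | Mamba time series forecasting with uncertainty quantification | Proposes Mamba-ProbTSF for time series forecasting with quantification of predictive uncertainty. | Mamba, state space model | ✅ |
| 8 | Policy Teaching via Data Poisoning in Learning from Human Preferences | Studies policy teaching through data poisoning attacks on learning from human preferences. | reinforcement learning, RLHF, DPO | |
| 9 | Probabilistic Forecasting via Autoregressive Flow Matching | Proposes FlowTime, a probabilistic time series forecasting model based on autoregressive flow matching. | flow matching | |
| 10 | Collaborative Speculative Inference for Efficient LLM Inference Serving | Proposes CoSine, which accelerates LLM inference serving through collaborative speculation, improving resource utilization and throughput. | SSM, large language model | |
| 11 | TacticExpert: Spatial-Temporal Graph Language Model for Basketball Tactics | TacticExpert: a spatial-temporal graph language model for modeling and predicting basketball tactics. | contrastive learning, large language model | |
| 12 | Accuracy of Discretely Sampled Stochastic Policies in Continuous-time Reinforcement Learning | Proposes a framework for analyzing the accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning. | reinforcement learning | |
| 13 | Inter-environmental world modeling for continuous and compositional dynamics | Proposes a world modeling method based on Lie group actions for general-purpose agent control in environments with continuous and compositional dynamics. | world model | |
| 14 | Fixed-Point RNNs: Interpolating from Diagonal to Dense | Proposes a sequence modeling approach based on fixed-point RNNs that balances efficiency and expressivity. | Mamba, SSM | |
| 15 | SortingEnv: An Extendable RL-Environment for an Industrial Sorting Process | Proposes SortingEnv for optimizing industrial sorting systems and studying agent behavior in evolving environments. | reinforcement learning, PPO | |
| 16 | Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution | Proposes a constraint-based adaptive hypergraph learning framework that solves vehicle routing problems end to end. | reinforcement learning, representation learning | |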