| # | Title | Summary | Keywords |
| --- | --- | --- | --- |
| 1 | Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing | Proposes Decision MetaMamba, which enhances the selective SSM in offline RL via heterogeneous sequence mixing. | offline RL, Mamba, SSM |
| 2 | Uncertainty-Aware Rank-One MIMO Q Network Framework for Accelerated Offline Reinforcement Learning | Proposes an uncertainty-aware rank-one MIMO Q-network that accelerates offline RL and improves performance. | reinforcement learning, offline RL |
| 3 | RAmmStein: Regime Adaptation in Mean-reverting Markets with Stein Thresholds -- Optimal Impulse Control in Concentrated AMMs | Proposes RAmmStein, regime adaptation for concentrated AMMs based on Stein thresholds and mean reversion. | reinforcement learning, deep reinforcement learning, PULSE |
| 4 | Advantage-based Temporal Attack in Reinforcement Learning | Proposes an advantage-based temporal adversarial Transformer to improve the robustness of RL models. | reinforcement learning, deep reinforcement learning, DRL |
| 5 | LAD: Learning Advantage Distribution for Reasoning | Proposes LAD, which learns advantage distributions to improve large-model reasoning while encouraging diversity. | reinforcement learning, multimodal |
| 6 | On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference | Shows the equivalence of random network distillation, deep ensembles, and Bayesian inference for efficient uncertainty quantification. | distillation |
| 7 | DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning | Proposes DSDR, a dual-scale diversity regularization framework that improves RL-based exploration in LLM reasoning. | reinforcement learning, large language model |
| 8 | Addressing Instrument-Outcome Confounding in Mendelian Randomization through Representation Learning | Proposes a representation-learning-based Mendelian randomization method that addresses instrument-outcome confounding. | representation learning |
| 9 | Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation | Proposes a method that enhances automatic chord recognition via pseudo-labeling and knowledge distillation. | distillation |
| 10 | SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning | Proposes SenTSR-Bench, which strengthens diagnostic reasoning over time-series data through knowledge injection. | reinforcement learning, large language model |
| 11 | Federated Causal Representation Learning in State-Space Systems for Decentralized Counterfactual Reasoning | Proposes a federated causal representation learning framework for decentralized counterfactual reasoning in interconnected industrial systems. | representation learning |
| 12 | Sparse Masked Attention Policies for Reliable Generalization | Proposes sparse masked attention policies that make the generalization of RL policies more reliable. | reinforcement learning, PPO |