| 1 |
Canonicalizing Multimodal Contrastive Representation Learning |
Proposes an orthogonal mapping to unify multimodal contrastive representation learning |
representation learning multimodal |
|
|
| 2 |
Spatio-temporal dual-stage hypergraph MARL for human-centric multimodal corridor traffic signal control |
Proposes STDSH-MARL to address multimodal traffic signal control |
reinforcement learning deep reinforcement learning multimodal |
|
|
| 3 |
SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer |
SMAC: robust offline-to-online transfer via score-matched actor-critic methods |
reinforcement learning TD3 offline RL |
|
|
| 4 |
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy |
Proposes 2Mamba, which simplifies and improves Mamba-2 to balance accuracy and efficiency in long-text modeling. |
Mamba linear attention |
|
|
| 5 |
LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy |
Proposes the LexiSafe framework to address safety in offline safe reinforcement learning |
reinforcement learning offline RL |
|
|
| 6 |
Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning |
Proposes an optimal unconstrained self-distillation method to improve ridge regression performance |
distillation |
|
|
| 7 |
A Theoretical Framework for Modular Learning of Robust Generative Models |
Proposes a modular training framework for generative models that improves LLM robustness and efficiency on mixed data |
distillation large language model |
|
|
| 8 |
MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning |
MASPO: robust and sample-efficient LLM reasoning by unifying gradient utilization, probability mass, and signal reliability |
reinforcement learning large language model |
|
|
| 9 |
RLGT: A reinforcement learning framework for extremal graph theory |
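Proposes the RLGT framework, which systematizes extremal graph theory problems and improves the efficiency of solving them with reinforcement learning. |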
reinforcement learning |
|
|
| 10 |
TIFO: Time-Invariant Frequency Operator for Stationarity-Aware Representation Learning in Time Series |
Proposes TIFO, a time-invariant frequency operator, to address distribution shift in non-stationary time series forecasting. |
representation learning |
|
|
| 11 |
Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning |
Proposes action-graph policies to address coordination in multi-agent reinforcement learning |
reinforcement learning |
|
|
| 12 |
VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation |
VP-VAE: improves vector-quantized variational autoencoders via adaptive vector perturbation |
representation learning VQ-VAE |
|
|
| 13 |
MARS: Margin-Aware Reward-Modeling with Self-Refinement |
Proposes MARS to address uncertainty in reward model training |
PPO RLHF |
|
|