| 1 |
Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings |
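Proposes CHARM, which uses a multimodal JEPA to learn semantic time-series embeddings and improve modeling of heterogeneous time-series data. |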
JEPA Joint-Embedding Predictive Architecture joint-embedding predictive architecture |
|
|
| 2 |
Subspace-Decomposed JEPAs: Disentangling Progression and Content in Latent World Models |
Proposes SD-JEPA to address the problem of disentangling task progression from content encoding. |
world model world models JEPA |
|
|
| 3 |
A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models |
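A survey of offline RL and inverse RL that unifies dynamic discrete choice models with entropy-regularized inverse reinforcement learning. |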
reinforcement learning offline RL inverse reinforcement learning |
|
|
| 4 |
Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach |
Proposes a feasible reward set approach to address imperfect demonstrators in inverse reinforcement learning. |
reinforcement learning inverse reinforcement learning large language model |
|
|
| 5 |
GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring |
GlucoFM: a dual-stream foundation model for continuous glucose monitoring that improves metabolic prediction performance. |
representation learning foundation model |
|
|
| 6 |
Student Capacity Moderates Knowledge Distillation Effectiveness: A Systematic Study Across ResNet Teacher-Student Pairs on CIFAR-10 |
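Studies how student network capacity affects knowledge distillation for ResNet image classification, revealing the importance of capacity matching. |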
teacher-student distillation |
|
|
| 7 |
Effective Biological Representation Learning by Masking Gene Expression |
TxFM: effective biological representation learning by masking gene expression. |
representation learning foundation model |
|
|
| 8 |
The Terminal Representation in Reinforcement Learning |
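Proposes the Terminal Representation (TR), a low-dimensional state representation for reinforcement learning that requires no eigendecomposition. |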
reinforcement learning representation learning reward shaping |
|
|
| 9 |
EchoRL: Reinforcement Learning via Rollout Echoing |
EchoRL: strengthens reinforcement learning through rollout echoing to address reward degradation. |
reinforcement learning large language model |
|
|
| 10 |
UniRTL: Unifying Code and Graph for Robust RTL Representation Learning |
UniRTL: a robust RTL representation learning framework that unifies code and graph structure. |
representation learning multimodal |
|
|
| 11 |
Automating Formal Verification with Reinforcement Learning and Recursive Inference |
Uses reinforcement learning and recursive inference to automate program generation and proof for formal verification. |
reinforcement learning large language model |
|
|
| 12 |
Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning |
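Proposes a linear filter to address the theoretical validity of linear recurrent memory networks in partially observable reinforcement learning. |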
reinforcement learning policy learning |
|
|
| 13 |
Multivariate Distributional Reinforcement Learning Using Sliced Divergences |
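Proposes a multivariate distributional reinforcement learning method based on sliced divergences to address the difficulty of modeling high-dimensional return distributions. |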
reinforcement learning DRL |
|
|
| 14 |
Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning |
Proposes two-timescale Markovian stochastic approximation to address convergence problems in reinforcement learning. |
reinforcement learning policy learning |
|
|
| 15 |
The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems |
Analyzes the challenges of deploying reinforcement learning in practice for controlling industrial energy systems. |
reinforcement learning reward design |
|
|
| 16 |
Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences |
Proposes a federated variational preference alignment framework to resolve conflicts among user preferences. |
preference learning RLHF large language model |
|
|
| 17 |
Skill Reuse as Compression in Agentic RL |
Proposes ReuseRL, which improves the generalization of agentic RL through skill-reuse compression. |
reinforcement learning large language model |
|
|
| 18 |
DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization |
DRIFT: decouples rollouts from importance-weighted fine-tuning to improve the efficiency of multi-turn interaction optimization. |
reinforcement learning large language model |
✅ |
|
| 19 |
Generalized Intention Modeling in Multi-Agent Reinforcement Learning |
Proposes a task-adaptive hybrid intention modeling framework that improves multi-agent reinforcement learning performance. |
reinforcement learning |
|
|
| 20 |
Trust-Region Behavior Blending for On-Policy Distillation |
Proposes Trust-Region Behavior Blending to improve early-stage training in on-policy distillation. |
distillation |
|
|
| 21 |
De-attribute to Forget for LLM Unlearning |
Proposes the DareU framework, which achieves effective LLM unlearning through reinforcement learning with data-attribution rewards. |
reinforcement learning large language model |
|
|
| 22 |
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning |
DARTS: accelerates LLM reinforcement learning training through distribution-aware active rollout trajectory shaping. |
reinforcement learning |
|
|
| 23 |
Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning |
DUAL: an efficient, uncertainty-aware diffusion framework for offline-to-online reinforcement learning. |
reinforcement learning |
|
|
| 24 |
When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks? |
PromptPO: uses an LLM as a black-box optimizer to solve sequential RL tasks. |
reinforcement learning large language model |
|
|
| 25 |
Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization |
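Studies the learning dynamics of Transformer attention heads, revealing differences in generalization between positional encoding and symbolic reasoning. |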
world model world models |
|
|
| 26 |
Memory by Design: Probabilistic Sequence Layers |
Proposes a model design framework that derives efficient recurrent sequence maps from explicit memory assumptions. |
Mamba linear attention |
|
|
| 27 |
Learning Hyperspherical Time-Frequency Representations for Time-Series Out-of-Distribution Detection |
Proposes a time-series out-of-distribution detection method based on hyperspherical time-frequency representations. |
representation learning contrastive learning |
✅ |
|