| 1 |
A Multimodal XAI Framework for Trustworthy CNNs and Bias Detection in Deep Representation Learning |
Proposes a multimodal XAI framework to improve CNN trustworthiness and detect bias in deep representation learning
representation learning multimodal |
|
|
| 2 |
Expert or Not? Assessing Data Quality in Offline Reinforcement Learning
Proposes the Bellman Wasserstein distance (BWD) for assessing the quality of offline reinforcement learning datasets
reinforcement learning offline RL offline reinforcement learning |
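The BWD metric itself is defined in the paper; as a loose, standalone illustration of how a Wasserstein-style distance can compare an offline dataset's return distribution against an expert reference, here is a sketch of the empirical 1-D Wasserstein-1 distance (the names `expert_returns` and `mixed_returns` are hypothetical example data, not from the paper):

```python
import numpy as np

def wasserstein_1d(a, b, n_grid=200):
    """Empirical 1-D Wasserstein-1 distance between two samples,
    computed as the mean absolute gap between the two empirical
    quantile functions on a shared grid."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    qs = np.linspace(0.0, 1.0, n_grid)
    qa = np.quantile(a, qs)
    qb = np.quantile(b, qs)
    return float(np.mean(np.abs(qa - qb)))

rng = np.random.default_rng(0)
expert_returns = rng.normal(100.0, 5.0, size=500)   # illustrative expert-quality data
mixed_returns = rng.normal(60.0, 20.0, size=500)    # illustrative mixed-quality data

print(wasserstein_1d(expert_returns, expert_returns))  # 0.0: identical quality
print(wasserstein_1d(expert_returns, mixed_returns))   # large: clear quality gap
```

A larger distance from the expert reference signals lower dataset quality; the paper's BWD additionally incorporates Bellman structure, which this sketch omits.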
|
|
| 3 |
Deep SPI: Safe Policy Improvement via World Models |
Proposes the DeepSPI algorithm, which achieves safe policy improvement via world models and improves online reinforcement learning performance
reinforcement learning PPO offline RL |
|
|
| 4 |
Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning |
Proposes MST, a modular dynamic sparse training framework that improves the scalability of deep reinforcement learning models.
reinforcement learning deep reinforcement learning DRL |
|
|
| 5 |
Shielded RecRL: Explanation Generation for Recommender Systems without Ranking Degradation |
Proposes Shielded RecRL, which generates explanations for recommender systems without degrading ranking performance
reinforcement learning PPO RLHF |
|
|
| 6 |
Stratos: An End-to-End Distillation Pipeline for Customized LLMs under Distributed Cloud Environments |
Stratos: an end-to-end distillation pipeline for customized LLMs in distributed cloud environments
teacher-student distillation large language model |
|
|
| 7 |
GraphShaper: Geometry-aware Alignment for Improving Transfer Learning in Text-Attributed Graphs |
Proposes GraphShaper to address transfer-learning challenges caused by diverse graph structures in text-attributed graphs
contrastive learning large language model foundation model |
|
|
| 8 |
K-frames: Scene-Driven Any-k Keyframe Selection for Long Video Understanding
Proposes K-frames, a scene-driven method for selecting any number of keyframes for long video understanding.
reinforcement learning large language model multimodal |
|
|
| 9 |
Self-Verifying Reflection Helps Transformers with CoT Reasoning |
Proposes a self-verifying reflection framework that improves the CoT reasoning performance of small Transformers
reinforcement learning large language model chain-of-thought |
|
|
| 10 |
Escaping Local Optima in the Waddington Landscape: A Two-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis |
Proposes a two-stage TRPO-PPO algorithm for escaping local optima in the Waddington landscape in single-cell perturbation analysis.
reinforcement learning PPO |
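The paper's two-stage TRPO-PPO scheme is its own contribution; for background, the standard PPO clipped surrogate objective that the second stage builds on can be sketched as follows (a textbook formulation, not this paper's method):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for a sampled action.
    advantage: estimated advantage of that action.
    Clipping the ratio to [1-eps, 1+eps] limits how far a single
    update can move the policy away from the data-collecting one.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# Positive advantage: gains from pushing the ratio past 1+eps are cut off.
print(ppo_clip_objective(1.5, 1.0))   # 1.2, not 1.5
# Negative advantage: the penalty is not clipped away, keeping updates conservative.
print(ppo_clip_objective(1.5, -1.0))  # -1.5
```

TRPO instead enforces a hard KL-divergence trust region; the paper's two-stage design presumably uses TRPO's stronger constraint for an exploration phase before switching to PPO's cheaper clipped updates.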
|
|
| 11 |
MEASURE: Multi-scale Minimal Sufficient Representation Learning for Domain Generalization in Sleep Staging |
Proposes the MEASURE framework, which improves domain generalization in sleep staging via multi-scale minimal sufficient representation learning.
representation learning contrastive learning |
✅ |
|
| 12 |
Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning |
Proposes a pruning strategy that improves the robustness of reinforcement learning in adversarial settings, with theoretical guarantees.
reinforcement learning |
|
|
| 13 |
Laminar: A Scalable Asynchronous RL Post-Training Framework |
Laminar: a scalable asynchronous RL post-training framework that addresses low GPU utilization.
reinforcement learning large language model |
|
|
| 14 |
Rethinking Knowledge Distillation: A Data Dependent Regulariser With a Negative Asymmetric Payoff |
Rethinks knowledge distillation as a data-dependent regularizer with a negative asymmetric payoff
distillation |
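As background for the objective this paper rethinks, the classical (Hinton-style) distillation loss matches a student's softened output distribution to the teacher's via a temperature-scaled KL divergence. A minimal sketch (standard formulation, not this paper's regularizer view):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]
print(kd_loss(teacher, teacher))            # 0.0: student matches teacher exactly
print(kd_loss([0.1, 2.0, 0.3], teacher))    # positive: student diverges
```

The paper's claim, as summarized above, is that this term acts as a data-dependent regularizer whose payoff is asymmetric, i.e. it can hurt more than it helps on some data.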
|
|
| 15 |
Finite-time Convergence Analysis of Actor-Critic with Evolving Reward |
Presents a finite-time convergence analysis of actor-critic to address the evolving-reward setting
reinforcement learning curriculum learning reward shaping |
|
|
| 16 |
Heterogeneous RBCs via deep multi-agent reinforcement learning |
Proposes the MARL-BC framework, combining deep multi-agent reinforcement learning with RBC models to simulate heterogeneous macroeconomies.
reinforcement learning |
|
|
| 17 |
Diffusion Models for Reinforcement Learning: Foundations, Taxonomy, and Development |
Surveys diffusion models in reinforcement learning: foundations, taxonomy, and development
reinforcement learning |
✅ |
|
| 18 |
Chimera: State Space Models Beyond Sequences |
Chimera: a state space model that goes beyond sequence modeling, unifying data with different topological structures.
state space model |
|
|
| 19 |
Can GRPO Help LLMs Transcend Their Pretraining Origin? |
Finds that GRPO's gains for LLMs are constrained by pretraining biases: it refines existing capabilities rather than creating new ones
reinforcement learning large language model |
|
|
| 20 |
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning |
Shows that Mamba can learn low-dimensional targets in-context via test-time feature learning
Mamba |
|
|
| 21 |
Towards Fast Coarse-graining and Equation Discovery with Foundation Inference Models |
Uses pretrained Foundation Inference Models to accelerate coarse-graining and equation discovery
latent dynamics representation learning |
|
|