cs.LG (2025-12-19)
📊 9 papers in total
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (4)
Pillar 8: Physics-based Animation (3)
Pillar 9: Embodied Foundation Models (2)
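🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Trust-Region Adaptive Policy Optimization | Proposes TRAPO, which interleaves SFT and RL to optimize LLM reasoning, substantially improving mathematical reasoning performance. | reinforcement learning, large language model | | |
| 2 | Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning | Proposes a multi-agent reinforcement learning framework for evaluating long-term electricity market design in support of deep decarbonization targets. | reinforcement learning | | |
| 3 | AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens | AdvJudge-Zero: flips the binary decisions of LLM judges via adversarial control tokens. | RLHF, DPO | | |
| 4 | A Theoretical Analysis of State Similarity Between Markov Decision Processes | Proposes the generalized bisimulation metric (GBSM) for assessing state similarity between Markov decision processes. | reinforcement learning, representation learning | | |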
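🔬 Pillar 8: Physics-based Animation (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | MINPO: Memory-Informed Neural Pseudo-Operator to Resolve Nonlocal Spatiotemporal Dynamics | Proposes MINPO, a memory-informed neural pseudo-operator for resolving nonlocal spatiotemporal dynamics. | spatiotemporal | | |
| 6 | Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing | Proposes a method for perfect reconstruction of sparse signals based on nonconvexity control and one-step RSB message passing. | AMP | | |
| 7 | Learning solution operator of dynamical systems with diffusion maps kernel ridge regression | Proposes learning the solution operator of dynamical systems with diffusion maps kernel ridge regression (DM-KRR), improving long-term prediction accuracy. | spatiotemporal | | |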
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing | Proposes FlashCodec and UnifiedServe, which accelerate multi-stage MLLM inference through GPU-internal scheduling and resource sharing. | large language model, multimodal | | |
| 9 | Weighted Stochastic Differential Equation to Implement Wasserstein-Fisher-Rao Gradient Flow | Proposes a weighted stochastic differential equation method implementing Wasserstein-Fisher-Rao gradient flow, improving sampling efficiency for generative models. | multimodal | | |