cs.LG (2025-09-18)

📊 13 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (7 🔗2) · Pillar 9: Embodied Foundation Models (3) · Pillar 8: Physics-based Animation (2) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

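| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Self-Improving Embodied Foundation Models | Proposes a self-improving embodied foundation model approach for autonomous robot skill learning and generalization. | reinforcement learning, imitation learning, large language model | |
| 2 | Exploring multimodal implicit behavior learning for vehicle navigation in simulated cities | Proposes data-augmented implicit behavior cloning to address multimodal decision-making in urban vehicle navigation. | behavior cloning, multimodal | |
| 3 | Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning | Fleming-R1: expert-level medical reasoning via reinforcement learning. | reinforcement learning, large language model, chain-of-thought | |
| 4 | FlowRL: Matching Reward Distributions for LLM Reasoning | FlowRL: improves LLM reasoning by matching reward distributions, addressing over-optimization. | reinforcement learning, PPO, large language model | |
| 5 | ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning | Proposes the DSCL framework, which improves the efficiency of RL-based tool learning through dual dynamic sampling and curriculum learning. | reinforcement learning, curriculum learning | |
| 6 | Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation | EVOL-RL: a label-free self-evolving language model framework that improves the model through majority-vote selection and novelty-driven variation. | reinforcement learning, large language model | |
| 7 | Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning | Proposes a data rewriting framework to address distribution shift in off-policy learning during SFT. | policy learning, large language model | |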

🔬 Pillar 9: Embodied Foundation Models (3 papers)

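| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs | Proposes EvoReasoner and EvoKG to strengthen LLM reasoning over temporal knowledge graphs. | large language model | |
| 9 | CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs | Proposes CoopQ to address mixed-precision quantization for low-resource deployment of LLMs. | large language model | |
| 10 | Predicting Language Models' Success at Zero-Shot Probabilistic Prediction | Studies LLM performance on zero-shot probabilistic prediction and proposes label-free metrics for predicting LLM performance on a given task. | large language model | |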

🔬 Pillar 8: Physics-based Animation (2 papers)

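| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Solar Forecasting with Causality: A Graph-Transformer Approach to Spatiotemporal Dependencies | SolarCAST: forecasts solar irradiance with a causal graph Transformer to improve renewable energy management. | spatiotemporal, multimodal | |
| 12 | Accurate typhoon intensity forecasts using a non-iterative spatiotemporal transformer model | Proposes TIFNet, a non-iterative spatiotemporal Transformer model that significantly improves typhoon intensity forecast accuracy. | spatiotemporal | |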

🔬 Pillar 1: Robot Control (1 paper)

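| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Diffusion-Based Scenario Tree Generation for Multivariate Time Series Prediction and Multistage Stochastic Optimization | Proposes DST, a diffusion-based scenario tree generation framework for multivariate time series prediction and multistage stochastic optimization. | MPC, reinforcement learning | |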
