cs.LG (2025-10-30)

📊 30 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (14 🔗1) · Pillar 9: Embodied Foundation Models (14 🔗3) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (14 papers)

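| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | RL-Exec: Impact-Aware Reinforcement Learning for Opportunistic Optimal Liquidation, Outperforms TWAP and a Book-Liquidity VWAP on BTC-USD Replays | RL-Exec: an impact-aware reinforcement learning strategy for optimal liquidation that outperforms TWAP and Book-Liquidity VWAP | reinforcement learning, PPO, TAMP |
| 2 | ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems | ReSpec: a framework for optimizing speculative decoding in reinforcement learning systems | reinforcement learning, distillation, large language model |
| 3 | Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning | Examines the limits of RLVR in mathematical reasoning and ways to improve it | reinforcement learning, reward design, large language model |
| 4 | Offline Clustering of Preference Learning with Active-data Augmentation | Proposes the Off-C$^2$PL and A$^2$-Off-C$^2$PL algorithms to address user clustering and data imbalance in offline preference learning | reinforcement learning, preference learning |
| 5 | Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase | Jasmine: a simple, performant, and scalable JAX-based world modeling codebase | world model |
| 6 | Defeating the Training-Inference Mismatch via FP16 | Uses FP16 precision to resolve the training-inference mismatch in RL fine-tuning of LLMs | reinforcement learning, large language model |
| 7 | Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning | Shows the equivalence between empirical welfare maximization and conditional average treatment effect estimation in policy learning | policy learning |
| 8 | Data-Efficient RLVR via Off-Policy Influence Guidance | Proposes CROPI, which uses off-policy influence functions to guide RLVR data selection and improve LLM reasoning | reinforcement learning, large language model |
| 9 | Co-Evolving Latent Action World Models | Proposes CoLA-World, which learns latent action world models through co-evolution, improving video generation quality and visual planning | world model |
| 10 | Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning | Proposes an adaptive context length optimization framework for MARL based on low-frequency truncation to address long-term dependencies | reinforcement learning |
| 11 | A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation | Proposes a game-theoretic spatio-temporal reinforcement learning framework for collaborative public resource allocation | reinforcement learning |
| 12 | Efficient Generative AI Boosts Probabilistic Forecasting of Sudden Stratospheric Warmings | Proposes FM-Cast, a Flow Matching-based generative AI model for efficient forecasting of sudden stratospheric warmings | flow matching, spatiotemporal |
| 13 | Clone Deterministic 3D Worlds | Proposes a geometry-regularized world model (GRWM) for high-fidelity cloning of deterministic 3D worlds | world model, contrastive learning |
| 14 | Think Outside the Policy: In-Context Steered Policy Optimization | Proposes ICPO, which uses in-context learning to steer policy optimization, improving large reasoning models under reinforcement learning with verifiable rewards | reinforcement learning, reward shaping |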

🔬 Pillar 9: Embodied Foundation Models (14 papers)

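| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Integrating Ontologies with Large Language Models for Enhanced Control Systems in Chemical Engineering | Proposes an ontology-integrated large language model framework to improve the performance of chemical engineering control systems | large language model |
| 16 | LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation | LSM-MS2: a deep learning foundation model bridging spectral identification and biological interpretation | foundation model |
| 17 | GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction | GeoPep: a geometry-aware masked language model for predicting protein-peptide binding sites | foundation model, multimodal |
| 18 | ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models | Proposes ALMGuard, which defends audio-language models against adversarial attacks via safety shortcut activation and Mel-gradient sparse masking | large language model, multimodal |
| 19 | Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence | Pelican-VL 1.0: an open-source foundation brain model for embodied intelligence | multimodal |
| 20 | Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification | Pre-trained forecasting models as strong zero-shot feature extractors for time series classification | foundation model |
| 21 | Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving | Loquetier: a virtualized multi-LoRA framework for unified LLM fine-tuning and serving | large language model |
| 22 | LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits | LoRAQuant: a mixed-precision quantization method for LoRA that reaches ultra-low bit widths | large language model |
| 23 | Polybasic Speculative Decoding Through a Theoretical Perspective | Proposes the Polybasic speculative decoding framework, which accelerates LLM inference with theoretical support | large language model |
| 24 | LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection | Uses LLMs as in-context meta-learners for model and hyperparameter selection | large language model |
| 25 | GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation | Proposes GraphKeeper, which addresses catastrophic forgetting in graph domain-incremental learning through knowledge disentanglement and preservation | foundation model |
| 26 | CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs | Proposes CAS-Spec, which uses dynamically switchable inference acceleration strategies for lossless LLM inference acceleration | large language model |
| 27 | Angular Steering: Behavior Control via Rotation in Activation Space | Proposes Angular Steering, which controls large language model behavior through rotation in activation space | large language model |
| 28 | LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline | LLMBisect: uses a comparative analysis pipeline to break barriers in bug bisection | large language model |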

🔬 Pillar 8: Physics-based Animation (1 paper)

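| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 29 | Aeolus: A Multi-structural Flight Delay Dataset | Aeolus: a multi-structural flight delay dataset for improving flight delay prediction | spatiotemporal, foundation model |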

🔬 Pillar 1: Robot Control (1 paper)

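| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space | Proposes AISP, a sampling-based optimal control method for test-time alignment of LLMs | model predictive control, large language model |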
