cs.LG (2025-10-30)

📊 30 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (14 🔗1) · Pillar 9: Embodied Foundation Models (14 🔗3) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (14 papers)

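#	Title	One-sentence summary	Tags	🔗	⭐
1	RL-Exec: Impact-Aware Reinforcement Learning for Opportunistic Optimal Liquidation, Outperforms TWAP and a Book-Liquidity VWAP on BTC-USD Replays	RL-Exec: an impact-aware reinforcement learning policy for optimal liquidation that outperforms TWAP and a book-liquidity VWAP	reinforcement learning, PPO, TAMP
2	ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems	ReSpec: a framework for optimizing speculative decoding in reinforcement learning systems	reinforcement learning, distillation, large language model
3	Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning	Examines the limits of RLVR in mathematical reasoning and ways to improve it	reinforcement learning, reward design, large language model	✅
4	Offline Clustering of Preference Learning with Active-data Augmentation	Proposes the Off-C$^2$PL and A$^2$-Off-C$^2$PL algorithms to address user clustering and data imbalance in offline preference learning	reinforcement learning, preference learning
5	Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase	Jasmine: a simple, performant, and scalable JAX-based world modeling codebase	world model
6	Defeating the Training-Inference Mismatch via FP16	Uses FP16 precision to resolve the training-inference mismatch in RL fine-tuning of LLMs	reinforcement learning, large language model
7	Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning	Shows the equivalence between empirical welfare maximization and conditional average treatment effect estimation in policy learning	policy learning
8	Data-Efficient RLVR via Off-Policy Influence Guidance	Proposes CROPI, which uses off-policy influence guidance to select RLVR training data and improve LLM reasoning	reinforcement learning, large language model
9	Co-Evolving Latent Action World Models	Proposes CoLA-World, which learns a latent action world model through co-evolution, improving video generation quality and visual planning	world model
10	Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning	Proposes an adaptive context length optimization framework for MARL based on low-frequency truncation, addressing long-term dependencies	reinforcement learning
11	A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation	Proposes a game-theoretic spatio-temporal reinforcement learning framework for collaborative public resource allocation	reinforcement learning
12	Efficient Generative AI Boosts Probabilistic Forecasting of Sudden Stratospheric Warmings	Proposes FM-Cast, a flow-matching-based generative AI model for efficient forecasting of sudden stratospheric warmings	flow matching, spatiotemporal
13	Clone Deterministic 3D Worlds	Proposes the Geometrically-Regularized World Model (GRWM) for high-fidelity cloning of deterministic 3D worlds	world model, contrastive learning
14	Think Outside the Policy: In-Context Steered Policy Optimization	Proposes ICPO, which uses in-context learning to steer policy optimization, improving the reasoning of large reasoning models under reinforcement learning with verifiable rewards	reinforcement learning, reward shaping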

🔬 Pillar 9: Embodied Foundation Models (14 papers)

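#	Title	One-sentence summary	Tags	🔗	⭐
15	Integrating Ontologies with Large Language Models for Enhanced Control Systems in Chemical Engineering	Proposes an ontology-integrated large language model framework to enhance control systems in chemical engineering	large language model
16	LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation	LSM-MS2: a deep learning foundation model bridging spectral identification and biological interpretation	foundation model
17	GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction	GeoPep: a geometry-aware masked language model for predicting protein-peptide binding sites	foundation model, multimodal
18	ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models	Proposes ALMGuard, which defends audio-language models against adversarial attacks via safety shortcut activation and Mel-gradient sparse masking	large language model, multimodal	✅
19	Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence	Pelican-VL 1.0: an open-source foundation brain model for embodied intelligence	multimodal
20	Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification	Pre-trained forecasting models as strong zero-shot feature extractors for time series classification	foundation model
21	Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving	Loquetier: a virtualized multi-LoRA framework for unified LLM fine-tuning and serving	large language model	✅
22	LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits	LoRAQuant: a mixed-precision quantization method for LoRA that reaches ultra-low bit widths	large language model
23	Polybasic Speculative Decoding Through a Theoretical Perspective	Proposes the Polybasic speculative decoding framework, which accelerates LLM inference with theoretical backing	large language model
24	LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection	Uses LLMs as in-context meta-learners for model and hyperparameter selection	large language model
25	GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation	Proposes GraphKeeper, which addresses catastrophic forgetting in graph domain-incremental learning through knowledge disentanglement and preservation	foundation model
26	CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs	Proposes CAS-Spec, which uses dynamically switchable inference acceleration strategies to speed up lossless LLM inference	large language model
27	Angular Steering: Behavior Control via Rotation in Activation Space	Proposes Angular Steering, which controls large language model behavior through rotation in activation space	large language model	✅
28	LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline	LLMBisect: breaking barriers in bug bisection with a comparative analysis pipeline	large language model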

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
29	Aeolus: A Multi-structural Flight Delay Dataset	Aeolus: a multi-structural flight delay dataset for improving flight delay prediction	spatiotemporal, foundation model	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
30	Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space	Proposes AISP, a sampling-based optimal control method for test-time alignment of LLMs	model predictive control, large language model
