cs.LG (2026-04-27)

📊 24 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗1) Pillar 2: RL & Architecture (8) Pillar 1: Robot Control (3 🔗1) Pillar 8: Physics-based Animation (3)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

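#	Title	One-line summary	Tags	🔗	⭐
1	AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents	AgenticCache: a cache-driven asynchronous planning framework for embodied AI agents	embodied AI large language model	✅
2	A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws	Proposes a limit theory for understanding emergent intelligence in foundation models	foundation model
3	FlashOverlap: Minimizing Tail Latency in Communication Overlap for Distributed LLM Training	FlashOverlap: minimizes tail latency to optimize communication overlap in distributed LLM training	large language model
4	Learning to Think from Multiple Thinkers	Studies learning from multiple chains of thought (CoT) and proposes an efficient active learning algorithm to handle differences among thinkers	chain-of-thought
5	Few-Shot Cross-Device Transfer for Quantum Noise Modeling on Real Hardware	Proposes a few-shot cross-device transfer learning method for quantum noise modeling and error mitigation	zero-shot transfer
6	Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment	Proposes Meta-Aligner, which aligns LLMs to multiple objectives through bidirectional preference-policy optimization	large language model
7	Explaining Temporal Graph Predictions With Shapley Values	Proposes a Shapley-value-based explainability method for temporal graph neural networks	TAMP
8	Coverage-Based Calibration for Post-Training Quantization via Weighted Set Cover over Outlier Channels	Proposes COVERCAL, which improves post-training quantization calibration via weighted set cover over outlier channels	large language model
9	Fix Initial Codes and Iteratively Refine Textual Directions Toward Safe Multi-Turn Code Correction	Proposes IRTD, a method that iteratively refines textual directions for safe multi-turn code correction	large language model
10	Continual Calibration: Coverage Can Collapse Before Accuracy in Lifelong LLM Fine-Tuning	Proposes calibration replay to address coverage collapsing before accuracy in lifelong LLM fine-tuning	large language model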

🔬 Pillar 2: RL & Architecture (8 papers)

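#	Title	One-line summary	Tags	🔗	⭐
11	BitRL: Reinforcement Learning with 1-bit Quantized Language Models for Resource-Constrained Edge Deployment	BitRL: reinforcement learning with 1-bit quantized language models on resource-constrained edge devices	reinforcement learning large language model
12	A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning	Proposes a multi-objective reinforcement learning method based on reward-free learning, improving policy learning efficiency and performance	reinforcement learning policy learning
13	An Aircraft Upset Recovery System with Reinforcement Learning	Proposes a reinforcement-learning-based aircraft upset recovery system to improve flight safety	reinforcement learning SAC
14	An Automatic Ground Collision Avoidance System with Reinforcement Learning	Proposes a reinforcement-learning-based automatic ground collision avoidance system that improves the safety and operational effectiveness of advanced trainer aircraft	reinforcement learning
15	Perfecting Aircraft Maneuvers with Reinforcement Learning	Uses reinforcement learning to optimize aircraft aerobatic maneuvers and support pilot training	reinforcement learning
16	TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents	Proposes TCOD, which uses a temporal curriculum to address KL instability in on-policy distillation for multi-turn autonomous agents	distillation
17	GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility	Proposes GradMAP, a gradient-based multi-agent proximal learning method for grid-edge flexibility control	reinforcement learning PPO
18	Model-Free Inference of Investor Preferences: A Relative Entropy IRL Approach	Proposes a relative-entropy inverse reinforcement learning method for inferring investor preferences without known transition probabilities	reinforcement learning inverse reinforcement learning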

🔬 Pillar 1: Robot Control (3 papers)

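#	Title	One-line summary	Tags	🔗	⭐
19	Dual Control of Linear Systems from Bilinear Observations with Belief Space Model Predictive Control	Proposes a dual control method for linear systems with bilinear observations, based on belief-space model predictive control	MPC model predictive control
20	SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning	SpecRLBench: a benchmark for evaluating the generalization of reinforcement learning guided by linear temporal logic specifications	manipulation reinforcement learning	✅
21	Leveraging Human Feedback for Semantically-Relevant Skill Discovery	Proposes Semantically-Relevant Skill Discovery (SRSD), which uses human feedback to improve the semantic diversity and relevance of skills discovered by reinforcement learning	locomotion reinforcement learning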

🔬 Pillar 8: Physics-based Animation (3 papers)

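#	Title	One-line summary	Tags	🔗	⭐
22	Task-guided Spatiotemporal Network with Diffusion Augmentation for EEG-based Dementia Diagnosis and MMSE Prediction	Proposes a task-guided spatiotemporal network with diffusion augmentation for EEG-based dementia diagnosis and MMSE prediction	spatiotemporal
23	IMPA-Net: Meteorology-Aware Multi-Scale Attention and Dynamic Loss for Extreme Convective Radar Nowcasting	IMPA-Net: meteorology-aware multi-scale attention and dynamic loss for extreme convective radar nowcasting	spatiotemporal
24	Multi-scale Dynamic Wake Modeling of Floating Offshore Wind Turbines via Fourier Neural Operators and Physics-Informed Neural Networks	Uses Fourier neural operators to predict the multi-scale dynamic wake of floating offshore wind turbines, enabling real-time control and optimization	spatiotemporal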
