cs.LG (2026-03-13)

📊 28 papers in total | 🔗 5 with code

🎯 Interest area navigation

Pillar 2: RL Algorithms & Architecture (11 🔗5) Pillar 9: Embodied Foundation Models (10) Pillar 1: Robot Control (4) Pillar 8: Physics-based Animation (3)

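🔬 Pillar 2: RL Algorithms & Architecture (11 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels	LeWorldModel: proposes a stable end-to-end joint-embedding predictive architecture that learns world models from pixels.	world model world models JEPA
2	Representation Learning for Spatiotemporal Physical Systems	Proposes a representation learning framework for spatiotemporal physical systems and evaluates how effective self-supervised methods are for physical parameter estimation.	representation learning spatiotemporal	✅
3	Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems	Proposes AnchorRec, which uses anchored alignment to address positional collapse in multimodal recommender systems.	representation learning multimodal	✅
4	Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback	Proposes Swap-guided Preference Learning (SPL) to address posterior collapse in personalized RLHF.	reinforcement learning preference learning RLHF	✅
5	A Multi-task Large Reasoning Model for Molecular Science	Proposes a multi-task large model that combines reasoning with molecular knowledge to improve performance on molecular science tasks.	reinforcement learning foundation model chain-of-thought
6	Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized PDEs	Proposes DLDMF to address generalization and temporal extrapolation for parameterized PDEs.	latent dynamics spatiotemporal
7	Maximizing Incremental Information Entropy for Contrastive Learning	Proposes IE-CL, which maximizes incremental information entropy to improve contrastive representation learning at small batch sizes.	representation learning contrastive learning
8	PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses	PISmith: a reinforcement-learning-based red-teaming framework for evaluating prompt injection defenses.	reinforcement learning	✅
9	Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration	Proposes an adaptive knowledge integration framework that improves LLM accuracy in drug-drug interaction prediction.	reinforcement learning large language model
10	Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages	Proposes a reinforcement learning method with entropy-guided step selection and stepwise advantages for post-training diffusion language models.	reinforcement learning	✅
11	A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric	Proposes a new spectral analysis of the distributional Bellman operator under the Cramér metric, providing a theoretical foundation for DRL.	reinforcement learning DRL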

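🔬 Pillar 9: Embodied Foundation Models (10 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity	Proposes using cross-tier GPU heterogeneity to reduce the inference cost of multimodal large language models.	large language model multimodal
13	Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models	Proposes an exact federated continual unlearning method for ridge regression heads on frozen foundation models.	foundation model
14	Accelerating materials discovery using foundation model based In-context active learning	Proposes ICAL, an in-context active learning method based on pretrained models, to accelerate materials discovery.	foundation model
15	TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning	TERMINATOR: learns optimal exit points for early stopping in chain-of-thought reasoning, reducing overthinking.	chain-of-thought
16	Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems	Proposes EISAM to address the long-tail problem in LLM-based recommender systems and improve recommendation performance for tail items.	large language model instruction following
17	When Drafts Evolve: Speculative Decoding Meets Online Learning	Proposes OnlineSpec, which continually improves the draft model through online learning to accelerate speculative decoding.	large language model foundation model
18	BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning	Proposes BoSS, a best-of-strategies selector that serves as an oracle for deep active learning, improving performance on large-scale datasets.	foundation model
19	Breaking the Tuning Barrier: Zero-Hyperparameters Yield Multi-Corner Analysis Via Learned Priors	Proposes a zero-hyperparameter method based on learned priors to address the tuning difficulty in multi-corner circuit analysis.	foundation model
20	Design-Specification Tiling for ICL-based CAD Code Generation	Proposes Design-Specification Tiling (DST) to improve ICL performance in CAD code generation.	large language model
21	LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing	LightMoE: reduces redundancy in MoE models through expert replacing, enabling efficient compression.	large language model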

🔬 Pillar 1: Robot Control (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
22	PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization	PhysMoDPO: physically plausible humanoid motion generation based on preference optimization.	humanoid humanoid robot whole-body control
23	FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control	FastDSAC: unlocks the potential of maximum entropy RL in high-dimensional humanoid control.	humanoid humanoid control reinforcement learning
24	CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning	Proposes the CALF framework to address communication latency in distributed reinforcement learning.	sim-to-real reinforcement learning
25	Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics	Reveals the non-convergent NTK dynamics of linearized attention and the dual implications of its influence malleability.	manipulation

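🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
26	Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction	Proposes GICON, which uses graph neural networks and in-context learning to improve generalization in spatiotemporal prediction.	spatiotemporal
27	Competition-Aware CPC Forecasting with Near-Market Coverage	Proposes a competition-aware CPC forecasting method to address market volatility.	spatiotemporal foundation model
28	Adaptive Diffusion Posterior Sampling for Data and Model Fusion of Complex Nonlinear Dynamical Systems	Proposes an adaptive diffusion posterior sampling method for data and model fusion of complex nonlinear dynamical systems.	spatiotemporal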
