cs.LG (2026-04-21)

📊 24 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (14) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (4 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (14 papers)

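#	Title	Key Point	Tags	🔗	⭐
1	Distillation Traps and Guards: A Calibration Knob for LLM Distillability	Proposes a calibration method for LLM distillation based on reinforcement learning fine-tuning, enabling controllable knowledge distillation and model protection.	teacher-student distillation large language model
2	LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation	LBLLM: lightweight binarization of large language models via three-stage distillation.	distillation large language model
3	Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning	Proposes GDMD, which uses gradient-based reinforcement learning to guide diffusion model distillation for high-quality few-step generation.	reinforcement learning distillation
4	Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback	Proposes a safe RLHF algorithm based on a policy gradient primal-dual method, addressing safe reinforcement learning under infinite-horizon constraints.	reinforcement learning RLHF large language model
5	Intentional Updates for Streaming Reinforcement Learning	Proposes Intentional Updates to address unstable step-size selection in streaming reinforcement learning.	reinforcement learning deep reinforcement learning
6	Safe Continual Reinforcement Learning in Non-stationary Environments	Studies safe continual reinforcement learning algorithms for non-stationary environments, balancing safety against forgetting.	reinforcement learning
7	Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification	Proposes a label-free, self-supervised disentangled representation learning framework for damage identification under operational variability in structural health monitoring.	representation learning
8	LASER: Learning Active Sensing for Continuum Field Reconstruction	Proposes the LASER framework, which learns active sensing to reconstruct continuum fields with high accuracy.	reinforcement learning world model world models
9	Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling	Nexusformer: stable and inheritable Transformer scaling via nonlinear attention expansion.	linear attention
10	RL-ABC: Reinforcement Learning for Accelerator Beamline Control	RL-ABC: a reinforcement learning framework for accelerator beamline control that improves particle transport efficiency.	reinforcement learning
11	Fine-Tuning Small Reasoning Models for Quantum Field Theory	Fine-tunes small reasoning models to solve quantum field theory problems and open-sources the data and code.	reinforcement learning large language model
12	EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training	EVPO: adaptive critic utilization driven by explained variance, improving LLM post-training.	reinforcement learning PPO
13	TACENR: Task-Agnostic Contrastive Explanations for Node Representations	Proposes TACENR, which provides task-agnostic contrastive explanations for graph node representations.	representation learning contrastive learning
14	LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit	LLMs know they are wrong yet still defer to the user: reveals a shared sycophancy-lying circuit.	RLHF DPO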

🔬 Pillar 9: Embodied Foundation Models (6 papers)

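#	Title	Key Point	Tags	🔗	⭐
15	RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models	RDP LoRA: a geometry-driven method for parameter-efficient fine-tuning of large language models.	large language model
16	Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention	Proposes a calibration method for scientific foundation models based on stochastic attention, improving predictive uncertainty estimates.	foundation model
17	Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection	Uses large language models to generate obfuscated XSS attack payloads and evaluates their effectiveness for machine learning-based detection.	large language model
18	Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors	NodePFN: learns posterior predictive distributions for node classification from synthetic graph priors, enabling cross-graph generalization.	large language model
19	FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion	FedProxy: federated fine-tuning of LLMs via proxy SLMs and heterogeneity-aware fusion.	large language model
20	Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees	Proposes the DSR neuro-symbolic framework, which structures the autoformalization process with operator trees and significantly improves theorem-proving performance.	large language model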

🔬 Pillar 1: Robot Control (4 papers)

#	Title	Key Point	Tags	🔗	⭐
21	Accelerating trajectory optimization with Sobolev-trained diffusion policies	Accelerates trajectory optimization using Sobolev-trained diffusion policies.	trajectory optimization imitation learning diffusion policy
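22	Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning	Proposes a LoRA-based structured sparsity regularization method that improves the stability and performance of critic learning in off-policy reinforcement learning.	locomotion reinforcement learning SAC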
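23	FASTER: Value-Guided Sampling for Fast RL	FASTER: accelerates reinforcement learning through value-guided sampling, reducing the computational cost of diffusion policies.	manipulation reinforcement learning VLA	✅
24	HardNet++: Nonlinear Constraint Enforcement in Neural Networks	HardNet++: a general method for enforcing nonlinear constraints in neural networks.	model predictive control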
