cs.LG(2026-04-21)

📊 24 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (14) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (4 🔗1)

🔬 Pillar 2: RL & Architecture (14 papers)

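| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | Distillation Traps and Guards: A Calibration Knob for LLM Distillability | Proposes an LLM distillation calibration method based on reinforcement fine-tuning, enabling controllable knowledge distillation and model protection. | teacher-student distillation large language model | |
| 2 | LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation | LBLLM: lightweight binarization of large language models via three-stage distillation. | distillation large language model | |
| 3 | Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning | Proposes GDMD, which uses gradient-based reinforcement learning to guide diffusion model distillation for high-quality few-step generation. | reinforcement learning distillation | |
| 4 | Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback | Proposes a safe RLHF algorithm based on a policy-gradient primal-dual method, addressing safe reinforcement learning under infinite-horizon constraints. | reinforcement learning RLHF large language model | |
| 5 | Intentional Updates for Streaming Reinforcement Learning | Proposes Intentional Updates to address unstable step-size selection in streaming reinforcement learning. | reinforcement learning deep reinforcement learning | |
| 6 | Safe Continual Reinforcement Learning in Non-stationary Environments | Studies safe continual reinforcement learning algorithms for non-stationary environments, balancing safety against forgetting. | reinforcement learning | |
| 7 | Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification | Proposes a label-free, self-supervised disentangled representation learning framework for damage identification under operational variability in structural health monitoring. | representation learning | |
| 8 | LASER: Learning Active Sensing for Continuum Field Reconstruction | Proposes the LASER framework, which learns active sensing to achieve high-accuracy reconstruction of continuum fields. | reinforcement learning world model world models | |
| 9 | Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling | Nexusformer: stable and inheritable Transformer scaling via nonlinear attention expansion. | linear attention | |
| 10 | RL-ABC: Reinforcement Learning for Accelerator Beamline Control | RL-ABC: a reinforcement-learning framework for accelerator beamline control that improves particle transport efficiency. | reinforcement learning | |
| 11 | Fine-Tuning Small Reasoning Models for Quantum Field Theory | Fine-tunes small reasoning models to solve quantum field theory problems, and open-sources the data and code. | reinforcement learning large language model | |
| 12 | EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training | EVPO: improves LLM post-training by using explained variance to adapt critic utilization. | reinforcement learning PPO | |
| 13 | TACENR: Task-Agnostic Contrastive Explanations for Node Representations | Proposes TACENR, which provides task-agnostic contrastive explanations for graph node representations. | representation learning contrastive learning | |
| 14 | LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit | LLMs go along with users even when they know the claim is wrong: reveals a shared sycophancy-lying circuit. | RLHF DPO | |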

🔬 Pillar 9: Embodied Foundation Models (6 papers)

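| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 15 | RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models | RDP LoRA: a geometry-driven parameter-efficient fine-tuning method for large language models. | large language model | |
| 16 | Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention | Proposes a stochastic-attention-based calibration method for scientific foundation models, improving predictive uncertainty estimation. | foundation model | |
| 17 | Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection | Uses large language models to generate obfuscated XSS attack payloads and evaluates their effectiveness against machine-learning-based detection. | large language model | |
| 18 | Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors | NodePFN: learns posterior predictive distributions for node classification from synthetic graph priors, enabling cross-graph generalization. | large language model | |
| 19 | FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion | FedProxy: federated fine-tuning of LLMs via proxy SLMs and heterogeneity-aware fusion. | large language model | |
| 20 | Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees | Proposes the DSR neuro-symbolic framework, which structures the autoformalization process through operator trees and significantly improves theorem-proving performance. | large language model | |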

🔬 Pillar 1: Robot Control (4 papers)

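| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 21 | Accelerating trajectory optimization with Sobolev-trained diffusion policies | Accelerates trajectory optimization using Sobolev-trained diffusion policies. | trajectory optimization imitation learning diffusion policy | |
| 22 | Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning | Proposes a LoRA-based structured-sparsity regularization method that improves the stability and performance of critic learning in off-policy reinforcement learning. | locomotion reinforcement learning SAC | |
| 23 | FASTER: Value-Guided Sampling for Fast RL | FASTER: accelerates reinforcement learning via value-guided sampling, reducing the computational cost of diffusion policies. | manipulation reinforcement learning VLA | |
| 24 | HardNet++: Nonlinear Constraint Enforcement in Neural Networks | HardNet++: a general method for enforcing nonlinear constraints in neural networks. | model predictive control | |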
