cs.LG (2026-05-07)

📊 95 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (42 🔗7) · Pillar 2: RL & Architecture (42 🔗3) · Pillar 1: Robot Control (5 🔗1) · Pillar 8: Physics-based Animation (4) · Pillar 4: Generative Motion (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (42 papers)

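#	Title	One-line summary	Tags	🔗
1	Crafting Reversible SFT Behaviors in Large Language Models	Proposes LCDD and SFT-Eraser for reversible, sparse control of SFT behaviors in large language models.	large language model
2	Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models	Proposes the Agentic AI paradigm to address the inherent limits of foundation models in out-of-distribution generalization.	foundation model
3	Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models	Reveals inference redundancy in tabular Transformer models and proposes a single-layer recurrent model with comparable performance.	foundation model	✅
4	MINER: Mining Multimodal Internal Representation for Efficient Retrieval	Proposes MINER, which mines multimodal internal representations for efficient visual document retrieval.	multimodal
5	Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks	Proposes UAT-MC, which uses multimodally coordinated adversarial training to make recommender systems more robust to evasion-based promotion attacks.	multimodal	✅
6	Federation of Experts: Communication Efficient Distributed Inference for Large Language Models	Proposes the Federation of Experts (FoE) architecture to improve the communication efficiency of distributed inference for large language models.	large language model
7	Towards Generation-Efficient Uncertainty Estimation in Large Language Models	Proposes an efficient uncertainty estimation method that reduces the number of generations required from the LLM and speeds up identification of unreliable responses.	large language model
8	TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models	Proposes TFM-Retouche, a lightweight input-space adapter that improves the performance of tabular foundation models.	foundation model
9	TabCF: Distributional Control Function Estimation with Tabular Foundation Models	TabCF: distributional control function estimation with tabular foundation models for efficient causal inference.	foundation model	✅
10	Verifier-Backed Hard Problem Generation for Mathematical Reasoning	Proposes the VHG framework, which generates high-quality hard mathematical reasoning problems through verifier-backed self-play.	large language model
11	Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less	Optimizer-model consistency: full fine-tuning with the same optimizer used in pretraining reduces forgetting.	large language model
12	When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds	Uses $\ell_1$-norm lower bounds to characterize when and why SignSGD outperforms SGD.	foundation model
13	The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity	Reveals the structural origin of attention sinks in LLMs and proposes head-wise RMSNorm to accelerate convergence.	large language model
14	SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders	Proposes SoftSAE, which uses a dynamic Top-K selection mechanism for adaptive sparse autoencoders with improved representational capacity.	large language model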
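15	How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation	Proposes a dynamic budget allocation method to optimize multi-turn LLM evaluation.	large language model
16	Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization	Analyzes how weight decay makes Transformer loss landscapes Villani, as a functional-analytic foundation for optimization and generalization.	large language model
17	FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning	FedAttr: a privacy-preserving client-level attribution method for federated LLM fine-tuning.	large language model
18	PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization	Proposes PACZero to balance privacy protection and utility in large language models.	large language model
19	Invariant-Based Diagnostics for Graph Benchmarks	Proposes a graph-invariant-based diagnostic framework to assess whether graph neural networks truly learn graph structure.	foundation model
20	SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask	SparseForge: efficient semi-structured LLM sparsification via annealing of a Hessian-guided soft mask.	large language model
21	Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction	A preliminary analysis of how well the Chronos model understands and reconstructs frequency-domain information.	foundation model
22	Teaching LLMs Program Semantics via Symbolic Execution Traces	Teaches LLMs program semantics via symbolic execution traces, significantly improving defect detection in program verification tasks.	chain-of-thought
23	One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning	DualSFT: a dual scoring algorithm for jointly selecting parameters and data in LLM fine-tuning.	large language model
24	BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification	Proposes BoostLLM, which fine-tunes LLMs with boosting-inspired ideas to improve few-shot tabular classification.	large language model
25	Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards	Proposes a matrix-decoupled concentration method for sparse rewards in autoregressive sequences.	large language model
26	Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions	Proposes Prompt-only SV, which jointly trains the steering factor and direction to steer LLM behavior without sacrificing generation quality.	large language model
27	A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning	Establishes an online learning theory for autoregressive chain-of-thought learning and shows the key role of chain-of-thought in lowering mistake bounds.	large language model, chain-of-thought
28	A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis	Proposes a reproducible optimization protocol based on prompt engineering to improve LLM performance on evidence synthesis tasks.	large language model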
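29	McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware	Proposes the McNdroid benchmark dataset, which addresses concept drift in Android malware detection through multimodal fusion.	multimodal
30	Hypothesis generation and updating in large language models	Uses number-theory games to probe hypothesis generation and updating in large language models and their deviations from Bayesian inference.	large language model
31	LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites	Proposes LLMSpace, a carbon footprint modeling framework that quantifies the carbon emissions of LLM inference deployed on low Earth orbit (LEO) satellites.	large language model	✅
32	SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders	Proposes SoftSAE, an adaptive sparse autoencoder based on a differentiable Soft Top-K mechanism that enables input-dependent dynamic feature selection.	large language model	✅
33	Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks	Delulu: a verified multilingual benchmark for code hallucination detection in fill-in-the-middle tasks.	large language model	✅
34	Response Time Enhances Alignment with Heterogeneous Preferences	Introduces response-time signals to address LLM alignment bias under heterogeneous preferences.	large language model
35	One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators	Proposes neural operators that learn conditional probabilities, addressing generalization of conditioning in uncertainty modeling.	foundation model
36	Dataset Watermarking for Closed LLMs with Provable Detection	Proposes the first dataset watermarking method for closed-source LLMs, with provable detection of training-data provenance.	large language model
37	Conformal Agent Error Attribution	Proposes a conformal-prediction-based agent error attribution framework for precise fault localization and automatic rollback in multi-agent systems.	large language model	✅
38	Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA	Proposes GLoRA, a gauge-invariant low-rank aggregation framework for federated LoRA that addresses semantic mismatch in parameter aggregation.	large language model
39	VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading	Proposes the VisMMoE system, which exploits visual-expert affinity for efficient offloading of vision-language mixture-of-experts models.	multimodal
40	Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio	Proposes modular learning-rate scaling based on signal-to-noise ratio (MoLS) to address gradient noise imbalance in large-model training.	large language model
41	CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning	Proposes the CRAFT framework, which enables continual learning in large language models through forgetting-aware representation-space interventions.	large language model
42	On the Blessing of Pre-training in Weak-to-Strong Generalization	Characterizes weak-to-strong generalization (W2SG): pretraining plays a key role as a geometric warm start.	large language model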

🔬 Pillar 2: RL & Architecture (42 papers)

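#	Title	One-line summary	Tags	🔗
43	Entropy-Regularized Adjoint Matching for Offline RL	Proposes Maximum-Entropy Adjoint Matching (ME-AM) to address popularity bias and support-binding problems in offline reinforcement learning.	reinforcement learning, offline RL, offline reinforcement learning
44	Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark	Proposes the MTG-Causal-RL benchmark for evaluating causal reinforcement learning algorithms in complex card games.	reinforcement learning, PPO, world model
45	Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning	Proposes Adaptive Q-Chunking (AQC) to address fixed action-chunk sizes in offline-to-online reinforcement learning.	reinforcement learning, VLA
46	On the Safety of Graph Representation Learning	Proposes GRL-Safety, a safety evaluation benchmark for graph representation learning that exposes reliability problems of existing methods under deployment stress.	representation learning, foundation model	✅
47	SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation	SNAPO: smooth neural adjoint policy optimization for optimal control via differentiable simulation.	reinforcement learning, differentiable simulation
48	A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment	Proposes the Pair-GRPO family, which uses implicit and explicit preference constraints to improve the stability and generality of RLHF alignment.	reinforcement learning, preference learning, RLHF
49	A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions	Proposes the FP-FM algorithm, which uses function projection for fast many-shot adaptation of generative models to unseen distributions.	flow matching, language conditioned
50	Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer	SlimDT: injects conditioning information outside sequential modeling to improve the efficiency and performance of Decision Transformer.	reinforcement learning, offline reinforcement learning, decision transformer
51	Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level	Proposes Asymmetric On-Policy Distillation (AOPD) to improve token-level imitation learning on mathematical reasoning tasks.	reinforcement learning, distillation
52	Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems	Proposes a consistency-distilled flow matching method to accelerate high-fidelity reconstruction of physical fields in dynamical systems.	flow matching, distillation
53	Dynamic Treatment on Networks	Proposes the Q-Ising framework for optimizing dynamic intervention policies on networks.	reinforcement learning, offline RL, offline reinforcement learning
54	Operator-Guided Invariance Learning for Continuous Reinforcement Learning	Proposes VPSD-RL, which uses operator-guided invariance learning to improve data efficiency and robustness in continuous reinforcement learning.	reinforcement learning
55	Flow Matching with Arbitrary Auxiliary Paths	Proposes AuxPath-FM, which extends flow matching generative models with arbitrary auxiliary paths.	flow matching
56	Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex	Proposes Listwise Policy Optimization (LPO), which improves LLM reasoning while preserving optimization stability and response diversity.	reinforcement learning, large language model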
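57	PRISM: Iterative Cross-Modal Posterior Refinement for Dynamic Text-Attributed Graphs	Proposes the PRISM framework, which improves representation learning on dynamic text-attributed graphs via iterative cross-modal posterior refinement.	representation learning, multimodal
58	Normalized Architectures are Natively 4-Bit	Shows that the normalized nGPT architecture natively supports 4-bit quantized training, improving large-model efficiency.	Mamba, large language model	✅
59	Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference	Feather: a scheduler that uses reinforcement learning to balance batch size and prefix homogeneity in LLM inference.	reinforcement learning, large language model
60	Soft Deterministic Policy Gradient with Gaussian Smoothing	Proposes Soft Deterministic Policy Gradient with Gaussian smoothing (Soft-DPG) to address policy-gradient instability under sparse rewards.	reinforcement learning, deep reinforcement learning
61	Optimal Transport for LLM Reward Modeling from Noisy Preference	Proposes the SelectiveRM framework, which uses optimal transport to handle noisy preferences in LLM reward modeling.	reinforcement learning, RLHF
62	How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment	Proposes Shadow Mask Distillation to address the policy deviation caused by KV cache compression in RL post-training.	reinforcement learning, PPO, RLHF
63	Offline Reinforcement Learning for Rotation Profile Control in Tokamaks	Proposes an offline reinforcement learning method for controlling plasma rotation profiles in tokamaks.	reinforcement learning, offline RL, offline reinforcement learning
64	Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings	Proposes C3PO, a causal-aware foundation model for bilevel price optimization in discrete choice settings.	imitation learning, foundation model
65	Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning	Proposes the Maximum-Entropy Adjoint Matching (ME-AM) framework to address popularity bias and support-set constraints in offline reinforcement learning.	reinforcement learning, offline reinforcement learning, flow matching
66	$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses	Proposes a theoretical framework for online RLHF with general f-divergence regularization, with optimal regret bounds and convergence analysis.	reinforcement learning, RLHF, large language model
67	MDN: Parallelizing Stepwise Momentum for Delta Linear Attention	Proposes Momentum DeltaNet (MDN), which optimizes linear attention models with a chunkwise-parallel momentum mechanism.	Mamba, linear attention, large language model	✅
68	Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs	Proposes the Masked Reward Behavior Tree (MRBT) framework, which combines LLMs with neuro-symbolic reinforcement learning to solve compositional tasks efficiently.	reinforcement learning, reward shaping, large language model
69	Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using LLM Judges with Closed-Loop Reinforcement Learning Feedback	Proposes a behavioral evaluation framework for agentic stock prediction based on LLM judges and closed-loop reinforcement learning.	reinforcement learning, SAC, large language model
70	Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators	Proposes the Echo architecture, which uses spectral Koopman operators for KV-cache-free associative recall.	Mamba, SSM, chain-of-thought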
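71	Revisiting Adam for Streaming Reinforcement Learning	Revisits the Adam optimizer in streaming reinforcement learning and proposes Adaptive Q(λ) for efficient online learning.	reinforcement learning, deep reinforcement learning
72	Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level	Proposes Asymmetric On-Policy Distillation (AOPD), which uses token-level feedback optimization to address training bottlenecks in reinforcement learning.	reinforcement learning, distillation
73	Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing	Proposes Near-Policy Distillation, which accelerates knowledge distillation for autoregressive models and mitigates distribution mismatch.	reinforcement learning, distillation
74	RepFlow: Representation Enhanced Flow Matching for Causal Effect Estimation	Proposes the RepFlow framework, which estimates causal effects through representation enhancement and conditional flow matching.	flow matching, representation learning
75	AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling	Proposes the AeroJEPA architecture, which uses joint-embedding prediction for scalable 3D aerodynamic field modeling and semantic representation learning.	joint-embedding predictive architecture, latent optimization
76	A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models	Presents a unified measure-theoretic view of diffusion, score-based, and flow matching generative models.	flow matching
77	Towards Differentially Private Reinforcement Learning with General Function Approximation	Proposes the first theoretical framework for differentially private online reinforcement learning with general function approximation.	reinforcement learning
78	Adaptive Memory Decay for Log-Linear Attention	Proposes an adaptive memory decay mechanism to improve long-range context modeling in log-linear attention models.	linear attention
79	Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices	Proposes PIC-Flow, a generative neural surrogate model that uses physics-constrained flow matching for full-field electromagnetic prediction of silicon photonic devices.	flow matching
80	Gradient Extrapolation-Based Policy Optimization	Proposes Gradient Extrapolation-Based Policy Optimization (GXPO), which uses efficient gradient prediction to improve reinforcement learning for LLM reasoning.	reinforcement learning, large language model
81	Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR	Proposes selective eligibility traces (S-trace), which improve reasoning in RLVR through fine-grained credit assignment.	reinforcement learning, large language model
82	FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings	Proposes the FedeKD framework, which uses energy-based gating to address negative transfer in heterogeneous federated knowledge distillation.	distillation
83	Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics	Proposes the Semantic State Abstraction Interface (SSAI) framework, which uses multi-axis news decomposition for interpretable diagnostics of LLM-augmented portfolio decisions.	PPO, SAC
84	Measuring Learning Progress via Gradient-Momentum Coupling	Proposes Gradient-Momentum Coupling (GMC), which quantifies learning progress through optimization dynamics to improve exploration efficiency in reinforcement learning.	reinforcement learning, curriculum learning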

🔬 Pillar 1: Robot Control (5 papers)

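#	Title	One-line summary	Tags	🔗
85	AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning	AdaGamma: a reinforcement learning method with state-dependent discounting that improves temporal adaptation.	manipulation, reinforcement learning, PPO
86	Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs	Proposes Memory Inception, which steers LLMs precisely through latent-space KV cache manipulation.	manipulation, large language model
87	Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies	Proposes the IEL algorithm, which learns isomorphic embeddings to improve multi-stage planning in offline reinforcement learning.	locomotion, reinforcement learning, policy learning	✅
88	Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation	Proposes a semantic envelope lifting method that certifies safety audit metrics against platform manipulation, keeping online safety oversight effective.	manipulation
89	A Systematic Investigation of The RL-Jailbreaker in LLMs	Systematically dissects RL-based jailbreak attacks on large language models, showing that the environment formalization has a decisive effect on attack success rate.	manipulation, reinforcement learning, reward shaping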

🔬 Pillar 8: Physics-based Animation (4 papers)

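#	Title	One-line summary	Tags
90	Generalising Travel Time Prediction To Varying Route Choices In Urban Networks	Proposes a general travel time predictor (GenTTP), which addresses poor generalization of predictions under changing route choices in urban networks.	spatiotemporal
91	Dual-Scale Temporal Fusion Reveals Structured Predictability in Subseasonal-to-Seasonal Temperature Prediction	Proposes a dual-scale temporal fusion framework that improves S2S temperature prediction accuracy by decoupling climate background from weather evolution.	spatiotemporal
92	TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond	Proposes the TraXion pre-training framework, which models multi-entity spatiotemporal event streams (MESES) for mobility and cross-domain prediction.	spatiotemporal
93	Towards Scalable One-Step Generative Modeling for Autoregressive Dynamical System Forecasting	Proposes MeLISA, an autoregressive generative surrogate model based on pixel-space MeanFlow for efficient long-horizon dynamics forecasting.	spatiotemporal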

🔬 Pillar 4: Generative Motion (1 paper)

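#	Title	One-line summary	Tags	🔗
94	Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse	Proposes an AE warm-up strategy to resolve dimensional collapse in VQ-VAEs, significantly improving representation quality and reconstruction performance.	VQ-VAE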

🔬 Pillar 3: Perception & Semantics (1 paper)

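#	Title	One-line summary	Tags	🔗
95	Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation	Proposes material-aware Hamiltonian risk fields (MHRF) for selective obstacle avoidance and risk suppression in safe robot navigation.	semantic map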
