cs.LG (2026-05-28)

📊 59 papers in total | 🔗 6 with code

🎯 Areas of Interest

Pillar 2: RL Algorithms & Architecture (30 🔗5)
Pillar 9: Embodied Foundation Models (21 🔗1)
Pillar 3: Spatial Perception & Semantics (3)
Pillar 1: Robot Control (2)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)
Pillar 5: Interaction & Reaction (1)

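🔬 Pillar 2: RL Algorithms & Architecture (30 papers)

#	Title	One-Line Summary	Tags	🔗
1	Momentum Based Reward Design for Low Emission Traffic Signal Control	Proposes a momentum-based reward function for optimizing low-emission traffic signal control	reinforcement learning deep reinforcement learning DRL
2	Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning	Proposes Hista and Numca, which improve state-value estimation in LLM reinforcement learning	reinforcement learning PPO large language model
3	PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning	PEARL: trains Socratic tutoring models with pedagogically aligned reinforcement learning	reinforcement learning large language model
4	How's it going? Reinforcement learning in language models recruits a functional welfare axis	Shows that reinforcement learning in language models recruits a functional welfare axis that influences model behavior	reinforcement learning
5	MIC: Maximizing Informational Capacity in Adaptive Representations via Isotropic Subspace Alignment	Proposes the MIC framework, which maximizes the informational capacity of adaptive representations via isotropic subspace alignment, especially under high compression	representation learning distillation
6	Rethinking Post-Training Recipes for Multimodal Time-Series Forecasting	Proposes PostTime, which post-trains an LLM to correct numeric time-series forecasts for multimodal time-series forecasting	reinforcement learning foundation model multimodal
7	TRACER: Persistent Regularization for Robust Multimodal Finetuning	Proposes TRACER, which uses persistent regularization to improve the robustness and generalization of multimodal finetuning	contrastive learning distillation multimodal	✅
8	GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models	Proposes GDSD: reinforcement learning for diffusion language models via guided denoiser self-distillation	reinforcement learning distillation large language model	✅
9	Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences	Introduces Chess-World-Model, an exact state-tracking benchmark built from 10 million chess games	world model world models Mamba
10	LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation	LoopFM: uses historical representations to make knowledge transfer from foundation models to vertical models in recommender systems more efficient	distillation foundation model
11	Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification	Proposes DSFM, a model based on wavelet transforms and spectral flow matching, for generating fMRI time series and identifying brain disorders	flow matching spatiotemporal	✅
12	Learning to Perceive the World Through Control: Empowerment-Based Representation Learning	Proposes a control-based representation learning method that extracts control-relevant features by maximizing empowerment	reinforcement learning representation learning
13	Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics	Proposes a scalable constrained multi-agent RL method based on state augmentation and consensus, addressing resource constraints in systems with separable dynamics	reinforcement learning policy learning
14	Bounded Behavioral Indistinguishability for Black-Box LLM Distillation	Proposes bounded behavioral indistinguishability to improve the evaluation of black-box LLM distillation	teacher-student distillation
15	Calibrated Preference Learning: The Case of Label Ranking	Proposes a calibrated learning framework for label ranking that makes ranking predictions more reliable	preference learning RLHF
16	In-Context Reward Adaptation for Robust Preference Modeling	Proposes the In-Context Reward Adaptation framework to address poor reward-model generalization in RLHF	reinforcement learning RLHF large language model
17	ESPO: Early-Stopping Proximal Policy Optimization	ESPO: early-stopping proximal policy optimization that speeds up LLM reinforcement learning and improves mathematical reasoning	reinforcement learning PPO large language model
18	Information-Directed Offline-to-Online Reinforcement Learning	Proposes an information-directed offline-to-online RL method to address the exploration problem	reinforcement learning offline RL
19	LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation	Proposes LARK, a learnability-grounded method for selecting trajectories in reasoning distillation	distillation	✅
20	Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets	Proposes statistical embeddings for similarity, retrieval, and interpretable alignment of numeric tabular datasets	predictive model large language model
21	On Distributional Reinforcement Learning in Chaotic Dynamical Systems	Proposes Wasserstein-distance-based distributional RL to address high variance in chaotic dynamical systems	reinforcement learning
22	RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood	Proposes RL2ML, which bridges reinforcement learning and maximum likelihood to improve language model training	reinforcement learning
23	Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies	Proposes AWD regularization to mitigate forgetting when fine-tuning LLMs with evolution strategies (ES), improving continual learning	reinforcement learning large language model
24	Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption	Proposes the SW-DRSO framework to make set representation learning robust to element corruption at inference time	representation learning
25	A Predictive Law for On-Policy Self-Distillation From World Feedback	Proposes a predictive law for on-policy self-distillation from world feedback, improving post-training efficiency	distillation
26	LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training	Proposes LaRA, which uses layer-wise representation analysis to detect data contamination of LLMs in RL post-training	reinforcement learning large language model
27	Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets	Proposes an optimal allocation method for federated probe-logit distillation under heterogeneous bandwidth budgets	distillation
28	On-Policy Replay for Continual Supervised Fine-Tuning	Proposes On-Policy Replay to mitigate catastrophic forgetting of LLMs in continual supervised fine-tuning	distillation large language model	✅
29	When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer	Uses reinforcement learning with a novelty reward to improve LLM cross-domain reasoning transfer from puzzles to math	reinforcement learning chain-of-thought
30	Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents	Proposes MF-Diffuser to scale offline multi-agent RL to thousands of agents	reinforcement learning offline reinforcement learning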

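🔬 Pillar 9: Embodied Foundation Models (21 papers)

#	Title	One-Line Summary	Tags	🔗
31	Representation Collapse in Sequential Post-Training of Large Language Models	Studies representation collapse in the sequential post-training of LLMs and proposes interventions	large language model chain-of-thought
32	MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding	MIRAGE: a whole-brain fMRI encoding model with adaptive multimodal gating that improves prediction accuracy and interpretability	foundation model multimodal
33	OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction	Proposes OOD-GraphLLM to address out-of-distribution generalization in drug synergy prediction	large language model	✅
34	Fingerprinting Inference Systems of Large Language Models	Proposes a method for fingerprinting LLM inference systems that identifies the underlying components from prompt-response behavior	large language model
35	NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models	NumLeak: treats public numeric benchmarks as latent labels in foundation models to reveal memorization leakage	foundation model
36	When Do Graph Foundation Models Transfer? A Data-Centric Theory	Proposes a data-centric theory of graph foundation model transfer, analyzing how domain discrepancy affects model outputs	foundation model
37	Inferring the Size of Large Language Models From Popular Text Memorization	Proposes a black-box method that infers a lower bound on an LLM's parameter count from its text outputs alone	large language model
38	CLUBench: A Clustering Benchmark	CLUBench: a comprehensive clustering benchmark to support algorithm selection and deployment	large language model foundation model
39	CSULoRA: Closest Safe Update Low-Rank Adaptation	Proposes CSULoRA, which achieves safety-aligned LoRA fine-tuning via closest safe updates	large language model
40	CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs	CacheProbe: audits the security of prompt cache isolation in gateway APIs	large language model
41	The Long-Term Effects of Data Selection in LLM Fine-Tuning	Studies the long-term effects of data selection in LLM fine-tuning, exposes the pitfalls of myopic selection, and proposes improvements	large language model
42	When, why, and how do diffusion posterior samplers fail? A finite-sample lens	Analyzes why and how diffusion posterior samplers fail through a finite-sample lens	multimodal
43	SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?	SoundnessBench: evaluates whether AI scientists can judge the soundness of research proposals	large language model
44	CalArena: A Large-Scale Post-Hoc Calibration Benchmark	CalArena: a large-scale post-hoc calibration benchmark to advance research on reliable probability estimation	foundation model
45	Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation	Proposes the Relational Task Extrapolator (RTE) to help models extrapolate to new tasks	foundation model
46	Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability	Proposes a convergence theory for iterative LLM-based NAS, built on a parametric cross-entropy framework	large language model
47	Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection	Uses circuit-level analysis to show that LLM vulnerability detection relies on recognizing security patterns rather than detecting vulnerabilities directly	large language model
48	Feedback-to-Rubrics: Can We Learn Expert Criteria from Inline Comments?	Proposes Feedback-to-Rubrics, which learns reusable expert evaluation rubrics from inline comments	large language model
49	On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference	Proposes LoRA-Curve, which explores low-loss paths in LoRA parameter space to improve uncertainty estimation in Bayesian inference	large language model
50	SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring	Proposes SCOPE, a lightweight-training LLM framework for air traffic control readback monitoring that improves efficiency and accuracy	large language model
51	Solving Integer Linear Programming with Parallel Tempering	Proposes a solver-free integer linear programming method based on parallel tempering	multimodal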

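🔬 Pillar 3: Spatial Perception & Semantics (3 papers)

#	Title	One-Line Summary	Tags
52	EMAG: Differentiable 4D Gaussian Mixture Splatting for EEG Spatial Super-Resolution	EMAG: an EEG spatial super-resolution method based on differentiable 4D Gaussian mixture splatting	splatting
53	Early Prediction of Future Behavioral Strategy from Process Traces	Proposes the PLVM model, which predicts future behavioral strategies from early process traces, with applications to human-machine collaboration systems	affordance
54	A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture	Proposes a computer vision method for assessing fish responses to intrusive objects in aquaculture	depth estimation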

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
55	BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies	BOKBO: calibrated abstention for VLA policies to ensure safe operation	manipulation vision-language-action VLA
56	Learning to Perturb Hidden Representations for Generalizable Deep Learning	Proposes LPA, which adaptively perturbs hidden-layer activations of deep neural networks to improve generalization	manipulation

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
57	Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM	Proposes the Text2BFM framework for generating long composite motions from text	text-to-motion motion generation motion representation

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
58	Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts	Builds a bias-aware benchmark to evaluate whether physics foundation models generalize	spatiotemporal foundation model

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
59	Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels	Proposes a privacy-enhanced zero-order federated learning scheme based on xMK-CKKS over wireless channels	OMOMO
