cs.LG(2026-05-28)

📊 59 papers in total | 🔗 6 with code

🎯 Browse by interest area

Pillar 2: RL & Architecture (30 🔗5) · Pillar 9: Embodied Foundation Models (21 🔗1) · Pillar 3: Perception & Semantics (3) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL & Architecture (30 papers)

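# | Title | One-line summary | Tags | 🔗
1 | Momentum Based Reward Design for Low Emission Traffic Signal Control | Proposes a momentum-based reward function for optimizing low-emission traffic signal control | reinforcement learning deep reinforcement learning DRL
2 | Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning | Proposes Hista and Numca to improve state-value estimation in LLM reinforcement learning | reinforcement learning PPO large language model
3 | PEARL: Training Socratic Tutors with Pedagogically Aligned Reinforcement Learning | PEARL: trains Socratic tutoring models with pedagogically aligned reinforcement learning | reinforcement learning large language model
4 | How's it going? Reinforcement learning in language models recruits a functional welfare axis | Reinforcement learning activates a functional welfare axis in language models that influences model behavior | reinforcement learning
5 | MIC: Maximizing Informational Capacity in Adaptive Representations via Isotropic Subspace Alignment | Proposes the MIC framework, which maximizes the informational capacity of adaptive representations via isotropic subspace alignment, especially under high compression | representation learning distillation
6 | Rethinking Post-Training Recipes for Multimodal Time-Series Forecasting | Proposes PostTime, which post-trains an LLM to correct numerical time-series forecasts for multimodal time-series forecasting | reinforcement learning foundation model multimodal
7 | TRACER: Persistent Regularization for Robust Multimodal Finetuning | Proposes TRACER, which uses persistent regularization to improve the robustness and generalization of multimodal finetuning | contrastive learning distillation multimodal
8 | GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models | Proposes GDSD: reinforcement learning for diffusion language models via guided denoiser self-distillation | reinforcement learning distillation large language model
9 | Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences | Proposes Chess-World-Model: an exact state-tracking benchmark built on 10 million chess games | world model world models Mamba
10 | LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation | LoopFM: uses historical representations to make knowledge transfer from foundation models to vertical models more efficient in recommender systems | distillation foundation model
11 | Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification | Proposes the DSFM model, based on a wavelet transform and spectral flow matching, to generate fMRI time series and identify brain disorders | flow matching spatiotemporal
12 | Learning to Perceive the World Through Control: Empowerment-Based Representation Learning | Proposes a control-based representation learning method that extracts control-relevant features by maximizing the agent's capacity for control | reinforcement learning representation learning
13 | Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics | Proposes a scalable constrained multi-agent reinforcement learning method based on state augmentation and consensus, addressing resource constraints in systems with separable dynamics | reinforcement learning policy learning
14 | Bounded Behavioral Indistinguishability for Black-Box LLM Distillation | Proposes bounded behavioral indistinguishability to improve the evaluation of black-box LLM distillation | teacher-student distillation
15 | Calibrated Preference Learning: The Case of Label Ranking | Proposes a calibrated learning framework for label ranking that improves the reliability of ranking predictions | preference learning RLHF
16 | In-Context Reward Adaptation for Robust Preference Modeling | Proposes the In-Context Reward Adaptation framework to address reward-model generalization in RLHF | reinforcement learning RLHF large language model
17 | ESPO: Early-Stopping Proximal Policy Optimization | ESPO: early-stopping proximal policy optimization that speeds up LLM reinforcement learning and improves mathematical reasoning | reinforcement learning PPO large language model
18 | Information-Directed Offline-to-Online Reinforcement Learning | Proposes an information-directed offline-to-online reinforcement learning method to address the exploration problem | reinforcement learning offline RL
19 | LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation | Proposes LARK to address trajectory selection in reasoning distillation | distillation
20 | Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets | Proposes a statistical-embedding method for similarity, retrieval, and interpretable alignment of tabular data | predictive model large language model
21 | On Distributional Reinforcement Learning in Chaotic Dynamical Systems | Proposes Wasserstein-distance-based distributional reinforcement learning to address high variance in chaotic dynamical systems | reinforcement learning
22 | RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood | Proposes RL2ML, which connects reinforcement learning and maximum likelihood to optimize language model training | reinforcement learning
23 | Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies | Proposes the AWD regularization method to address forgetting when fine-tuning LLMs with evolution strategies (ES), improving continual learning | reinforcement learning large language model
24 | Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption | Proposes the SW-DRSO framework to make set representation learning more robust to element corruption at inference time | representation learning
25 | A Predictive Law for On-Policy Self-Distillation From World Feedback | Proposes a predictive law for on-policy self-distillation from world feedback, improving post-training efficiency | distillation
26 | LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training | Proposes the LaRA framework, which detects LLM data contamination in RL post-training through layer-wise representation analysis | reinforcement learning large language model
27 | Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets | Proposes an optimal allocation method for federated probe-logit distillation under heterogeneous bandwidth budgets | distillation
28 | On-Policy Replay for Continual Supervised Fine-Tuning | Proposes On-Policy Replay to address catastrophic forgetting of large language models in continual supervised fine-tuning | distillation large language model
29 | When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer | Uses reinforcement learning with a novelty reward to improve LLM cross-domain reasoning from puzzles to math | reinforcement learning chain-of-thought
30 | Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents | Proposes MF-Diffuser to address the scaling problem in offline multi-agent reinforcement learning | reinforcement learning offline reinforcement learning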

🔬 Pillar 9: Embodied Foundation Models (21 papers)

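# | Title | One-line summary | Tags | 🔗
31 | Representation Collapse in Sequential Post-Training of Large Language Models | Studies representation collapse in the sequential post-training of large language models and proposes interventions | large language model chain-of-thought
32 | MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding | MIRAGE: a whole-brain fMRI encoding model with adaptive multimodal gating that improves prediction accuracy and interpretability | foundation model multimodal
33 | OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction | Proposes OOD-GraphLLM to address out-of-distribution generalization in drug synergy prediction | large language model
34 | Fingerprinting Inference Systems of Large Language Models | Proposes a fingerprinting method for LLM inference systems that identifies underlying components by analyzing prompt-response behavior | large language model
35 | NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models | NumLeak: treats public numeric benchmarks as latent labels in foundation models to expose memorization leakage | foundation model
36 | When Do Graph Foundation Models Transfer? A Data-Centric Theory | Proposes a data-centric transfer theory for graph foundation models, analyzing how domain differences affect model outputs | foundation model
37 | Inferring the Size of Large Language Models From Popular Text Memorization | Proposes a black-box method that infers a lower bound on a large language model's parameter count from its text outputs alone | large language model
38 | CLUBench: A Clustering Benchmark | CLUBench: builds a comprehensive clustering benchmark to support algorithm selection and deployment | large language model foundation model
39 | CSULoRA: Closest Safe Update Low-Rank Adaptation | Proposes CSULoRA, which achieves safety-aligned LoRA fine-tuning through closest safe updates | large language model
40 | CacheProbe: Auditing Prompt Cache Isolation in Gateway APIs | CacheProbe: audits the security of prompt cache isolation in gateway APIs | large language model
41 | The Long-Term Effects of Data Selection in LLM Fine-Tuning | Studies the long-term effects of data selection in LLM fine-tuning, exposing the pitfalls of myopic selection and proposing improvements | large language model
42 | When, why, and how do diffusion posterior samplers fail? A finite-sample lens | Analyzes why and how diffusion posterior samplers fail, through a finite-sample lens | multimodal
43 | SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones? | SoundnessBench: evaluates how well AI scientists judge the soundness of research proposals | large language model
44 | CalArena: A Large-Scale Post-Hoc Calibration Benchmark | CalArena: a large-scale post-hoc calibration benchmark to advance research on reliable probability estimation | foundation model
45 | Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation | Proposes the Relational Task Extrapolator (RTE) to address extrapolation to new tasks | foundation model
46 | Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability | Proposes a convergence theory for LLM-based NAS and validates it with a parametric cross-entropy framework | large language model
47 | Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection | Uses circuit-level analysis to reveal how LLMs detect vulnerabilities, finding that they rely on recognizing security patterns rather than detecting vulnerabilities directly | large language model
48 | Feedback-to-Rubrics: Can We Learn Expert Criteria from Inline Comments? | Proposes Feedback-to-Rubrics, which learns reusable expert evaluation criteria from inline comments | large language model
49 | On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference | Proposes LoRA-Curve, which explores low-loss paths in LoRA space to improve uncertainty estimation in Bayesian inference | large language model
50 | SCOPE: A Lightweight-training LLM Framework for Air Traffic Control Readback Monitoring | Proposes SCOPE, a lightweight-training framework for air traffic control readback monitoring that improves efficiency and accuracy | large language model
51 | Solving Integer Linear Programming with Parallel Tempering | Proposes a solver-free integer linear programming method based on parallel tempering | multimodal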

🔬 Pillar 3: Perception & Semantics (3 papers)

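# | Title | One-line summary | Tags | 🔗
52 | EMAG: Differentiable 4D Gaussian Mixture Splatting for EEG Spatial Super-Resolution | EMAG: proposes an EEG spatial super-resolution method based on differentiable 4D Gaussian mixture splatting | splatting
53 | Early Prediction of Future Behavioral Strategy from Process Traces | Proposes the PLVM model, which predicts future behavioral strategy from early behavioral traces, with application to human-machine collaboration systems | affordance
54 | A Novel Computer Vision Approach for Assessing Fish Responses to Intrusive Objects in Aquaculture | Proposes a computer-vision method for assessing fish responses to intrusive objects in aquaculture | depth estimation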

🔬 Pillar 1: Robot Control (2 papers)

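# | Title | One-line summary | Tags | 🔗
55 | BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies | BOKBO: provides calibrated abstention for VLA policies to keep operation safe | manipulation vision-language-action VLA
56 | Learning to Perturb Hidden Representations for Generalizable Deep Learning | Proposes LPA, which adaptively perturbs hidden-layer activations of deep neural networks to improve generalization | manipulation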

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
57 | Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM | Proposes the Text2BFM framework to address motion generation from long text | text-to-motion motion generation motion representation

🔬 Pillar 8: Physics-based Animation (1 paper)

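# | Title | One-line summary | Tags | 🔗
58 | Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts | Builds a bias-aware benchmark to evaluate the generalizability of physics foundation models | spatiotemporal foundation model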

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line summary | Tags | 🔗
59 | Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels | Proposes a privacy-enhanced zero-order federated learning scheme based on xMK-CKKS | OMOMO
