cs.LG (2026-05-07)

📊 95 papers in total | 🔗 11 with code

🎯 Interest area navigation

Pillar 9: Embodied Foundation Models (42 🔗7) · Pillar 2: RL & Architecture (42 🔗3) · Pillar 1: Robot Control (5 🔗1) · Pillar 8: Physics-based Animation (4) · Pillar 4: Generative Motion (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (42 papers)

# | Title | One-sentence summary | Tags | 🔗
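1 | Crafting Reversible SFT Behaviors in Large Language Models | Proposes LCDD and SFT-Eraser for reversible, sparse control of SFT behaviors in large language models | large language model
2 | Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models | Proposes an agentic AI paradigm to address the inherent limits of foundation models in out-of-distribution generalization | foundation model
3 | Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models | Reveals inference redundancy in tabular Transformer models and proposes a single-layer recurrent model with comparable performance | foundation model
4 | MINER: Mining Multimodal Internal Representation for Efficient Retrieval | Proposes MINER, which mines multimodal internal representations for efficient visual document retrieval | multimodal
5 | Band Together: Untargeted Adversarial Training with Multimodal Coordination against Evasion-based Promotion Attacks | Proposes UAT-MC, which uses multimodal coordinated adversarial training to harden recommender systems against evasion-based attacks | multimodal
6 | Federation of Experts: Communication Efficient Distributed Inference for Large Language Models | Proposes the Federation of Experts (FoE) architecture to improve communication efficiency in distributed inference for large language models | large language model
7 | Towards Generation-Efficient Uncertainty Estimation in Large Language Models | Proposes an efficient uncertainty estimation method that reduces the number of generations required from a large language model, speeding up identification of unreliable responses | large language model
8 | TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models | Proposes TFM-Retouche, a lightweight input-space adapter for tabular data that improves the performance of tabular foundation models | foundation model
9 | TabCF: Distributional Control Function Estimation with Tabular Foundation Models | TabCF: distributional control function estimation with tabular foundation models for efficient causal inference | foundation model
10 | Verifier-Backed Hard Problem Generation for Mathematical Reasoning | Proposes the VHG framework, which generates high-quality hard mathematical reasoning problems through verifier-backed self-play | large language model
11 | Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less | Optimizer-model consistency: full fine-tuning with the same optimizer used in pretraining reduces forgetting | large language model
12 | When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds | Uses $\ell_1$-norm lower-bound theory to identify when and why SignSGD outperforms SGD | foundation model
13 | The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity | Identifies the structural origin of attention sinks in LLMs and proposes head-wise RMSNorm to accelerate convergence | large language model
14 | SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders | Proposes SoftSAE, an adaptive sparse autoencoder with dynamic Top-K selection that improves representational capacity | large language model
15 | How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation | Proposes a dynamic budget allocation method to optimize multi-turn LLM evaluation | large language model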
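16 | Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization | Shows that weight decay makes Transformer loss landscapes Villani, giving a functional-analytic basis for optimization and generalization | large language model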
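17 | FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning | FedAttr: a privacy-preserving client-level attribution method for federated LLM fine-tuning | large language model
18 | PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization | Proposes PACZero to balance privacy protection and utility in large language models | large language model
19 | Invariant-Based Diagnostics for Graph Benchmarks | Proposes a diagnostic framework based on graph invariants to assess whether graph neural networks truly learn graph structure | foundation model
20 | SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask | SparseForge: efficient semi-structured LLM sparsification via annealing of a Hessian-guided soft mask | large language model
21 | Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction | A preliminary analysis of how well the Chronos model understands and reconstructs frequency-domain information | foundation model
22 | Teaching LLMs Program Semantics via Symbolic Execution Traces | Uses symbolic execution traces to teach LLMs program semantics, markedly improving defect detection in program verification tasks | chain-of-thought
23 | One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning | DualSFT: a dual scoring algorithm for joint parameter and data selection in LLM fine-tuning | large language model
24 | BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification | Proposes BoostLLM, which fine-tunes LLMs in a boosting-inspired way to improve few-shot tabular classification | large language model
25 | Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards | Proposes a matrix-decoupled concentration method for the sparse-reward problem in autoregressive sequences | large language model
26 | Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions | Proposes Prompt-only SV, which jointly trains the steering factor and direction to steer LLM behavior without sacrificing generation quality | large language model
27 | A Theory of Online Learning with Autoregressive Chain-of-Thought Reasoning | Establishes an online learning theory for autoregressive chain-of-thought learning, showing the key role of chain-of-thought in lowering mistake bounds | large language model, chain-of-thought
28 | A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis | Proposes a reproducible optimization protocol based on prompt engineering to improve large language model performance on evidence synthesis tasks | large language model
29 | McNdroid: A Longitudinal Multimodal Benchmark for Robust Drift Detection in Android Malware | Proposes the McNdroid benchmark dataset, which uses multimodal fusion to address concept drift in Android malware detection | multimodal
30 | Hypothesis generation and updating in large language models | Uses a number-theoretic game to probe how large language models generate and update hypotheses and where they deviate from Bayesian inference | large language model
31 | LLMSpace: Carbon Footprint Modeling for Large Language Model Inference on LEO Satellites | Proposes the LLMSpace carbon footprint modeling framework to quantify the carbon emissions of large-model inference deployed on low Earth orbit (LEO) satellites | large language model
32 | SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders | Proposes SoftSAE, an adaptive sparse autoencoder based on a differentiable Soft Top-K mechanism that enables input-dependent dynamic feature selection | large language model
33 | Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks | Delulu: a verified multilingual benchmark for code hallucination detection in fill-in-the-middle tasks | large language model
34 | Response Time Enhances Alignment with Heterogeneous Preferences | Introduces response-time signals to correct LLM alignment bias under heterogeneous preferences | large language model
35 | One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators | Proposes learning conditional probabilities with neural operators, addressing the generalization of conditioning in uncertainty modeling | foundation model
36 | Dataset Watermarking for Closed LLMs with Provable Detection | Proposes the first dataset watermarking method for closed-source large models, enabling provable detection of training data provenance | large language model
37 | Conformal Agent Error Attribution | Proposes a conformal-prediction-based framework for agent error attribution, enabling precise fault localization and automatic rollback in multi-agent systems | large language model
38 | Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA | Proposes GLoRA, a gauge-invariant low-rank aggregation framework for federated LoRA that resolves semantic mismatch in parameter aggregation | large language model
39 | VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading | Proposes the VisMMoE system, which exploits visual-expert affinity for efficient offloading of vision-language mixture-of-experts models | multimodal
40 | Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio | Proposes a signal-to-noise ratio (SNR)-based modular learning-rate scaling method (MoLS) to address gradient noise imbalance in large-model training | large language model
41 | CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning | Proposes the CRAFT framework, which enables continual learning in large language models through forgetting-aware interventions in representation space | large language model
42 | On the Blessing of Pre-training in Weak-to-Strong Generalization | Reveals the essence of weak-to-strong generalization (W2SG): the key role of pretraining as a geometric warm start | large language model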

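🔬 Pillar 2: RL & Architecture (42 papers)

# | Title | One-sentence summary | Tags | 🔗
43 | Entropy-Regularized Adjoint Matching for Offline RL | Proposes Maximum-Entropy Adjoint Matching (ME-AM) to address popularity bias and support-binding problems in offline reinforcement learning | reinforcement learning, offline RL, offline reinforcement learning
44 | Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark | Proposes the MTG-Causal-RL benchmark for evaluating causal reinforcement learning algorithms in complex card games | reinforcement learning, PPO, world model
45 | Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning | Proposes Adaptive Q-Chunking (AQC) to address fixed action-chunk sizes in offline-to-online reinforcement learning | reinforcement learning, VLA
46 | On the Safety of Graph Representation Learning | Proposes GRL-Safety, a safety evaluation benchmark for graph representation learning, revealing reliability problems of existing methods under deployment stress | representation learning, foundation model
47 | SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation | SNAPO: smooth neural adjoint policy optimization for optimal control via differentiable simulation | reinforcement learning, differentiable simulation
48 | A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment | Proposes the Pair-GRPO family, which improves the stability and generality of RLHF alignment through implicit and explicit preference constraints | reinforcement learning, preference learning, RLHF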
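49 | A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions | Proposes the FP-FM algorithm, which uses function projection for fast many-shot adaptation of generative models to unseen distributions | flow matching, language conditioned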
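50 | Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer | SlimDT: injects conditioning information outside sequence modeling to improve the efficiency and performance of Decision Transformer | reinforcement learning, offline reinforcement learning, decision transformer
51 | Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level | Proposes Asymmetric On-Policy Distillation (AOPD) to improve token-level imitation learning on mathematical reasoning tasks | reinforcement learning, distillation
52 | Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems | Proposes a consistency-distilled flow matching method to accelerate high-accuracy reconstruction of physical fields in dynamical systems | flow matching, distillation
53 | Dynamic Treatment on Networks | Proposes the Q-Ising framework to optimize dynamic intervention policies on networks | reinforcement learning, offline RL, offline reinforcement learning
54 | Operator-Guided Invariance Learning for Continuous Reinforcement Learning | Proposes VPSD-RL, which improves data efficiency and robustness in continuous reinforcement learning through operator-guided invariance learning | reinforcement learning
55 | Flow Matching with Arbitrary Auxiliary Paths | Proposes AuxPath-FM, which extends flow matching generative models with arbitrary auxiliary paths | flow matching
56 | Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex | Proposes Listwise Policy Optimization (LPO), which improves LLM reasoning while preserving optimization stability and response diversity | reinforcement learning, large language model
57 | PRISM: Iterative Cross-Modal Posterior Refinement for Dynamic Text-Attributed Graphs | Proposes the PRISM framework, which improves representation learning on dynamic text-attributed graphs through iterative cross-modal posterior refinement | representation learning, multimodal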
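58 | Normalized Architectures are Natively 4-Bit | Presents nGPT, a normalized architecture that natively supports 4-bit quantized training, improving large-model efficiency | Mamba, large language model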
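59 | Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference | Feather: a scheduler that uses reinforcement learning to trade off batch size against prefix homogeneity in LLM inference | reinforcement learning, large language model
60 | Soft Deterministic Policy Gradient with Gaussian Smoothing | Proposes Soft Deterministic Policy Gradient (Soft-DPG) with Gaussian smoothing to address policy-gradient instability under sparse rewards | reinforcement learning, deep reinforcement learning
61 | Optimal Transport for LLM Reward Modeling from Noisy Preference | Proposes the SelectiveRM framework, which uses optimal transport to handle noisy preferences in LLM reward modeling | reinforcement learning, RLHF
62 | How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment | Proposes Shadow Mask Distillation to address the policy bias caused by KV cache compression in RL post-training | reinforcement learning, PPO, RLHF
63 | Offline Reinforcement Learning for Rotation Profile Control in Tokamaks | Proposes an offline reinforcement learning method for controlling plasma rotation profiles in tokamaks | reinforcement learning, offline RL, offline reinforcement learning
64 | Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings | Proposes C3PO, a causal-aware foundation model for bilevel price optimization in discrete choice settings | imitation learning, foundation model
65 | Entropy-Regularized Adjoint Matching for Offline Reinforcement Learning | Proposes the Maximum-Entropy Adjoint Matching (ME-AM) framework to address popularity bias and support-set restrictions in offline reinforcement learning | reinforcement learning, offline reinforcement learning, flow matching
66 | $f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses | Proposes a theoretical framework for online RLHF with general $f$-divergence regularization, with optimal regret bounds and convergence analysis | reinforcement learning, RLHF, large language model
67 | MDN: Parallelizing Stepwise Momentum for Delta Linear Attention | Proposes Momentum DeltaNet (MDN), which optimizes linear attention models with a chunkwise-parallel momentum mechanism | Mamba, linear attention, large language model
68 | Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs | Proposes the Masked Reward Behavior Tree (MRBT) framework, which combines LLMs with neuro-symbolic reinforcement learning to solve compositional tasks efficiently | reinforcement learning, reward shaping, large language model
69 | Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using LLM Judges with Closed-Loop Reinforcement Learning Feedback | Proposes a behavioral evaluation framework for agentic stock prediction based on LLM judges and closed-loop reinforcement learning | reinforcement learning, SAC, large language model
70 | Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators | Proposes the Echo architecture, which uses spectral Koopman operators for KV-cache-free associative recall | Mamba, SSM, chain-of-thought
71 | Revisiting Adam for Streaming Reinforcement Learning | Revisits the Adam optimizer in streaming reinforcement learning and proposes Adaptive Q(λ) for efficient online learning | reinforcement learning, deep reinforcement learning
72 | Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level | Proposes Asymmetric On-Policy Distillation (AOPD), which uses token-level feedback optimization to resolve training bottlenecks in reinforcement learning | reinforcement learning, distillation
73 | Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing | Proposes Near-Policy Distillation, which accelerates knowledge distillation for autoregressive models and mitigates distribution mismatch | reinforcement learning, distillation
74 | RepFlow: Representation Enhanced Flow Matching for Causal Effect Estimation | Proposes the RepFlow framework, which estimates causal effects through representation enhancement and conditional flow matching | flow matching, representation learning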
75 | AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling | Proposes the AeroJEPA architecture, which uses joint-embedding prediction for scalable 3D aerodynamic field modeling and semantic representation learning | joint-embedding predictive architecture, latent optimization
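76 | A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models | Proposes a unified framework for generative models that covers diffusion and flow matching | flow matching
77 | Towards Differentially Private Reinforcement Learning with General Function Approximation | Proposes the first theoretical framework for differentially private online reinforcement learning with general function approximation | reinforcement learning
78 | Adaptive Memory Decay for Log-Linear Attention | Proposes an adaptive memory decay mechanism that improves long-range context modeling in log-linear attention models | linear attention
79 | Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices | Proposes PIC-Flow, a generative neural surrogate model that predicts the full electromagnetic field of silicon photonic devices via physics-constrained flow matching | flow matching
80 | Gradient Extrapolation-Based Policy Optimization | Proposes Gradient Extrapolation-Based Policy Optimization (GXPO), which improves reinforcement learning for large-model reasoning through efficient gradient prediction | reinforcement learning, large language model
81 | Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR | Proposes selective eligibility traces (S-trace), which improve reasoning in RLVR through fine-grained credit assignment | reinforcement learning, large language model
82 | FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings | Proposes the FedeKD framework, which uses energy-based gating to address negative transfer in heterogeneous federated knowledge distillation | distillation
83 | Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics | Proposes the Semantic State Abstraction Interface (SSAI) framework, which enables interpretable diagnostics for LLM-augmented portfolio decisions through multi-axis news decomposition | PPO, SAC
84 | Measuring Learning Progress via Gradient-Momentum Coupling | Proposes gradient-momentum coupling (GMC), which quantifies learning progress from optimization dynamics to improve exploration efficiency in reinforcement learning | reinforcement learning, curriculum learning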

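🔬 Pillar 1: Robot Control (5 papers)

# | Title | One-sentence summary | Tags | 🔗
85 | AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning | AdaGamma: a reinforcement learning method with state-dependent discounting that improves temporal adaptivity | manipulation, reinforcement learning, PPO
86 | Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs | Proposes Memory Inception, which steers LLMs precisely through latent-space KV cache manipulation | manipulation, large language model
87 | Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies | Proposes the IEL algorithm, which improves multi-stage planning in offline reinforcement learning by learning isomorphic embeddings | locomotion, reinforcement learning, policy learning
88 | Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation | Proposes a semantic envelope lifting method that keeps safety audit metrics robust to platform manipulation, ensuring effective online safety oversight | manipulation
89 | A Systematic Investigation of The RL-Jailbreaker in LLMs | A systematic dissection of reinforcement learning jailbreak attacks on large language models, showing that environment formalization decisively affects attack success rate | manipulation, reinforcement learning, reward shaping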

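🔬 Pillar 8: Physics-based Animation (4 papers)

# | Title | One-sentence summary | Tags | 🔗
90 | Generalising Travel Time Prediction To Varying Route Choices In Urban Networks | Proposes a general travel time predictor (GenTTP) to address the difficulty of generalizing predictions when route choices vary in urban networks | spatiotemporal
91 | Dual-Scale Temporal Fusion Reveals Structured Predictability in Subseasonal-to-Seasonal Temperature Prediction | Proposes a dual-scale temporal fusion framework that improves S2S temperature prediction accuracy by decoupling climate background from weather evolution | spatiotemporal
92 | TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond | Proposes the TraXion pre-training framework, which models multi-entity spatiotemporal event streams (MESES) to address mobility and cross-domain prediction | spatiotemporal
93 | Towards Scalable One-Step Generative Modeling for Autoregressive Dynamical System Forecasting | Proposes MeLISA, an autoregressive generative surrogate model based on pixel-space MeanFlow for efficient long-horizon dynamics forecasting | spatiotemporal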

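🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence summary | Tags | 🔗
94 | Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse | Proposes an AE warm-up strategy to resolve dimensional collapse in VQ-VAEs, markedly improving representation quality and reconstruction performance | VQ-VAE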

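🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-sentence summary | Tags | 🔗
95 | Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation | Proposes material-aware Hamiltonian risk fields (MHRF) for selective obstacle avoidance and risk suppression in safe robot navigation | semantic map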
