cs.LG (2026-02-12)

📊 33 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (20, 🔗 2) · Pillar 9: Embodied Foundation Models (9, 🔗 1) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (20 papers)

# | Title | Key takeaway | Tags
1 | On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage | Proposes a new framework for characterizing the complexity of offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning
2 | Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL | Proposes learning metric spaces for multimodal state estimation, improving RL robustness in noisy environments. | reinforcement learning, multimodal
3 | FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Client | Proposes FedGRPO, which uses group-relative rewards from domain clients to efficiently optimize foundation models in privacy-preserving federated learning. | reinforcement learning, foundation model
4 | In-Context Function Learning in Large Language Models | Analyzes the in-context function learning ability of large language models from a Gaussian-process perspective. | reinforcement learning, large language model
5 | DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels | DICE: diffusion large language models excel at generating CUDA kernels, outperforming autoregressive models. | reinforcement learning, large language model
6 | TS-Memory: Plug-and-Play Memory for Time Series Foundation Models | Proposes TS-Memory, a plug-and-play memory that addresses the adaptation of time series foundation models. | distillation, foundation model
7 | Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation | Proposes G-OPD, a generalized on-policy distillation framework that uses reward extrapolation to push student models beyond the teacher. | reinforcement learning, teacher-student, distillation
8 | Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards | Proposes online reinforcement learning with real-machine benchmark rewards to improve LLM generation of high-performance HPC code. | reinforcement learning, large language model
9 | From Path Signatures to Sequential Modeling: Incremental Signature Contributions for Offline RL | Proposes incremental signature contributions for temporally sensitive control in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning
10 | Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning | Proposes TAMPO, which treats temperature control as a meta-policy to adaptively improve LLM reinforcement learning. | reinforcement learning, large language model
11 | Unifying Stable Optimization and Reference Regularization in RLHF | Unifies stable optimization and reference regularization to improve RLHF alignment. | reinforcement learning, preference learning, RLHF
12 | Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data | Proposes a flow-guided neural operator framework for self-supervised learning, improving time-series representations. | flow matching, representation learning, masked autoencoder
13 | Mitigating Mismatch within Reference-based Preference Optimization | Proposes Hybrid-DPO to mitigate the mismatch in reference-based direct preference optimization. | DPO, direct preference optimization, large language model
14 | Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training | Proposes an on-policy SFT framework based on distribution discriminant theory, improving LLM generalization. | reinforcement learning, offline RL, DPO
15 | The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics | Proposes the PhyIP evaluation protocol to probe how adaptation affects the latent physics of world models. | world model
16 | How Sampling Shapes LLM Alignment: From One-Shot Optima to Iterative Dynamics | Studies how sampling strategies shape LLM alignment, revealing stability properties and risks of iterative alignment. | direct preference optimization, large language model
17 | KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite | Proposes KAN-FIF, a lightweight spline-parameterized framework for physics-based tropical cyclone estimation on meteorological satellites. | MAE, multimodal
18 | Improved state mixing in higher-order and block diagonal linear recurrent networks | Proposes higher-order and block-diagonal linear recurrent networks that improve the efficiency and expressiveness of long-sequence modeling. | Mamba, SSM, state space model
19 | RAM-Net: Expressive Linear Attention with Selectively Addressable Memory | Proposes RAM-Net, which augments linear attention with a selectively addressable explicit memory to increase expressiveness. | linear attention
20 | Latent-Variable Learning of SPDEs via Wiener Chaos | Proposes a Wiener-chaos-based latent-variable method for learning stochastic PDEs without requiring noise data. | latent dynamics, spatiotemporal

🔬 Pillar 9: Embodied Foundation Models (9 papers)

# | Title | Key takeaway | Tags
21 | SafeNeuron: Neuron-Level Safety Alignment for Large Language Models | SafeNeuron: proposes neuron-level safety alignment, improving LLM robustness against neuron-pruning attacks. | large language model, multimodal
22 | PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models | PASCAL: a phase-aware scheduling algorithm for serving reasoning-based large language models. | large language model, chain-of-thought
23 | Manifold-Aware Temporal Domain Generalization for Large Language Models | Proposes MaT-LoRA, a manifold-aware temporal domain generalization method that improves LLM performance on temporally drifting data. | large language model
24 | SkillRater: Untangling Capabilities in Multimodal Data | SkillRater: untangles capabilities in multimodal data to improve vision-language model performance. | multimodal
25 | Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal | Brain4FMs: a benchmark of foundation models for electrical brain signals, promoting scalable and transferable learning. | foundation model
26 | It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks | TIME: a next-generation time series forecasting benchmark that addresses the data, task, and evaluation limitations of existing benchmarks. | large language model, foundation model
27 | Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing | Proposes Accelerated Prompt Stress Testing (APST) to evaluate LLM safety and reliability under repeated inference. | large language model
28 | Krause Synchronization Transformers | Proposes the Krause attention mechanism, which uses local synchronization to mitigate representation collapse in Transformers. | large language model
29 | RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis | RooflineBench: a benchmarking framework for on-device LLMs based on roofline analysis. | large language model

🔬 Pillar 1: Robot Control (2 papers)

# | Title | Key takeaway | Tags
30 | Is Online Linear Optimization Sufficient for Strategic Robustness? | Proposes online linear optimization algorithms to improve the robustness of bidding strategies. | manipulation
31 | Temporally Unified Adversarial Perturbations for Time Series Forecasting | Proposes Temporally Unified Adversarial Perturbations (TUAPs) to improve the security of time series forecasting models. | manipulation

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | Key takeaway | Tags
32 | SpaTeoGL: Spatiotemporal Graph Learning for Interpretable Seizure Onset Zone Analysis from Intracranial EEG | SpaTeoGL: spatiotemporal graph learning for interpretable seizure onset zone analysis from intracranial EEG. | spatiotemporal

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key takeaway | Tags
33 | Learn from Your Mistakes: Self-Correcting Masked Diffusion Models | Proposes ProSeCo, a self-correction mechanism that improves the generation quality and efficiency of masked diffusion models. | MDM
