cs.LG (2026-02-24)

📊 20 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (10 papers, 🔗 1 with code) · Pillar 9: Embodied Foundation Models (8 papers, 🔗 1 with code) · Pillar 1: Robot Control (1 paper) · Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

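# | Title | One-line summary | Tags
1 | Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training | Proposes Actor-Curator, which uses a policy-improvement bandit algorithm to perform co-adaptive curriculum learning during LLM post-training. | reinforcement learning, curriculum learning, large language model
2 | Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning | Proposes LoDADA, which uses localized, dynamics-aware domain adaptation to handle dynamics mismatch in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning
3 | SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards | SELAUR: a self-evolving LLM agent driven by uncertainty-aware rewards. | reinforcement learning, reward design, reward shaping
4 | Scaling State-Space Models on Multiple GPUs with Tensor Parallelism | Proposes a communication-efficient tensor-parallel scheme that speeds up inference of selective state-space models on multiple GPUs. | Mamba, SSM, state space model
5 | TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer | TrajGPT-R: a generative pre-trained Transformer enhanced with reinforcement learning for generating urban mobility trajectories. | reinforcement learning, offline reinforcement learning, inverse reinforcement learning
6 | Test-Time Training with KV Binding Is Secretly Linear Attention | Shows that test-time training with KV binding is, at its core, a linear attention mechanism. | linear attention
7 | Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning | Proposes a new multi-agent imitation learning method that addresses the Nash equilibrium problem. | imitation learning
8 | Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm | Proposes ECO, an efficient offline self-play learning paradigm for neural combinatorial optimization. | DPO, direct preference optimization, Mamba
9 | Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty | Proposes Fuz-RL, a fuzzy-logic-guided robust reinforcement learning framework that improves safety under uncertainty. | reinforcement learning
10 | A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies | Proposes the THEMES framework, which uses generalized apprenticeship learning to capture students' evolving pedagogical strategies. | reinforcement learning, deep reinforcement learning, DRL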

🔬 Pillar 9: Embodied Foundation Models (8 papers)

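# | Title | One-line summary | Tags
11 | UrbanFM: Scaling Urban Spatio-Temporal Foundation Models | UrbanFM: builds an urban spatio-temporal foundation model that achieves zero-shot generalization across cities. | foundation model
12 | Oracle-Robust Online Alignment for Large Language Models | Proposes a robust online alignment method that addresses problems with preference feedback for large language models. | large language model
13 | Estimation of Confidence Bounds in Binary Classification using Wilson Score Kernel Density Estimation | Proposes Wilson Score kernel density classification for estimating confidence bounds in binary classification, aimed at critical detection tasks. | foundation model
14 | Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training | Identifies prompt interference as the reason Pass@k optimization can degrade Pass@1, and gives a theoretical explanation. | large language model
15 | Extending $μ$P: Spectral Conditions for Feature Learning Across Optimizers | Proposes an extension of μP based on spectral conditions that enables feature learning and zero-shot transfer across optimizers. | large language model
16 | QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs | QEDBench: quantifies the alignment gap in automated evaluation of university-level mathematical proofs. | large language model
17 | Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,λ}$ Targets | Shows that standard Transformers achieve the optimal minimax rate in nonparametric regression. | large language model
18 | Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA | Proposes wireless federated multi-task LLM fine-tuning with sparse-and-orthogonal LoRA to resolve knowledge conflicts under heterogeneous data. | large language model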

🔬 Pillar 1: Robot Control (1 paper)

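# | Title | One-line summary | Tags
19 | SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models | Proposes SOM-VQ to address the missing semantic structure in vector-quantized representations. | manipulation, motion generation, VQ-VAE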

🔬 Pillar 8: Physics-based Animation (1 paper)

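# | Title | One-line summary | Tags
20 | A Long-Short Flow-Map Perspective for Drifting Models | Proposes a new perspective on drifting models based on a long/short flow-map decomposition, aimed at optimizing density evolution. | PULSE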
