cs.LG (2026-02-11)

📊 36 papers total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (17, 🔗 2) · Pillar 9: Embodied Foundation Models (16, 🔗 2) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (17 papers)

# | Title | One-line Takeaway | Tags | 🔗
1 | Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models | Pram: solves multi-commodity flow problems with multimodal language models, balancing optimization quality and efficiency | reinforcement learning, multimodal
2 | Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss | Proposes a contrastive learning method with a Jaccard-score-based sigmoid loss for multi-label ECG classification | contrastive learning, large language model, multimodal
3 | A Multimodal Conditional Mixture Model with Distribution-Level Physics Priors | Proposes a physics-informed multimodal conditional mixture model built on mixture density networks, addressing multimodal distribution learning in scientific computing | flow matching, multimodal
4 | RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization | RePO: bridges on-policy learning and off-policy knowledge via rephrasing policy optimization, improving LLM domain-knowledge alignment | reinforcement learning, policy learning, large language model
5 | Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling | Proposes BNRM, mitigating reward hacking in RLHF via Bayesian non-negative reward modeling | reinforcement learning, RLHF, large language model
6 | Enhancing Ride-Hailing Forecasting at DiDi with Multi-View Geospatial Representation Learning from the Web | Proposes MVGR-Net, improving ride-hailing demand forecasting accuracy with multi-view geospatial representation learning | representation learning, large language model
7 | Semi-Supervised Cross-Domain Imitation Learning | Proposes a semi-supervised cross-domain imitation learning algorithm to address expert-data scarcity | policy learning, imitation learning
8 | Driving Reaction Trajectories via Latent Flow Matching | Proposes LatentRxnFlow, modeling chemical reaction trajectories via latent flow matching to make reaction prediction more transparent and diagnosable | flow matching, latent dynamics
9 | Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards | Proposes an asymmetric prompt-weighting RL method, accelerating policy learning under verifiable rewards | reinforcement learning
10 | General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies | Proposes a general, flexible $f$-divergence, improving offline RL on datasets with low stochasticity and diverse behavior policies | offline RL
11 | Resource-Efficient Model-Free Reinforcement Learning for Board Games | Proposes a resource-efficient model-free reinforcement learning algorithm for decision making in board games | reinforcement learning
12 | SimuScene: Training and Benchmarking Code Generation to Simulate Physical Scenarios | SimuScene: trains and benchmarks LLM code generation for simulating physical scenarios | reinforcement learning, large language model
13 | VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training | VESPO: variational sequence-level soft policy optimization for stable off-policy LLM training | reinforcement learning, large language model
14 | LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization | Proposes the PiT-PO framework, adaptively tuning LLMs via reinforcement learning for scientific equation discovery | reinforcement learning, large language model
15 | Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering | Proposes Control Reinforcement Learning, enabling token-level mechanistic analysis via learned SAE feature steering | reinforcement learning
16 | Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning | Proposes binary flow matching, achieving robust learning for binary-data generative models via prediction-loss space alignment | flow matching
17 | OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories | Proposes OSIL, learning offline safe imitation policies with safety inferred from non-preferred trajectories | policy learning, imitation learning

🔬 Pillar 9: Embodied Foundation Models (16 papers)

# | Title | One-line Takeaway | Tags | 🔗
18 | TabICLv2: A better, faster, scalable, and open tabular foundation model | TabICLv2: a better, faster, scalable, and open foundation model for tabular data | foundation model
19 | MoToRec: Sparse-Regularized Multimodal Tokenization for Cold-Start Recommendation | Proposes MoToRec, addressing cold-start recommendation via sparse-regularized multimodal tokenization | multimodal
20 | Time Series Foundation Models for Energy Load Forecasting on Consumer Hardware: A Multi-Dimensional Zero-Shot Benchmark | Proposes a zero-shot benchmark for energy load forecasting, evaluating time-series foundation models on consumer hardware | foundation model
21 | dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning | Proposes dnaHNet: a scalable, hierarchical foundation model for genomic sequence learning | foundation model
22 | GENIUS: Generative Fluid Intelligence Evaluation Suite | Proposes GENIUS, a suite for evaluating generative fluid intelligence | multimodal
23 | Weight Decay Improves Language Model Plasticity | Improves language-model plasticity by tuning weight decay, enhancing downstream fine-tuning performance | large language model
24 | MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs | MoEEdit: an efficient, routing-stable knowledge-editing framework for MoE LLMs | large language model
25 | Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers | Analyzes RoPE as phase modulation, providing theoretical grounding for base-parameter selection in long-context Transformers | large language model
26 | Learning Mixture Density via Natural Gradient Expectation Maximization | Proposes natural-gradient expectation-maximization training for mixture density networks, accelerating convergence and avoiding mode collapse | multimodal
27 | Deep Bootstrap | Proposes a deep bootstrap framework based on conditional diffusion models for nonparametric regression | multimodal
28 | Gauss-Newton Unlearning for the LLM Era | Proposes K-FADE: an efficient, maintainable Gauss-Newton-based unlearning method for LLMs | large language model
29 | Predictive-State Communication: Innovation Coding and Reconciliation under Delay | Proposes predictive-state communication, using predictive models to reduce communication overhead and tolerate delay | large language model
30 | Constructing Industrial-Scale Optimization Modeling Benchmark | Proposes MIPLIB-NL: an industrial-scale optimization-modeling benchmark for evaluating LLMs on translating natural language into optimization formulations | large language model
31 | QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs | QTALE: a quantization-robust, token-adaptive layer-execution framework for LLMs | large language model
32 | LightGTS-Cov: Covariate-Enhanced Time Series Forecasting | Proposes LightGTS-Cov to incorporate covariates into time-series forecasting | foundation model
33 | Modular Multi-Task Learning for Chemical Reaction Prediction | Uses LoRA for parameter-efficient multi-task learning in chemical reaction prediction | large language model

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-line Takeaway | Tags | 🔗
34 | Motion Capture is Not the Target Domain: Scaling Synthetic Data for Learning Motion Representations | Scales synthetic data for motion representation learning, addressing human activity recognition on wearable devices | sim-to-real, human motion, motion representation
35 | Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking | Proposes Kalman Linear Attention (KLA), enabling efficient language modeling and state tracking via parallel Bayesian filtering | manipulation, Mamba, SSM

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line Takeaway | Tags | 🔗
36 | Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise | Proposes GRAB-MDM, addressing robust fusion for multi-view manifold learning under high-dimensional noise | MDM
