cs.LG (2026-01-29)

📊 48 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (25 🔗2) · Pillar 9: Embodied Foundation Models (22 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (25 papers)

# | Title | Key Point | Tags
1 | Robust Multimodal Representation Learning in Healthcare | Proposes a dual-stream feature-disentanglement framework to address bias in multimodal medical representation learning | representation learning, multimodal
2 | Factored Causal Representation Learning for Robust Reward Modeling in RLHF | Proposes factored causal representation learning to improve the robustness of reward models in RLHF | reinforcement learning, RLHF, representation learning
3 | Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach | Proposes a deep-RL approach to heterogeneous vertiport selection that improves the efficiency of on-demand air-taxi services | reinforcement learning, deep reinforcement learning, multimodal
4 | Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models | Proposes visual disentangled diffusion autoencoders for counterfactual generation with foundation models | distillation, foundation model
5 | Rethinking Federated Graph Foundation Models: A Graph-Language Alignment-based Approach | Proposes the FedGALA framework, using graph-language alignment to address knowledge loss and heterogeneity in federated graph foundation models | contrastive learning, foundation model
6 | NetMamba+: A Framework of Pre-trained Models for Efficient and Accurate Network Traffic Classification | NetMamba+: a framework of pre-trained models for efficient and accurate network traffic classification | Mamba, multimodal
7 | Expected Return Causes Outcome-Level Mode Collapse in Reinforcement Learning and How to Fix It with Inverse Probability Scaling | Proposes a GRPO variant with inverse probability scaling to fix the outcome-level mode collapse caused by expected-return objectives in RL | reinforcement learning, multimodal
8 | Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning | Proposes DiPO, difficulty-aware reinforcement learning that mitigates overthinking in large reasoning models | reinforcement learning, chain-of-thought
9 | READY: Reward Discovery for Meta-Black-Box Optimization | READY: reward discovery for meta-black-box optimization, using an LLM to automatically design reward functions | reinforcement learning, reward design, large language model
10 | When does predictive inverse dynamics outperform behavior cloning? | Proposes predictive inverse dynamics models that achieve a better bias-variance trade-off in imitation learning | imitation learning, behavior cloning
11 | The Surprising Difficulty of Search in Model-Based Reinforcement Learning | Search is no panacea in model-predictive control: mitigating distribution shift matters more than improving model accuracy | reinforcement learning, model-based RL
12 | Prior-Informed Flow Matching for Graph Reconstruction | Proposes Prior-Informed Flow Matching (PIFM) for graph reconstruction, improving reconstruction accuracy | flow matching
13 | Negatives-Dominant Contrastive Learning for Generalization in Imbalanced Domains | Proposes a negatives-dominant contrastive learning method for generalization in imbalanced domains | contrastive learning
14 | Constrained Meta Reinforcement Learning with Provable Test-Time Safety | Proposes a constrained meta-reinforcement-learning algorithm with provable test-time safety | reinforcement learning
15 | Curriculum Learning for LLM Pretraining: An Analysis of Learning Dynamics | Curriculum learning stabilizes LLM pretraining by controlling gradient variance | curriculum learning
16 | Epistemic Uncertainty Quantification for Pre-trained VLMs via Riemannian Flow Matching | Proposes REPVLM, quantifying the epistemic uncertainty of pre-trained VLMs via Riemannian flow matching | flow matching
17 | Generative Design of Ship Propellers using Conditional Flow Matching | Generative design of ship propellers using conditional flow matching | flow matching
18 | Reinforcement Learning for Adaptive Composition of Quantum Circuit Optimisation Passes | Proposes an RL-based method for adaptively composing quantum-circuit optimization passes | reinforcement learning
19 | Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening | Proposes scalable Power Sampling, achieving efficient, training-free LLM reasoning via distribution sharpening | reinforcement learning, large language model
20 | Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning | Proposes a MARL method based on local rewards and dependence graphs for explicit multi-agent credit assignment | reinforcement learning
21 | HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing | Proposes the HER framework to address cognitive simulation in LLM role-playing | reinforcement learning
22 | Grounding and Enhancing Informativeness and Utility in Dataset Distillation | Proposes the InfoUtil framework, improving dataset distillation by maximizing informativeness and utility | distillation
23 | Physics-Guided Tiny-Mamba Transformer for Reliability-Aware Early Fault Warning | Proposes a physics-guided Tiny-Mamba Transformer for early fault warning in rotating machinery | Mamba
24 | Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification | LENS: reinforcement learning for reasoning via instruction purification, improving LLM exploration efficiency and training stability on complex tasks | reinforcement learning
25 | Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks | Proposes signal-adaptive trust regions (SATR) for optimizing recurrent SNNs, improving high-dimensional RL control | reinforcement learning, PPO

🔬 Pillar 9: Embodied Foundation Models (22 papers)

# | Title | Key Point | Tags
26 | Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning | Proposes ViKeR, visual-guided key-token regularization for controllable unlearning in multimodal large language models | large language model, multimodal
27 | Making Foundation Models Probabilistic via Singular Value Ensembles | Proposes Singular Value Ensembles (SVE), making foundation models probabilistic through parameter-efficient tuning and improving uncertainty quantification | foundation model
28 | Per-parameter Task Arithmetic for Unlearning in Large Language Models | Proposes PerTA, improving controllable unlearning in large language models via per-parameter task arithmetic | large language model
29 | Embracing Aleatoric Uncertainty in Medical Multimodal Learning with Missing Modalities | Proposes the AUM framework, modeling uncertainty to handle missing modalities in medical multimodal learning | multimodal
30 | A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth | Proposes a judge-aware ranking framework that accounts for varying judge reliability in LLM evaluation | large language model
31 | LLM4Fluid: Large Language Models as Generalizable Neural Solvers for Fluid Dynamics | LLM4Fluid: large language models as generalizable neural solvers for fluid dynamics | large language model
32 | LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models | LAMP: look-ahead mixed-precision inference for large language models | large language model
33 | More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD) | Proposes the Reset-and-Discard (ReD) method, improving the coverage of LLM inference at a fixed budget | large language model
34 | Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning | Studies how code complexity affects LLM reasoning and proposes a data-centric code selection strategy | large language model, chain-of-thought
35 | ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation | ConceptMoE: adaptive token-to-concept compression for implicit compute allocation, improving LLM efficiency and performance | large language model, multimodal
36 | From Consistency to Complementarity: Aligned and Disentangled Multi-modal Learning for Time Series Understanding and Reasoning | MADI: aligned and disentangled multi-modal learning for time-series understanding and reasoning | large language model
37 | Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference | Proposes the LLM Shepherding framework, where a few LLM hints guide an SLM for more cost-efficient inference | large language model
38 | Value-Based Pre-Training with Downstream Feedback | V-Pretraining: guiding pretraining with downstream feedback to improve downstream task performance | foundation model
39 | TBDFiltering: Sample-Efficient Tree-Based Data Filtering | Proposes TBDFiltering, a sample-efficient tree-based method for filtering text data, improving LLM training data quality | large language model
40 | DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training | DASH: deterministic attention scheduling for high-throughput, reproducible LLM training | large language model
41 | Nonparametric LLM Evaluation from Preference Data | Proposes DMLEval, a nonparametric LLM evaluation framework based on preference data | large language model
42 | Effective LoRA Adapter Routing using Task Representations | LORAUTER: efficient LoRA adapter routing using task representations | large language model
43 | Knowledge Vector Weakening: Efficient Training-free Unlearning for Large Vision-Language Models | Proposes Knowledge Vector Weakening (KVW) for efficient, training-free unlearning in large vision-language models | multimodal
44 | Age Matters: Analyzing Age-Related Discussions in App Reviews | Analyzes age-related discussions in app reviews to help developers build more inclusive apps | large language model
45 | Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity | Gradient-similarity-based collaborative neural learning that mitigates catastrophic forgetting in large language models | large language model
46 | Accurate Network Traffic Matrix Prediction via LEAD: an LLM-Enhanced Adapter-Based Conditional Diffusion Model | LEAD: an LLM-enhanced, adapter-based conditional diffusion model for accurate network traffic matrix prediction | large language model
47 | Statsformer: Validated Ensemble Learning with LLM-Derived Semantic Priors | Statsformer: a validated ensemble-learning framework using LLM-derived semantic priors | large language model

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key Point | Tags
48 | Sim-MSTNet: sim2real based Multi-task SpatioTemporal Network Traffic Forecasting | Proposes Sim-MSTNet, using sim2real to address data scarcity and multi-task learning challenges in network traffic forecasting | sim2real, domain randomization, spatiotemporal
