cs.LG (2026-05-14)

📊 70 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (36 🔗2) · Pillar 2: RL & Architecture (29 🔗6) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (36 papers)

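| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models | Octopus: a continual learning framework for multimodal large language models based on history-free gradient orthogonalization. | large language model, multimodal | |
| 2 | GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation | GeoViSTA: a geospatial vision-tabular Transformer for multimodal environment representation. | foundation model, multimodal | |
| 3 | Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing | Proposes Shodh-MoE, which uses sparse mixture-of-experts routing to address negative transfer in multi-physics modeling. | foundation model | |
| 4 | NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces | NeuroAtlas: a large-scale benchmark of foundation models for clinical EEG and brain-computer interfaces. | foundation model | |
| 5 | DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System | DT-Transformer: a foundation model for disease trajectory prediction built on large-scale real-world health system data. | foundation model | |
| 6 | GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning | GPart: end-to-end isometric fine-tuning via global parameter partitioning, improving parameter efficiency. | large language model | |
| 7 | TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability | Task-aware pruning improves model generalization and performance on OOD data. | large language model | |
| 8 | When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition | Proposes the QAOD framework, which detects hallucinations in large language models via question-answer orthogonal decomposition. | large language model | |
| 9 | Tadpole: Autoencoders as Foundation Models for 3D PDEs with Online Learning | Tadpole: an autoencoder-based foundation model for 3D PDEs with online learning. | foundation model | |
| 10 | Causal Foundation Models with Continuous Treatments | Proposes the first causal foundation model with continuous treatments, for predicting causal effects across a variety of unseen tasks. | foundation model | |
| 11 | GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding | Proposes GQLA to address hardware-adaptive large language model decoding. | large language model | |
| 12 | A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models | Proposes SOP, a hardware-aware per-layer method for post-training quantization of large language models. | large language model | |
| 13 | A Mutual Information Lower Bound for Multimodal Regression Active Learning | Proposes the MI-LB active learning method to address uncertainty sampling in multimodal regression. | multimodal | |
| 14 | GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning | GFMate: strengthening graph foundation models through test-time prompt tuning. | foundation model | |
| 15 | Exploring Geographic Relative Space in Large Language Models through Activation Patching | Uses activation patching to explore how large language models process geographic relative space. | large language model | |
| 16 | AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction | Proposes AIM-DDI to address the architecture dependence of multimodal fusion in drug-drug interaction prediction. | multimodal | |
| 17 | $f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data | Proposes the f-Trajectory Balance loss family for optimizing GFlowNets, generative models, and LLMs. | large language model | |
| 18 | Margin-Adaptive Confidence Ranking for Reliable LLM Judgement | Proposes a reliable LLM judgement method based on margin-adaptive confidence ranking. | large language model | |
| 19 | LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling | Proposes the LPDS framework, which evaluates LLM robustness through logic-preserving difficulty scaling. | large language model | |
| 20 | Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory | SeqMem-Eval: a fine-grained framework for diagnostic evaluation of sequential LLM memory. | large language model | |
| 21 | GQA-μP: The maximal parameterization update for grouped query attention | Proposes GQA-μP, the maximal parameterization update for grouped query attention, enabling hyperparameter transfer. | large language model | |
| 22 | An Interpretable Latency Model for Speculative Decoding in LLM Serving | Proposes an interpretable latency model for analyzing the performance bottlenecks of speculative decoding in LLM serving. | large language model | |
| 23 | PDRNN: Modular Data-driven Pedestrian Dead Reckoning on Loosely Coupled Radio- and Inertial-Signalstreams | Proposes PDRNN, a modular data-driven pedestrian dead reckoning system that fuses radio and inertial signals. | multimodal | |
| 24 | InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting | InfoSFT: information-aware token weighting that improves LLM generalization and reduces forgetting. | chain-of-thought | |
| 25 | BCI-Based Assessment of Ocular Response Time Using Dynamic Time Warping Leveraging an RDWT-Driven Deep Neural Framework | Proposes an RDWT-driven deep neural network framework, combined with dynamic time warping, for assessing ocular response time in patients with traumatic brain injury. | multimodal | |
| 26 | Selective Safety Steering via Value-Filtered Decoding | Proposes selective safety steering based on value-filtered decoding, improving LLM safety while reducing unnecessary interventions. | large language model | |
| 27 | IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments | IsoNet: spatially-aware audio-visual target speech extraction for complex acoustic environments. | multimodal | |
| 28 | The Rate-Distortion-Polysemanticity Tradeoff in SAEs | Presents the rate-distortion-polysemanticity tradeoff in SAEs, revealing how monosemanticity constraints affect performance. | large language model | |
| 29 | Silent Collapse in Recursive Learning Systems | Reveals the "silent collapse" phenomenon in recursive learning systems and proposes the MTR framework for early warning and proactive prevention. | large language model | |
| 30 | Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits | Proposes an efficient multi-objective prompt optimization method based on pure-exploration bandit algorithms. | large language model | |
| 31 | Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows | Lang2MLIP: end-to-end, language-driven development of machine learning interatomic potentials using autonomous agentic workflows. | large language model | |
| 32 | Test-Time Learning with an Evolving Library | EvoLib: a test-time learning framework based on an evolving library that improves large language model performance without parameter updates. | large language model | |
| 33 | Exemplar Partitioning for Mechanistic Interpretability | Proposes exemplar partitioning as a method for mechanistic interpretability. | large language model | |
| 34 | Language-Induced Priors for Domain Adaptation | Proposes the Language-Induced Priors (LIP) framework to address domain adaptation when target-domain data is scarce. | large language model | |
| 35 | Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology | Characterizes the dynamics of the Transformer residual stream by coupling spectral geometry to network topology. | large language model | |
| 36 | EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization | EnergyLens: predictive, energy-aware exploration for multi-GPU LLM inference optimization. | large language model | |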

🔬 Pillar 2: RL & Architecture (29 papers)

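| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance | FEST: reinforcement learning with verifiable rewards guided by a small number of samples, improving sample efficiency. | reinforcement learning, large language model, chain-of-thought | |
| 38 | Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy | ActFocus: resolves the action bottleneck in agentic reinforcement learning through token-level energy analysis. | reinforcement learning, PPO, large language model | |
| 39 | DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models | DiffusionOPD: a unified multi-task framework based on on-policy distillation in diffusion models. | reinforcement learning, PPO, distillation | |
| 40 | Self-Distilled Agentic Reinforcement Learning | Proposes SDAR, which uses self-distillation to improve reinforcement learning for LLM agents on complex interactive tasks. | reinforcement learning, distillation | |
| 41 | Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions | Proposes a cognitive-uncertainty guided knowledge distillation framework to improve the accuracy of student misconception classification. | distillation | |
| 42 | PreFT: Prefill-only finetuning for efficient inference | Proposes PreFT to address the efficiency of multi-adapter serving. | reinforcement learning, large language model | |
| 43 | Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication | Proposes semantic-importance-aware constellation design, improving the robustness of semantic communication systems under channel interference. | reinforcement learning, deep reinforcement learning | |
| 44 | Toward World Modeling of Physiological Signals with Chaos-Theoretic Balancing and Latent Dynamics | NormWear-2: models physiological signals with chaos-theoretic balancing and latent dynamics, enabling multi-scale prediction. | world model, world models, latent dynamics | |
| 45 | DRL-STAF: A Deep Reinforcement Learning Framework for State-Aware Forecasting of Complex Multivariate Hidden Markov Processes | DRL-STAF: a deep reinforcement learning framework for state-aware forecasting of complex multivariate hidden Markov processes. | reinforcement learning, deep reinforcement learning, DRL | |
| 46 | Controllable Molecular Generative Foundation Models | CoMole: a controllable molecular generative foundation model for heterogeneous design tasks. | reinforcement learning, MAE, foundation model | |
| 47 | Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning | Proposes the conservative Peng's Q($λ$) (CPQL) algorithm for conservative value estimation in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | |
| 48 | Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning | Studies the impact of plasticity interventions on backdoor attacks in deep reinforcement learning, and proposes the SCC framework and detection metrics. | reinforcement learning, deep reinforcement learning, DRL | |
| 49 | GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero | GRLO: explores generalizable reinforcement learning in open-ended environments, starting from zero. | reinforcement learning, RLHF, large language model | |
| 50 | Fast Rates for Inverse Reinforcement Learning | Proposes entropy-regularized min-max inverse reinforcement learning to obtain faster learning rates. | reinforcement learning, inverse reinforcement learning | |
| 51 | Representation Without Reward: A JEPA Audit for LLM Fine-Tuning | Evaluates LLM fine-tuning through a JEPA audit: a study of decoupling representation from reward. | JEPA, Joint-Embedding Predictive Architecture, joint-embedding predictive architecture | |
| 52 | Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement | Crys-JEPA: accelerates crystal discovery via embedding screening and generative refinement. | JEPA, Joint-Embedding Predictive Architecture, joint-embedding predictive architecture | |
| 53 | Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment | Proposes BBCritic, which reframes GUI critique as a continuous semantic alignment problem, significantly improving the generalization of GUI agents. | contrastive learning, affordance, zero-shot transfer | |
| 54 | Time-Varying Deep State Space Models for Sequences with Switching Dynamics | Proposes time-varying deep state space models for modeling sequences with switching dynamics. | SSM, state space model | |
| 55 | Learning from Language Feedback via Variational Policy Distillation | Proposes the Variational Policy Distillation (VPD) framework to address teacher policy stagnation in reinforcement learning from language feedback. | reinforcement learning, distillation | |
| 56 | AudioMosaic: Contrastive Masked Audio Representation Learning | AudioMosaic: an audio representation learning method based on contrastive learning and masking. | representation learning, contrastive learning | |
| 57 | Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients | Proposes the Hybrid Policy Optimization (HPO) algorithm for reinforcement learning in hybrid discrete-continuous action spaces. | reinforcement learning, PPO, differentiable simulation | |
| 58 | Lagrangian Flow Matching: A Least-Action Framework for Principled Path Design | Lagrangian Flow Matching: probability path design based on the principle of least action. | flow matching | |
| 59 | Training on Documents About Monitoring Leads to CoT Obfuscation | Shows that models trained on documents about monitoring can obfuscate their CoT reasoning and evade detection. | reinforcement learning, chain-of-thought | |
| 60 | Curriculum Learning of Physics-Informed Neural Networks based on Spatial Correlation | Proposes a curriculum learning framework for PINNs based on spatial correlation, improving the accuracy of PDE solutions. | curriculum learning | |
| 61 | ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization | Proposes the ROAD framework to address data mixing in offline-to-online reinforcement learning. | reinforcement learning | |
| 62 | MoRe: Modular Representations for Principled Continual Representation Learning on Sequential Data | MoRe: continual representation learning on sequential data via modular representations. | representation learning | |
| 63 | Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation | Proposes OPSA, which reduces the safety tax in LLM safety alignment through on-policy self-distillation. | distillation | |
| 64 | Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks | Proposes Selective Alignment Knowledge Distillation (SeAl-KD) to improve the performance of spiking neural networks (SNNs). | distillation | |
| 65 | Quantum Advantage in Multi Agent Reinforcement Learning | A multi-agent reinforcement learning framework based on quantum entanglement, enabling agent cooperation beyond classical limits. | reinforcement learning | |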

🔬 Pillar 1: Robot Control (3 papers)

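| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 66 | Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations | Slot-MPC: goal-conditioned robot manipulation based on object-centric representations and model predictive control. | manipulation, MPC, model predictive control | |
| 67 | Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry | Proposes Matrix-Space Reinforcement Learning (MSRL), which improves generalization in sequential decision-making by reusing local transition geometry. | MPC, reinforcement learning, predictive model | |
| 68 | Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling | Proposes the DRATS algorithm, which addresses data imbalance in multi-task reinforcement learning through adaptive task sampling. | manipulation, reinforcement learning | |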

🔬 Pillar 4: Generative Motion (1 paper)

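| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 69 | Guided Diffusion Sampling for Precipitation Forecast Interventions | Proposes a precipitation intervention method based on guided diffusion sampling, enabling effective control of extreme precipitation events. | physically plausible | |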

🔬 Pillar 8: Physics-based Animation (1 paper)

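| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 70 | Privacy Evaluation of Generative Models for Trajectory Generation | Evaluates the privacy of trajectory generation models, revealing the privacy risks of generative models on trajectory data. | spatiotemporal | |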
