cs.LG (2026-02-28)

📊 74 papers in total

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (40) · Pillar 9: Embodied Foundation Models (28) · Pillar 1: Robot Control (3) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (40 papers)

# | Title | One-sentence takeaway | Tags
1 | Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials | Zatom-1, a multimodal flow foundation model for 3D molecules and materials that unifies generative and predictive tasks. | flow matching, foundation model, multimodal
2 | Understanding protein function with a multimodal retrieval-augmented foundation model | Proposes PoET-2, a multimodal retrieval-augmented protein foundation model for better understanding of protein function. | representation learning, foundation model, multimodal
3 | $ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models | Proposes the $ϕ$-DPO framework to address fairness in continual learning for large multimodal models. | DPO, direct preference optimization, multimodal
4 | Reinforcement-aware Knowledge Distillation for LLM Reasoning | Proposes RLAD, a reinforcement-aware knowledge distillation method that improves LLM reasoning. | reinforcement learning, PPO, teacher-student
5 | Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing | Decision MetaMamba enhances selective SSMs in offline RL with heterogeneous sequence mixing. | offline RL, Mamba, SSM
6 | Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory | Uses spectral analysis via random matrix theory to improve the reliability and efficiency of large language models. | distillation, large language model
7 | Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection | Proposes adversarial inverse reinforcement learning for machinery fault detection without fault labels. | reinforcement learning, inverse reinforcement learning
8 | Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning | Proposes difficulty-aware entropy regularization that makes LLM reasoning more efficient while preserving accuracy. | reinforcement learning, large language model, chain-of-thought
9 | MetaOthello: A Controlled Study of Multiple World Models in Transformers | MetaOthello, a controlled testbed for studying multiple world models inside Transformers. | world model, foundation model
10 | PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA | Proposes PSQE, which improves unsupervised multimodal entity alignment by enhancing pseudo-seed quality. | contrastive learning, large language model, multimodal
11 | Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits | Presents a sharp analysis of offline policy learning for $f$-divergence-regularized contextual bandits. | reinforcement learning, policy learning, offline reinforcement learning
12 | UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs | Proposes the UniQL framework to address the resource constraints of running large language models on edge devices. | Mamba, SSM, state space model
13 | Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation | Proposes G-OPD, a generalized on-policy distillation framework whose reward extrapolation lets student models match and even surpass their teachers. | reinforcement learning, teacher-student, distillation
14 | Multilingual Safety Alignment Via Sparse Weight Editing | Proposes multilingual safety alignment via sparse weight editing, addressing weak safety guardrails for low-resource languages. | reinforcement learning, RLHF, large language model
15 | Regularized Online RLHF with Generalized Bilinear Preferences | Proposes a generalized bilinear preference model to address Nash equilibria in regularized online RLHF. | preference learning, RLHF
16 | Soft Sequence Policy Optimization | Proposes Soft Sequence Policy Optimization (SSPO), improving LLM training stability and performance on mathematical reasoning tasks. | reinforcement learning, PPO, large language model
17 | Entropy-Controlled Flow Matching | Proposes an entropy-controlled flow matching method motivated by information geometry. | flow matching
18 | Code World Models for Parameter Control in Evolutionary Algorithms | Uses LLMs to build code world models for adaptive parameter control in evolutionary algorithms. | world model
19 | When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals | Proposes an energy-based theory and regularizer for concept drift in physiological signals, improving the stability of ECG models. | representation learning, multimodal
20 | UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs | UpSkill uses mutual-information skill learning to improve structured response diversity in LLMs. | reinforcement learning, large language model
21 | Transformers converge to invariant algorithmic cores | Reveals invariant algorithmic cores in Transformers: low-dimensional structure shared across training runs and model scales. | predictive model, large language model
22 | Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning | Proposes GeoDPO, which improves geometric perception in vision-language models via translator-guided reinforcement learning. | reinforcement learning
23 | Multi-agent imitation learning with function approximation: Linear Markov games and beyond | Proposes multi-agent imitation learning with function approximation for linear Markov games and beyond. | imitation learning
24 | Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks | Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks. | reinforcement learning, large language model
25 | Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization | Proposes EMPO$^2$, combining memory with hybrid on- and off-policy optimization to improve LLM agent exploration and generalization. | reinforcement learning, large language model
26 | EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning | EvolveGen, a reinforcement-learning framework for algorithm-level generation of hardware model checking benchmarks. | reinforcement learning
27 | RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors? | RL-Obfuscation uses reinforcement learning to make language models evade latent-space monitors. | reinforcement learning, large language model
28 | Fast and Flexible Probabilistic Forecasting of Dynamical Systems using Flow Matching and Physical Perturbation | Proposes fast probabilistic forecasting of dynamical systems based on flow matching and physical perturbation. | flow matching
29 | Statistical Advantage of Softmax Attention: Insights from Single-Location Regression | A single-location regression task reveals the statistical advantage of the softmax attention mechanism. | linear attention, large language model
30 | Simplex-to-Euclidean Bijections for Categorical Flow Matching | Proposes categorical flow matching via simplex-to-Euclidean bijections for learning probability distributions on the simplex. | flow matching
31 | On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference | Establishes the equivalence of random network distillation, deep ensembles, and Bayesian inference for efficient uncertainty quantification. | distillation
32 | AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression | AngelSlim, a large-model compression toolkit from Tencent's Hunyuan team, improving efficiency and accessibility. | distillation, multimodal
33 | On the Interpolation Error of Nonlinear Attention versus Linear Regression | Shows that the interpolation error of nonlinear attention typically exceeds that of linear regression in high dimensions, though structured signals can narrow the gap. | linear attention
34 | One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow | Proposes one-step diffusion samplers based on self-distillation and deterministic flow, accelerating sampling and stabilizing evidence estimation. | distillation
35 | Autoregressive Visual Decoding from EEG Signals | Proposes AVDE, a lightweight, efficient autoregressive visual decoding framework for EEG signals aimed at brain-computer interface applications. | contrastive learning, VQ-VAE
36 | Prediction of Diffusion Coefficients in Mixtures with Tensor Completion | Proposes a tensor-completion method that combines a Bayesian framework with active learning to improve prediction of diffusion coefficients in mixtures. | predictive model, PULSE
37 | Aligning Few-Step Diffusion Models with Dense Reward Difference Learning | Proposes SDPO, aligning few-step diffusion models with downstream objectives via dense reward difference learning. | reinforcement learning, diffusion policy
38 | WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks | WebGym builds large-scale realistic web environments to improve the task performance of visual web agents. | reinforcement learning, policy learning
39 | Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks | Improves the interpretability and steerability of state-space models via activation subspace bottlenecks. | Mamba, SSM
40 | Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability | Proposes residual Koopman spectral profiling to predict and prevent instability during Transformer training. | Mamba, SSM
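Preference-based policy optimization recurs throughout this pillar (items 3, 15, 16, 37). As generic background only, and not the method of any specific paper above, a minimal numpy sketch of the standard DPO loss might look like this; the toy log-probability values are invented for illustration:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return float(-np.log(1.0 / (1.0 + np.exp(-beta * margin))))

# Toy numbers: the policy prefers the chosen response more than the
# reference model does, so the loss drops below -log sigmoid(0) = log 2.
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_logp_chosen=-11.0, ref_logp_rejected=-11.0)
```

When the policy and reference margins coincide, the loss sits exactly at log 2; it decreases as the policy separates the chosen and rejected responses more strongly than the reference.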

🔬 Pillar 9: Embodied Foundation Models (28 papers)

# | Title | One-sentence takeaway | Tags
41 | BrepCoder: A Unified Multimodal Large Language Model for Multi-task B-rep Reasoning | Proposes BrepCoder, a unified multimodal large language model for multi-task B-rep reasoning. | large language model, multimodal
42 | RETLLM: Training and Data-Free MLLMs for Multimodal Information Retrieval | Proposes RetLLM, a training- and data-free MLLM framework for multimodal information retrieval. | large language model, multimodal
43 | Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion | Fuses machine-learning ensembles with large language models via voting fusion to improve heart disease prediction accuracy. | large language model
44 | Global River Forecasting with a Topology-Informed AI Foundation Model | Proposes GraphRiverCast, a topology-informed AI foundation model for global river forecasting. | foundation model
45 | RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format | RAIN-Merging, a gradient-free method that enhances instruction following in large reasoning models while preserving their thinking format. | instruction following
46 | InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models | InnerQ, a hardware-aware, tuning-free KV-cache quantization scheme that accelerates LLM inference. | large language model
47 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | Proposes the DiSP framework to mitigate backdoor attacks in multimodal diffusion language models. | multimodal
48 | What Topological and Geometric Structure Do Biological Foundation Models Learn? Evidence from 141 Hypotheses | Uses large-scale hypothesis screening to reveal the topological and geometric structure learned by biological foundation models. | foundation model
49 | Large Language Model Compression with Global Rank and Sparsity Optimization | Proposes LLM compression via global rank and sparsity optimization, improving compression performance. | large language model
50 | SIGMA: A Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress | SIGMA, a semantic-grounded, instruction-driven generative multi-task recommender deployed at AliExpress. | large language model, instruction following
51 | Predicting LLM Reasoning Performance with Small Proxy Model | Proposes rBridge, which predicts the reasoning performance of large LLMs with small proxy models, reducing the cost of dataset optimization. | large language model, zero-shot transfer
52 | Sustainable LLM Inference using Context-Aware Model Switching | Proposes context-aware model switching to reduce the energy consumption of large language models. | large language model
53 | Support Tokens, Stability Margins, and a New Foundation for Robust LLMs | Proposes a new framework for robust LLMs based on support tokens and stability margins, improving model stability. | foundation model
54 | OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data | OmniZip learns a unified, lightweight lossless compressor for multimodal data. | large language model
55 | Manifold of Failure: Behavioral Attraction Basins in Language Models | Proposes a quality-diversity search framework for systematically mapping the manifold of failure modes in large language models. | large language model
56 | pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training | pQuant achieves effective low-bit language models via decoupled linear quantization-aware training. | large language model
57 | Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA | Proposes Semantic Tube Prediction (STP), using JEPA to improve LLM data efficiency beyond scaling-law limits. | large language model
58 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement | LITE accelerates LLM pre-training by enhancing flat-direction dynamics. | large language model
59 | Physics-informed neural particle flow for the Bayesian update step | Proposes physics-informed neural particle flow for efficient probability-density transport in the Bayesian update step. | multimodal
60 | Beyond Attribution: Unified Concept-Level Explanations | Proposes the UnCLE framework to address the shortcomings of concept-level explanations. | multimodal
61 | Skewed Score: A statistical framework to assess autograders | Proposes the Skewed Score framework for assessing and diagnosing bias in LLM autograders. | large language model
62 | Beyond Linear Probes: Dynamic Safety Monitoring for Language Models | Proposes truncated polynomial classifiers for dynamic safety monitoring of large language models. | large language model
63 | Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification | Proposes evidential uncertainty quantification for detecting misbehaviors of large vision-language models. | multimodal
64 | Versor: A Geometric Sequence Architecture | Versor, a sequence architecture based on conformal geometric algebra, improving generalization and efficiency. | multimodal
65 | Symmetry in language statistics shapes the geometry of model representations | Shows that the geometry of language-model representations originates from translational symmetry in language statistics. | large language model
66 | ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset | ÜberWeb builds a 20-trillion-token dataset through multilingual curation, improving multilingual model performance. | foundation model
67 | Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training | Explains why Pass@k optimization can degrade Pass@1: prompt interference in LLM post-training. | large language model
68 | Muon+: Towards Better Muon via One Additional Normalization Step | Muon+ improves the Muon optimizer through one additional normalization step. | large language model
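Low-bit compression of LLMs recurs in this pillar (items 46, 49, 56). As a generic background sketch rather than any of those papers' schemes, symmetric per-tensor int8 quantize/dequantize can be written in a few lines of numpy:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: scale by max |w|, round, clip."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

With round-to-nearest, the reconstruction error per element is bounded by half the scale; the methods listed above refine this basic recipe (per-channel scales, hardware awareness, quantization-aware training).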

🔬 Pillar 1: Robot Control (3 papers)

# | Title | One-sentence takeaway | Tags
69 | To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning | Proposes the AOT adversarial training framework, improving the perceptual robustness of multimodal large language models in complex visual scenes. | manipulation, reinforcement learning, large language model
70 | Physics Informed Viscous Value Representations | Proposes physics-informed reinforcement learning based on viscous value representations, improving value estimation in complex environments. | manipulation, reinforcement learning, geometric consistency
71 | Moral Preferences of LLMs Under Directed Contextual Influence | Proposes a framework for evaluating LLM moral preferences under directed contextual influence, showing that model decisions are easily steered by context and can exhibit backlash effects. | manipulation

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-sentence takeaway | Tags
72 | Differentiable Zero-One Loss via Hypersimplex Projections | Proposes a differentiable zero-one loss via hypersimplex projections, improving generalization in large-batch training. | geometric consistency

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence takeaway | Tags
73 | Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies | Proposes a KL-regularized MDP approach to optimizing unmasking policies in masked diffusion models, improving sequence generation quality. | MDM

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence takeaway | Tags
74 | Positional-aware Spatio-Temporal Network for Large-Scale Traffic Prediction | Proposes PASTN, a positional-aware spatio-temporal network for large-scale traffic prediction. | spatiotemporal
