cs.LG (2026-02-26)

📊 33 papers in total | 🔗 6 with code

🎯 Interest-Area Navigation

Pillar 2: RL & Architecture (19, 🔗2) · Pillar 9: Embodied Foundation Models (10, 🔗3) · Pillar 1: Robot Control (2, 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (19 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models | Proposes $φ$-DPO to address fairness in continual learning for large multimodal models. | DPO, direct preference optimization, multimodal | |
| 2 | MetaOthello: A Controlled Study of Multiple World Models in Transformers | MetaOthello: a controlled study of multiple world models in Transformers. | world model, foundation model | |
| 3 | PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA | Proposes PSQE, improving unsupervised multimodal entity alignment by enhancing pseudo-seed quality. | contrastive learning, large language model, multimodal | |
| 4 | Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning | Proposes CEEH, a difficulty-aware entropy regularization method that improves LLM reasoning efficiency while preserving accuracy. | reinforcement learning, large language model, chain-of-thought | |
| 5 | Regularized Online RLHF with Generalized Bilinear Preferences | Proposes a generalized bilinear preference model to address the Nash equilibrium problem in online RLHF. | preference learning, RLHF | |
| 6 | Multilingual Safety Alignment Via Sparse Weight Editing | Proposes a sparse weight-editing method for multilingual safety alignment. | reinforcement learning, RLHF, large language model | |
| 7 | Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization | Proposes EMPO$^2$, improving LLM-agent exploration via hybrid on-/off-policy optimization and memory augmentation. | reinforcement learning, large language model | |
| 8 | Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks | Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks. | reinforcement learning, large language model | |
| 9 | Multi-agent imitation learning with function approximation: Linear Markov games and beyond | Proposes multi-agent imitation learning with function approximation for linear Markov games and beyond. | imitation learning | |
| 10 | Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning | Proposes GeoDPO, improving geometric perception in VLMs via translator-guided reinforcement learning. | reinforcement learning | |
| 11 | EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning | EvolveGen: uses reinforcement learning to generate algorithm-level hardware model-checking benchmarks, improving verification efficiency. | reinforcement learning | |
| 12 | Transformers converge to invariant algorithmic cores | Reveals invariant algorithmic cores in Transformers: low-dimensional structures shared across training runs and model scales. | predictive model, large language model | |
| 13 | Autoregressive Visual Decoding from EEG Signals | Proposes AVDE, a lightweight and efficient autoregressive model for decoding visual information from EEG signals. | contrastive learning, VQ-VAE | |
| 14 | Prediction of Diffusion Coefficients in Mixtures with Tensor Completion | Proposes a tensor-completion method for mixtures, combining a Bayesian framework with active learning to improve prediction of diffusion coefficients. | predictive model, PULSE | |
| 15 | Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability | Proposes Residual Koopman Spectral Profiling (RKSP) to predict and prevent Transformer training instability. | Mamba, SSM | |
| 16 | Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks | Interprets and steers state-space models via activation subspace bottlenecks. | Mamba, SSM | |
| 17 | Component Centric Placement Using Deep Reinforcement Learning | Proposes a component-centric PCB placement method based on deep reinforcement learning. | reinforcement learning, deep reinforcement learning | |
| 18 | Regularized Online RLHF with Generalized Bilinear Preferences | Proposes a generalized bilinear preference model to address the Nash equilibrium problem in online RLHF. | preference learning, RLHF | |
| 19 | Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning | Proposes an information-bottleneck theory of human supervision that explains and mitigates error floors in human-guided alignment. | reinforcement learning, large language model | |

🔬 Pillar 9: Embodied Foundation Models (10 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 20 | InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models | InnerQ: a hardware-aware, tuning-free KV-cache quantization method that accelerates LLM inference. | large language model | |
| 21 | RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format | RAIN-Merging: a gradient-free method that improves instruction following in large reasoning models while preserving the thinking format. | instruction following | |
| 22 | SIGMA: A Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress | SIGMA: a semantic-grounded, instruction-driven generative multi-task recommender at Alibaba's AliExpress. | large language model, instruction following | |
| 23 | Physics-informed neural particle flow for the Bayesian update step | Proposes physics-informed neural particle flow for efficient probability-density transport in the Bayesian update step. | multimodal | |
| 24 | Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement | LITE: accelerates LLM pre-training by enhancing flat-direction dynamics. | large language model | |
| 25 | Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA | Proposes semantic tube prediction to improve the data efficiency of large language models. | large language model | |
| 26 | pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training | pQuant: effective low-bit language models via decoupled linear quantization-aware training. | large language model | |
| 27 | Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents | Rudder: adaptive prefetching in distributed GNN training using LLM agents. | large language model | |
| 28 | Uncertainty-aware Language Guidance for Concept Bottleneck Models | Proposes uncertainty-aware concept bottleneck models that use language-model guidance and quantify concept uncertainty. | large language model | |
| 29 | U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation | Proposes the U-CAN framework for efficient, controllable unlearning in generative recommendation via contrastive attenuation of LoRA adapter weights. | large language model | |

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 30 | Physics Informed Viscous Value Representations | Proposes physics-informed viscous value representations, improving generalization in offline goal-conditioned reinforcement learning. | manipulation, reinforcement learning, geometric consistency | |
| 31 | Moral Preferences of LLMs Under Directed Contextual Influence | Proposes a new method for evaluating LLM moral preferences under directed contextual influence. | manipulation | |

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 32 | Differentiable Zero-One Loss via Hypersimplex Projections | Proposes a differentiable zero-one loss based on hypersimplex projections, improving generalization in large-batch training. | geometric consistency | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-Line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 33 | Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG | Proposes Brain-OF, the first omnifunctional brain foundation model for fMRI, EEG, and MEG. | spatiotemporal, foundation model, multimodal | |
