| 1 |
$φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models |
Proposes $φ$-DPO to address fairness in continual learning for large multimodal models.
DPO direct preference optimization multimodal |
|
|
| 2 |
MetaOthello: A Controlled Study of Multiple World Models in Transformers |
MetaOthello: a controlled study of multiple world models in Transformers
world model foundation model |
|
|
| 3 |
PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA |
Proposes PSQE, which improves unsupervised multimodal entity alignment by enhancing pseudo-seed quality.
contrastive learning large language model multimodal |
|
|
| 4 |
Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning |
Proposes CEEH, a difficulty-aware entropy regularization method that improves LLM reasoning efficiency while preserving accuracy.
reinforcement learning large language model chain-of-thought |
|
|
| 5 |
Regularized Online RLHF with Generalized Bilinear Preferences |
Proposes a generalized bilinear preference model to address Nash equilibrium computation in online RLHF
preference learning RLHF |
|
|
| 6 |
Multilingual Safety Alignment Via Sparse Weight Editing |
Proposes a sparse weight editing method for multilingual safety alignment
reinforcement learning RLHF large language model |
|
|
| 7 |
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization |
Proposes EMPO$^2$, which improves LLM agent exploration via hybrid on-/off-policy optimization and memory augmentation
reinforcement learning large language model |
|
|
| 8 |
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks |
Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks.
reinforcement learning large language model |
✅ |
|
| 9 |
Multi-agent imitation learning with function approximation: Linear Markov games and beyond |
Proposes a multi-agent imitation learning method with function approximation for linear Markov games
imitation learning |
|
|
| 10 |
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning |
Proposes GeoDPO, which enhances geometric perception in VLMs via translator-guided reinforcement learning
reinforcement learning |
✅ |
|
| 11 |
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning |
EvolveGen: uses reinforcement learning to generate algorithm-level hardware model checking benchmarks, improving verification efficiency.
reinforcement learning |
|
|
| 12 |
Transformers converge to invariant algorithmic cores |
Reveals invariant algorithmic cores in Transformers: low-dimensional structure shared across training runs and scales
predictive model large language model |
|
|
| 13 |
Autoregressive Visual Decoding from EEG Signals |
Proposes AVDE, a lightweight and efficient autoregressive model for decoding visual information from EEG signals.
contrastive learning VQ-VAE |
|
|
| 14 |
Prediction of Diffusion Coefficients in Mixtures with Tensor Completion |
Proposes a mixture tensor completion method combining a Bayesian framework with active learning to improve prediction of diffusion coefficients in mixtures.
predictive model PULSE |
|
|
| 15 |
Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability |
Proposes Residual Koopman Spectral Profiling (RKSP) to predict and prevent Transformer training instability
Mamba SSM |
|
|
| 16 |
Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks |
Interprets and steers state-space models via activation subspace bottlenecks
Mamba SSM |
|
|
| 17 |
Component Centric Placement Using Deep Reinforcement Learning |
Proposes a component-centric PCB placement method based on deep reinforcement learning to optimize component layout.
reinforcement learning deep reinforcement learning |
|
|
| 19 |
Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning |
Proposes an information-bottleneck theory of human supervision that explains and mitigates error floors in human-guided alignment
reinforcement learning large language model |
|
|