1. Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
   A multimodal flow foundation model for 3D molecules and materials, unifying generative and predictive tasks.
   Keywords: flow matching, foundation model, multimodal
|
|
2. Understanding protein function with a multimodal retrieval-augmented foundation model
   Introduces PoET-2, a multimodal retrieval-augmented protein foundation model for improving the understanding of protein function.
   Keywords: representation learning, foundation model, multimodal
|
|
| 3 |
$ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models |
提出$ϕ$-DPO框架以解决大规模多模态模型中的公平性问题 |
DPO direct preference optimization multimodal |
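As background for the DPO-based entries above, here is a minimal sketch of the standard direct preference optimization loss for a single preference pair. This is the generic textbook form, not the paper's $ϕ$-DPO variant; the log-probability inputs and `beta` value are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective for one preference pair: penalize the policy
    when its preference log-ratio does not exceed the reference model's."""
    pi_logratio = logp_chosen - logp_rejected          # policy's preference
    ref_logratio = ref_logp_chosen - ref_logp_rejected  # reference's preference
    margin = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(margin)) = log(1 + exp(-margin)); lower is better
    return math.log1p(math.exp(-margin))

# Toy values (assumed): the policy prefers the chosen response more strongly
# than the reference does, so the loss drops below log(2), the zero-margin value.
loss = dpo_loss(-5.0, -9.0, -6.0, -7.0)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; training pushes the margin positive.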
|
|
4. Reinforcement-aware Knowledge Distillation for LLM Reasoning
   Proposes RLAD, a reinforcement-learning-aware knowledge distillation method for improving LLM reasoning.
   Keywords: reinforcement learning, PPO, teacher-student
|
|
5. Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
   Enhances selective SSMs in offline reinforcement learning through heterogeneous sequence mixing.
   Keywords: offline RL, Mamba, SSM
|
|
6. Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
   Uses spectral analysis via random matrix theory to improve the reliability and efficiency of large language models.
   Keywords: distillation, large language model
|
|
7. Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection
   Proposes an adversarial inverse reinforcement learning method for machinery fault detection that requires no fault labels.
   Keywords: reinforcement learning, inverse reinforcement learning
|
|
8. Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning
   Proposes a difficulty-aware entropy regularization method that improves LLM reasoning efficiency while preserving accuracy.
   Keywords: reinforcement learning, large language model, chain-of-thought
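The entry above combines two standard ingredients: an entropy bonus that encourages exploration, and a difficulty-dependent coefficient. The sketch below illustrates the general shape only; the linear scaling in `difficulty` and the names are assumptions for illustration, not the paper's actual formulation.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def difficulty_aware_loss(nll, probs, difficulty, base_coef=0.01):
    """Hypothetical difficulty-aware objective: hard prompts (difficulty near 1)
    receive a larger entropy bonus, encouraging exploration of long reasoning
    chains; easy prompts (difficulty near 0) get little bonus, letting the model
    compress to short, near-deterministic answers."""
    coef = base_coef * difficulty  # assumption: linear scaling in difficulty
    return nll - coef * token_entropy(probs)
```

With `difficulty = 0` the objective reduces to the plain negative log-likelihood; raising the difficulty strictly lowers the loss of high-entropy (exploratory) predictions.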
|
|
9. MetaOthello: A Controlled Study of Multiple World Models in Transformers
   MetaOthello: a controlled experimental testbed for studying multiple world models in Transformers.
   Keywords: world model, foundation model
|
|
10. PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
    Proposes PSQE, which improves unsupervised multimodal entity alignment by enhancing pseudo-seed quality.
    Keywords: contrastive learning, large language model, multimodal
|
|
11. Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
    Develops a sharp analysis of offline policy learning for $f$-divergence-regularized contextual bandits.
    Keywords: reinforcement learning, policy learning, offline reinforcement learning
|
|
12. UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
    Proposes the UniQL framework to address the resource constraints of large language models on edge devices.
    Keywords: Mamba, SSM, state space model
|
|
13. Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
    Proposes G-OPD, a generalized on-policy distillation framework that uses reward extrapolation to improve student models, even beyond the teacher.
    Keywords: reinforcement learning, teacher-student, distillation
|
|
14. Multilingual Safety Alignment Via Sparse Weight Editing
    Proposes a multilingual safety-alignment method based on sparse weight editing, addressing insufficient safety protection for low-resource languages.
    Keywords: reinforcement learning, RLHF, large language model
|
|
15. Regularized Online RLHF with Generalized Bilinear Preferences
    Proposes a generalized bilinear preference model to address Nash-equilibrium problems in online RLHF.
    Keywords: preference learning, RLHF
|
|
16. Soft Sequence Policy Optimization
    Proposes Soft Sequence Policy Optimization (SSPO), improving the training stability and performance of LLMs on mathematical reasoning tasks.
    Keywords: reinforcement learning, PPO, large language model
|
|
17. Entropy-Controlled Flow Matching
    Proposes an entropy-controlled flow matching method to address problems in information geometry.
    Keywords: flow matching
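Several entries in this list build on flow matching (1, 17, 28, 30). As shared background, here is a minimal sketch of vanilla conditional flow matching with the linear interpolation path, where the regression target is the constant velocity $x_1 - x_0$. This is the generic formulation, not any of the listed papers' variants; the helper names are illustrative.

```python
import random

def cfm_sample(x0, x1):
    """Linear conditional path: x_t = (1 - t) * x0 + t * x1 at a random t,
    with constant target velocity u = x1 - x0 along the whole path."""
    t = random.random()
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return t, xt, u

def cfm_loss(v_pred, u_target):
    """Flow matching regression loss: mean squared error between the
    model's predicted velocity at (t, x_t) and the target velocity."""
    return sum((v - u) ** 2 for v, u in zip(v_pred, u_target)) / len(u_target)
```

A velocity network trained to minimize this loss over noise/data pairs can then generate samples by integrating the learned ODE from the noise distribution.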
|
|
18. Code World Models for Parameter Control in Evolutionary Algorithms
    Uses LLMs to build code world models for adaptive parameter control in evolutionary algorithms.
    Keywords: world model
|
|
19. When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals
    Proposes an energy-based theory of concept drift in physiological signals, with a regularizer that improves the stability of ECG models.
    Keywords: representation learning, multimodal
|
|
20. UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
    UpSkill: mutual-information skill learning for increasing the structured response diversity of LLMs.
    Keywords: reinforcement learning, large language model
|
|
21. Transformers converge to invariant algorithmic cores
    Reveals invariant algorithmic cores in Transformers: low-dimensional structure shared across training runs and scales.
    Keywords: predictive model, large language model
|
|
22. Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
    Proposes GeoDPO, which improves the geometric perception of vision-language models via translator-guided reinforcement learning.
    Keywords: reinforcement learning
|
|
23. Multi-agent imitation learning with function approximation: Linear Markov games and beyond
    Proposes multi-agent imitation learning methods with function approximation for linear Markov games and beyond.
    Keywords: imitation learning
|
|
24. Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
    Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks.
    Keywords: reinforcement learning, large language model
|
|
25. Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
    Proposes EMPO$^2$, which combines memory with hybrid on- and off-policy optimization to improve the exploration ability and generalization of LLM agents.
    Keywords: reinforcement learning, large language model
|
|
26. EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
    EvolveGen: a reinforcement-learning-based framework for generating algorithmic-level hardware model-checking benchmarks.
    Keywords: reinforcement learning
|
|
27. RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
    RL-Obfuscation: uses reinforcement learning to make language models evade latent-space monitors.
    Keywords: reinforcement learning, large language model
|
|
28. Fast and Flexible Probabilistic Forecasting of Dynamical Systems using Flow Matching and Physical Perturbation
    Proposes a fast probabilistic forecasting method for dynamical systems based on flow matching and physical perturbation.
    Keywords: flow matching
|
|
29. Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
    Reveals the statistical advantage of softmax attention through a single-location regression task.
    Keywords: linear attention, large language model
|
|
30. Simplex-to-Euclidean Bijections for Categorical Flow Matching
    Proposes a categorical flow matching method based on simplex-to-Euclidean bijections for learning probability distributions on the simplex.
    Keywords: flow matching
|
|
31. On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference
    Reveals the equivalence of random network distillation, deep ensembles, and Bayesian inference for efficient uncertainty quantification.
    Keywords: distillation
|
|
32. AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
    AngelSlim: a large-model compression toolkit from Tencent's Hunyuan team, improving efficiency and accessibility.
    Keywords: distillation, multimodal
|
|
33. On the Interpolation Error of Nonlinear Attention versus Linear Regression
    Shows that nonlinear attention typically incurs larger interpolation error than linear regression in high dimensions, but that structured signals can narrow the gap.
    Keywords: linear attention
|
|
34. One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow
    Proposes one-step diffusion samplers based on self-distillation and deterministic flow, accelerating sampling and stabilizing evidence estimation.
    Keywords: distillation
|
|
35. Autoregressive Visual Decoding from EEG Signals
    Proposes AVDE, a lightweight and efficient autoregressive visual decoding framework for EEG signals, aimed at brain-computer interface applications.
    Keywords: contrastive learning, VQ-VAE
|
|
36. Prediction of Diffusion Coefficients in Mixtures with Tensor Completion
    Proposes a tensor-completion method for mixtures, combining a Bayesian framework with active learning to improve the prediction accuracy of diffusion coefficients.
    Keywords: predictive model, PULSE
|
|
37. Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
    Proposes SDPO, which aligns few-step diffusion models with downstream objectives via dense reward difference learning.
    Keywords: reinforcement learning, diffusion policy
|
|
38. WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
    WebGym: builds large-scale, realistic web environments to improve the task performance of visual web agents.
    Keywords: reinforcement learning, policy learning
|
|
39. Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
    Improves the interpretability and steerability of state-space models via activation subspace bottlenecks.
    Keywords: Mamba, SSM
|
|
40. Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
    Proposes residual Koopman spectral profiling for predicting and preventing instability in Transformer training.
    Keywords: Mamba, SSM
|
|