cs.LG (2026-05-11)

📊 44 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (23 🔗2) · Pillar 2: RL & Architecture (20 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (23 papers)

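#	Title	One-sentence summary	Tags	🔗
1	MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image	Introduces the MulTaBench benchmark to address insufficient alignment of unstructured-data representations in multimodal tabular learning	foundation model multimodal
2	V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction	Introduces V4FinBench, a large-scale corporate bankruptcy prediction benchmark, to evaluate tabular foundation models and large language models	foundation model
3	The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies	Exposes a format confound in chain-of-thought faithfulness evaluation: how final-answer bias affects analyses of model reasoning	chain-of-thought
4	jNO: A JAX Library for Neural Operator and Foundation Model Training	Introduces jNO, a JAX-native library that provides a unified framework for training neural operators and PDE foundation models	foundation model	✅
5	DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures	Introduces DynaMiCS, a dynamic mixture optimizer that uses constrained optimization to balance performance across domains when fine-tuning large models	large language model instruction following
6	What should post-training optimize? A test-time scaling law perspective	Proposes the tail-extrapolation estimator (TEA) to address the mismatch between large-scale test-time sampling and limited training-time compute	large language model instruction following
7	Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration	Proposes SPEX to break the reward bottleneck and accelerate Tree-of-Thought reasoning	large language model chain-of-thought
8	Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime	Quantifies concentration phenomena of mean-field Transformers in the low-temperature limit, revealing how token distributions evolve	foundation model
9	LoKA: Low-precision Kernel Applications for Recommendation Models At Scale	Proposes the LoKA framework, which uses system-model co-design to enable efficient FP8 training of large-scale recommendation models	large language model
10	Benchmarking Sensor-Fault Robustness in Forecasting	Proposes the SensorFault-Bench benchmark protocol to quantify the sensor-fault robustness of forecasting models for cyber-physical systems (CPS)	foundation model
11	LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges	A systematic survey of large language models in hardware design and security: opportunities, challenges, and defense strategies	large language model
12	Factual recall in linear associative memories: sharp asymptotics and mechanistic insights	Uses statistical physics to reveal the factual storage limits and mechanisms of linear associative memories	large language model
13	ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs	Proposes ConQuR, a corner-aligned activation quantization method based on optimized rotations, to address the outlier problem in low-bit LLM quantization	large language model
14	AdaPaD: Adaptive Parallel Deflation for PEFT with Self-Correcting Rank Discovery	Proposes AdaPaD, an adaptive parallel deflation method that enables dynamic rank discovery and self-correcting training in parameter-efficient fine-tuning of large models	large language model
15	Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks	Proposes an amortized causal sensitivity analysis method based on prior-data fitted networks for efficient causal inference	foundation model
16	Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition	Reveals self-attention as a covariance readout, giving a unified explanation of in-context learning and repetition	large language model
17	SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding	Proposes SlimSpec to accelerate speculative decoding and address its computational bottleneck	large language model
18	Remember to Forget: Gated Adaptive Positional Encoding	Proposes GAPE, a gated adaptive positional encoding that uses a content-aware mechanism to address attention degradation in long-context extrapolation	large language model
19	Equilibrium Residuals Expose Three Regimes of Matrix-Game Strategic Reasoning in Language Models	Uses equilibrium residuals to reveal three regimes of strategic reasoning by large language models in matrix games	large language model
20	Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization	Proposes a robust multi-armed bandit framework based on low-rank factorization for efficient and statistically valid evaluation of large models	large language model
21	Teaching LLMs to See Graphs: Unifying Text and Structural Reasoning	Proposes the Graph Transformer Language Model (GTLM), which uses native graph attention biases to let LLMs reason directly over graph-structured data	large language model
22	GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference	Proposes the GELATO framework, which uses generative entropy and Lyapunov optimization for adaptive token offloading in device-edge collaborative inference	large language model
23	TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation	Proposes TrajDLM, a topology-aware trajectory generation framework based on a block diffusion language model, for efficient, high-fidelity trajectory synthesis	zero-shot transfer	✅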

🔬 Pillar 2: RL & Architecture (20 papers)

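#	Title	One-sentence summary	Tags	🔗
24	Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning	Proposes the Impoola architecture to address limited visual-resolution scaling in deep reinforcement learning	reinforcement learning deep reinforcement learning policy learning	✅
25	Robust Probabilistic Shielding for Safe Offline Reinforcement Learning	Proposes a robust probabilistic shielding method for safe policy improvement in offline reinforcement learning	reinforcement learning offline RL offline reinforcement learning
26	Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning	Proposes SLIM, a dynamic skill lifecycle management framework that optimizes how the skill set evolves in agentic reinforcement learning	reinforcement learning policy learning large language model
27	Balancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learning	Proposes a deep reinforcement learning method for traffic signal control that improves traffic efficiency while maintaining fairness between vehicles and pedestrians	reinforcement learning deep reinforcement learning
28	Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories	Proposes Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive pretraining on electronic health record (EHR) patient trajectories	JEPA representation learning
29	MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization	Proposes MASS-DPO, a Fisher-information-based active negative-sample selection strategy that improves the efficiency of multi-negative preference learning	DPO direct preference optimization
30	PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation	Proposes the PC3D framework, which uses personalized context distillation to achieve zero-shot cooperation in multi-agent systems with variable rosters	reinforcement learning distillation
31	Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis	Proposes an equivariant reinforcement learning framework for efficient synthesis of Clifford quantum circuits	reinforcement learning
32	Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why	Proposes a training-free diagnostic framework that uses gradient-alignment analysis to reveal how on-policy distillation affects the training of reasoning models	distillation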
33	Policy Gradient Methods for Non-Markovian Reinforcement Learning	Proposes the agent-state Markov policy gradient (ASMPG) algorithm to tackle policy optimization in non-Markovian decision processes	reinforcement learning
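34	Locking Pretrained Weights via Deep Low-Rank Residual Distillation	Proposes DLR-Lock, which locks pretrained model weights via deep low-rank residual distillation to defend against malicious fine-tuning	distillation
35	Scalable Mamba-Based Message-Passing Neural Decoder for Error-Correcting Codes	Proposes a Mamba-based message-passing neural decoder (MMPD) for efficient, scalable decoding of long error-correcting codes	Mamba
36	Step Rejection Fine-Tuning: A Practical Distillation Recipe	Proposes step rejection fine-tuning (SRFT), which uses fine-grained loss masking to improve LLM agents on complex tasks	distillation
37	Controllability in preference-conditioned multi-objective reinforcement learning	Proposes controllability evaluation metrics to address the lack of behavioral sensitivity in preference-conditioned multi-objective reinforcement learning	reinforcement learning
38	PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay	Proposes the PhysEDA framework, which uses a Manhattan-distance-decay prior for efficient modeling of EDA tasks	reinforcement learning linear attention reward shaping
39	Follow the Mean: Reference-Guided Flow Matching	Proposes a reference-guided flow matching framework for controlling generative models without fine-tuning	flow matching
40	When Does Non-Uniform Replay Matter in Reinforcement Learning?	Reveals when non-uniform experience replay is effective and proposes a truncated geometric sampling strategy to improve the efficiency of offline reinforcement learning	reinforcement learning
41	Unsupervised Process Reward Models	Proposes unsupervised process reward models (uPRM), which use a probabilistic scoring mechanism to evaluate reasoning steps without human annotation	reinforcement learning large language model
42	Generating Symmetric Materials using Latent Flow Matching	Proposes SymADiT, a symmetry-aware material generation model based on latent flow matching and Wyckoff-position constraints	flow matching
43	Adaptive Action Chunking via Multi-Chunk Q Value Estimation	Proposes the adaptive action chunking (ACH) algorithm, which uses multi-chunk Q-value estimation to adjust action-sequence length dynamically	reinforcement learning imitation learning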

🔬 Pillar 1: Robot Control (1 paper)

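#	Title	One-sentence summary	Tags	🔗
44	XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies	Proposes the XQCfD algorithm, which uses pretrained policies and an augmented replay mechanism to improve the sample efficiency of robot reinforcement learning	manipulation reinforcement learning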
