cs.LG (2026-05-11)

📊 44 papers in total | 🔗 3 with code

🎯 Browse by interest area

Pillar 9: Embodied Foundation Models (23 🔗2) · Pillar 2: RL Algorithms & Architecture (20 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (23 papers)

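# | Title | One-line summary | Tags | 🔗
1 | MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image | Proposes the MulTaBench benchmark to address insufficient alignment of unstructured-data representations in multimodal tabular learning | foundation model, multimodal
2 | V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction | Proposes V4FinBench, a large-scale financial bankruptcy prediction benchmark, to evaluate tabular foundation models and large language models | foundation model
3 | The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies | Reveals a format confound in chain-of-thought faithfulness evaluation: how final-answer bias affects analyses of model reasoning | chain-of-thought
4 | jNO: A JAX Library for Neural Operator and Foundation Model Training | Proposes jNO, a JAX-native library that provides a unified framework for training neural operators and PDE foundation models | foundation model
5 | DynaMiCS: Fine-tuning LLMs with Performance Constraints using Dynamic Mixtures | Proposes DynaMiCS, a dynamic mixture optimizer that uses constrained optimization to balance performance across domains when fine-tuning large models | large language model, instruction following
6 | What should post-training optimize? A test-time scaling law perspective | Proposes a tail-extrapolation estimator (TEA) to address the mismatch between large-scale test-time sampling and limited training-time compute | large language model, instruction following
7 | Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration | Proposes SPEX to break the reward bottleneck and accelerate Tree-of-Thought reasoning | large language model, chain-of-thought
8 | Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime | Quantifies concentration phenomena of mean-field Transformers in the low-temperature limit, revealing how token distributions evolve | foundation model
9 | LoKA: Low-precision Kernel Applications for Recommendation Models At Scale | Proposes the LoKA framework, which enables efficient FP8 training of large-scale recommendation models through system-model co-design | large language model
10 | Benchmarking Sensor-Fault Robustness in Forecasting | Proposes the SensorFault-Bench benchmarking protocol to quantify the sensor-fault robustness of forecasting models for cyber-physical systems (CPS) | foundation model
11 | LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges | A systematic survey of large language models in hardware design and security: opportunities, challenges, and defense strategies | large language model
12 | Factual recall in linear associative memories: sharp asymptotics and mechanistic insights | Uses statistical physics to reveal the factual storage limits and mechanisms of linear associative memories | large language model
13 | ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs | Proposes ConQuR, a corner-aligned activation quantization method based on optimized rotations, to address the outlier problem in low-bit LLM quantization | large language model
14 | AdaPaD: Adaptive Parallel Deflation for PEFT with Self-Correcting Rank Discovery | Proposes AdaPaD, an adaptive parallel deflation method that enables dynamic rank discovery and self-correcting training in parameter-efficient fine-tuning of large models | large language model
15 | Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks | Proposes an amortized causal sensitivity analysis method based on prior-data fitted networks for efficient causal inference | foundation model
16 | Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition | Shows that self-attention acts as a covariance readout, giving a unified explanation of in-context learning and repetitive generation | large language model
17 | SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding | Proposes SlimSpec to accelerate speculative decoding and relieve its compute bottleneck | large language model
18 | Remember to Forget: Gated Adaptive Positional Encoding | Proposes GAPE, a gated adaptive positional encoding that uses a content-aware mechanism to address attention degradation in long-context extrapolation | large language model
19 | Equilibrium Residuals Expose Three Regimes of Matrix-Game Strategic Reasoning in Language Models | Uses equilibrium residuals to reveal three regimes of strategic reasoning by large language models in matrix games | large language model
20 | Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization | Proposes a robust multi-armed bandit framework based on low-rank factorization for efficient and statistically valid LLM evaluation | large language model
21 | Teaching LLMs to See Graphs: Unifying Text and Structural Reasoning | Proposes the Graph Transformer Language Model (GTLM), which lets LLMs reason directly over graph-structured data through a native graph attention bias | large language model
22 | GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference | Proposes the GELATO framework, which uses generative entropy and Lyapunov optimization for adaptive token offloading in device-edge collaborative inference | large language model
23 | TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation | Proposes TrajDLM, a topology-aware trajectory generation framework based on a block diffusion language model, for efficient, high-fidelity trajectory synthesis | zero-shot transfer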

🔬 Pillar 2: RL Algorithms & Architecture (20 papers)

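# | Title | One-line summary | Tags | 🔗
24 | Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning | Proposes the Impoola architecture to address limited visual-resolution scaling in deep reinforcement learning | reinforcement learning, deep reinforcement learning, policy learning
25 | Robust Probabilistic Shielding for Safe Offline Reinforcement Learning | Proposes a robust probabilistic shielding method for safe policy improvement in offline reinforcement learning | reinforcement learning, offline RL, offline reinforcement learning
26 | Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning | Proposes SLIM, a dynamic skill lifecycle management framework that optimizes how the skill set evolves in agentic reinforcement learning | reinforcement learning, policy learning, large language model
27 | Balancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learning | Proposes a deep reinforcement learning method for traffic signal control that improves traffic efficiency while maintaining fairness between vehicles and pedestrians | reinforcement learning, deep reinforcement learning
28 | Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories | Proposes Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive pretraining on electronic health record (EHR) patient trajectories | JEPA, representation learning
29 | MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization | Proposes MASS-DPO, a Fisher-information-based active negative-sample selection strategy that improves the efficiency of multi-negative preference learning | DPO, direct preference optimization
30 | PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation | Proposes the PC3D framework, which achieves zero-shot cooperation in multi-agent systems of varying size through personalized context distillation | reinforcement learning, distillation
31 | Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis | Proposes an equivariant reinforcement learning framework for efficient synthesis of Clifford quantum circuits | reinforcement learning
32 | Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why | Proposes a training-free diagnostic framework that uses gradient-alignment analysis to reveal how on-policy distillation acts in the training of reasoning models | distillation
33 | Policy Gradient Methods for Non-Markovian Reinforcement Learning | Proposes an agent-state Markov policy gradient (ASMPG) algorithm for policy optimization in non-Markovian decision processes | reinforcement learning
34 | Locking Pretrained Weights via Deep Low-Rank Residual Distillation | Proposes DLR-Lock, which locks pretrained model weights through deep low-rank residual distillation to defend against malicious fine-tuning | distillation
35 | Scalable Mamba-Based Message-Passing Neural Decoder for Error-Correcting Codes | Proposes a Mamba-based message-passing neural decoder (MMPD) for efficient, scalable error correction of long codes | Mamba
36 | Step Rejection Fine-Tuning: A Practical Distillation Recipe | Proposes step rejection fine-tuning (SRFT), which uses fine-grained loss masking to improve LLM agents on complex tasks | distillation
37 | Controllability in preference-conditioned multi-objective reinforcement learning | Proposes controllability evaluation metrics to address the missing measure of behavioral sensitivity in preference-conditioned multi-objective reinforcement learning | reinforcement learning
38 | PhysEDA: Physics-Aware Learning Framework for Efficient EDA With Manhattan Distance Decay | Proposes the PhysEDA framework, which models EDA tasks efficiently through a Manhattan-distance decay prior | reinforcement learning, linear attention, reward shaping
39 | Follow the Mean: Reference-Guided Flow Matching | Proposes a reference-guided flow matching framework that makes generative models controllable without fine-tuning | flow matching
40 | When Does Non-Uniform Replay Matter in Reinforcement Learning? | Reveals when non-uniform experience replay is effective and proposes a truncated geometric sampling strategy to improve the efficiency of offline reinforcement learning | reinforcement learning
41 | Unsupervised Process Reward Models | Proposes unsupervised process reward models (uPRM), which evaluate reasoning steps without human annotation through a probabilistic scoring mechanism | reinforcement learning, large language model
42 | Generating Symmetric Materials using Latent Flow Matching | Proposes SymADiT, a symmetry-aware material generation model based on latent flow matching and Wyckoff position constraints | flow matching
43 | Adaptive Action Chunking via Multi-Chunk Q Value Estimation | Proposes an adaptive action chunking (ACH) algorithm that adjusts action sequence length dynamically through multi-chunk Q-value estimation | reinforcement learning, imitation learning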

🔬 Pillar 1: Robot Control (1 paper)

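# | Title | One-line summary | Tags | 🔗
44 | XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies | Proposes the XQCfD algorithm, which improves the sample efficiency of robot reinforcement learning through pretrained policies and an enhanced replay mechanism | manipulation, reinforcement learning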
