cs.LG (2026-03-11)

📊 23 papers in total | 🔗 2 with code

🎯 Navigation by area of interest

Pillar 2: RL Algorithms & Architecture (13 papers, 🔗 1 with code)
Pillar 9: Embodied Foundation Models (6 papers, 🔗 1 with code)
Pillar 8: Physics-based Animation (2 papers)
Pillar 7: Motion Retargeting (1 paper)
Pillar 1: Robot Control (1 paper)

🔬 Pillar 2: RL Algorithms & Architecture (13 papers)

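# | Title | One-line summary | Tags | 🔗
1 | Reinforcement Learning with Conditional Expectation Reward | Proposes the Conditional Expectation Reward (CER), which uses the large language model itself as an implicit verifier to improve general reasoning. | reinforcement learning, large language model |
2 | Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control | Proposes the RAD framework, which controls risk in RLHF through stochastic dominance to improve safety and robustness. | reinforcement learning, RLHF |
3 | Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning | Proposes GR³, which addresses length inflation in reinforcement learning by rescaling rewards relative to the group. | reinforcement learning, RLHF |
4 | UAV-MARL: Multi-Agent Reinforcement Learning for Time-Critical and Dynamic Medical Supply Delivery | Proposes a multi-agent reinforcement learning approach for dynamic delivery of medical supplies by UAV. | reinforcement learning, PPO |
5 | Graph-GRPO: Training Graph Flow Models with Reinforcement Learning | Proposes Graph-GRPO, which trains graph flow models with reinforcement learning to optimize graph generation. | reinforcement learning, flow matching |
6 | Ergodicity in reinforcement learning | Examines how non-ergodic reward processes affect reinforcement learning and analyzes existing solutions. | reinforcement learning |
7 | Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models | Proposes Dynamics-Predictive Sampling (DPS) to speed up reinforcement learning finetuning of large models for reasoning. | reinforcement learning, large language model |
8 | Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis | EvoKernel: a value-driven memory approach to NPU kernel synthesis that supports cold-start drafting and continual refinement. | reinforcement learning, large language model |
9 | ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning | Proposes ReTabSyn for synthesizing tabular data in low-data and imbalanced settings. | reinforcement learning |
10 | Learning to Score: Tuning Cluster Schedulers through Reinforcement Learning | Proposes a reinforcement learning method for tuning cluster schedulers that improves job performance and cluster utilization. | reinforcement learning |
11 | Adaptive Active Learning for Regression via Reinforcement Learning | Proposes a reinforcement-learning-based adaptive active learning method for regression that improves labeling efficiency. | reinforcement learning |
12 | Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression | Proposes STemDist, a dataset distillation method for spatio-temporal forecasting that compresses along two dimensions. | distillation |
13 | Riemannian MeanFlow for One-Step Generation on Manifolds | Proposes Riemannian MeanFlow for one-step generation on manifolds, improving the quality/efficiency trade-off. | flow matching, classifier-free guidance |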

🔬 Pillar 9: Embodied Foundation Models (6 papers)

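# | Title | One-line summary | Tags | 🔗
14 | TOSSS: a CVE-based Software Security Benchmark for Large Language Models | TOSSS: a software security benchmark for large language models built from CVE vulnerabilities. | large language model |
15 | Leech Lattice Vector Quantization for Efficient LLM Compression | Proposes LLVQ, an algorithm based on Leech lattice vector quantization for compressing large language models efficiently. | large language model |
16 | LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation | LookaheadKV: anticipates the future without running generation, enabling fast and accurate KV cache eviction. | large language model |
17 | CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems | CacheSolidarity: prevents prefix-caching side-channel attacks in multi-tenant LLM serving systems. | large language model |
18 | The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training | Proposes a mean-removal method that addresses training instability in FP4-quantized LLM training. | large language model |
19 | GGMPs: Generalized Gaussian Mixture Processes | Proposes Generalized Gaussian Mixture Processes (GGMPs) to handle multimodality and non-Gaussianity in conditional density estimation. | multimodal |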

🔬 Pillar 8: Physics-based Animation (2 papers)

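# | Title | One-line summary | Tags | 🔗
20 | Factorized Neural Implicit DMD for Parametric Dynamics | Proposes factorized neural implicit DMD for modeling and analyzing parametric dynamical systems. | spatiotemporal |
21 | Data-Driven Integration Kernels for Interpretable Nonlocal Operator Learning | Proposes data-driven integration kernels for interpretable nonlocal operator learning and applies them to modeling climate processes. | spatiotemporal |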

🔬 Pillar 7: Motion Retargeting (1 paper)

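# | Title | One-line summary | Tags | 🔗
22 | Protein Counterfactuals via Diffusion-Guided Latent Optimization | Proposes MCCOP, which generates protein counterfactuals through diffusion-guided latent-space optimization. | latent optimization |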

🔬 Pillar 1: Robot Control (1 paper)

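# | Title | One-line summary | Tags | 🔗
23 | Muscle Synergy Priors Enhance Biomechanical Fidelity in Predictive Musculoskeletal Locomotion Simulation | Uses muscle synergy priors to improve the biomechanical fidelity of predictive musculoskeletal locomotion simulation. | locomotion, reinforcement learning |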
