cs.LG (2026-03-17)

📊 19 papers in total | 🔗 1 with code

🎯 Interest-Area Navigation

Pillar 2: RL & Architecture (12) · Pillar 9: Embodied Foundation Models (6, 🔗 1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (12 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | HIPO: Instruction Hierarchy via Constrained Reinforcement Learning | Proposes the HIPO framework, which tackles hierarchical instruction following via constrained reinforcement learning. | reinforcement learning, RLHF, DPO | |
| 2 | FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data | FEAT: a linear-complexity foundation model for extremely large structured data. | Mamba, linear attention, foundation model | |
| 3 | Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning | Proposes offline exploration-aware fine-tuning (OXA) to improve LLMs' long-chain mathematical reasoning. | reinforcement learning, distillation, large language model | |
| 4 | Efficient Reasoning on the Edge | Proposes a lightweight LLM reasoning method that combines LoRA adapters with reinforcement learning for efficient inference on edge devices. | reinforcement learning, large language model, chain-of-thought | |
| 5 | DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay | DyJR preserves diversity in reinforcement learning via dynamic Jensen-Shannon replay, improving LLM reasoning. | reinforcement learning, large language model | |
| 6 | Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards | Shows that reinforcement learning with verifiable rewards is fragile to noisy data, underscoring the importance of high-quality data. | reinforcement learning, large language model | |
| 7 | Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning | Stochastic resetting accelerates policy convergence, improving learning efficiency in sparse-reward environments. | reinforcement learning, deep reinforcement learning | |
| 8 | Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition | Proposes the CTFG framework for cross-user sensor-based activity recognition. | reinforcement learning | |
| 9 | RaDAR: Relation-aware Diffusion-Asymmetric Graph Contrastive Learning for Recommendation | Proposes RaDAR, relation-aware diffusion-asymmetric graph contrastive learning that improves recommendation under sparsity and noise. | contrastive learning | |
| 10 | When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective | Analyzes when and why unsupervised RL improves mathematical reasoning, from a manifold-envelopment perspective. | reinforcement learning, large language model | |
| 11 | Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism | Proposes Dual Consensus RL (DCRL), a two-stage vote mechanism that mitigates pseudo-label bias in unsupervised RLVR. | reinforcement learning, large language model | |
| 12 | Age Predictors Through the Lens of Generalization, Bias Mitigation, and Interpretability: Reflections on Causal Implications | Proposes an age-prediction model based on adversarial representation learning, improving generalization and mitigating bias. | predictive model, representation learning | |
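Paper 5's replay mechanism is named after the Jensen-Shannon divergence, a symmetric, bounded measure of distance between distributions. As a point of reference only (the function and example distributions below are illustrative, not taken from the paper), a minimal NumPy sketch of that quantity:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric in (p, q) and bounded by ln(2) in natural-log units.
    """
    p = np.asarray(p, dtype=float) + eps  # eps guards against log(0)
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log(a / b))  # KL divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A uniform distribution vs. a sharply peaked one: large divergence,
# the kind of collapse a diversity-preserving replay scheme would flag.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(js_divergence(uniform, peaked))
```

How DyJR actually uses this quantity inside its replay buffer is described in the paper itself; the sketch only pins down the divergence it is named after.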

🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 13 | SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit | Proposes SOMP: scalable gradient inversion for large language models via subspace-guided orthogonal matching pursuit. | large language model | |
| 14 | SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding | SpecMoE: a spectral mixture-of-experts foundation model for cross-species EEG decoding. | foundation model | |
| 15 | The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models | Chain-of-thought reasoning induces overconfidence in vision-language models, degrading the reliability of uncertainty quantification. | chain-of-thought | |
| 16 | Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models | Introduces a millisecond-resolution wireless-network dataset to close the high-frequency data gap for time-series foundation models. | foundation model | |
| 17 | Capability-Guided Compression: Toward Interpretability-Aware Budget Allocation for Large Language Models | Proposes Capability-Guided Compression (CGC) to address capability-blind budget allocation in LLM compression. | large language model | |
| 18 | Decoding the Critique Mechanism in Large Reasoning Models | Dissects the critique mechanism in large reasoning models and proposes a critique-vector-based method to improve performance. | chain-of-thought | |
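Paper 15's claim about degraded uncertainty quantification concerns miscalibration: a model whose stated confidence exceeds its actual accuracy. The paper's exact metric is not stated in this digest; expected calibration error (ECE) is a common choice for measuring such overconfidence, sketched here with illustrative toy data:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # |mean confidence - accuracy| in this bin, weighted by bin mass
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# An overconfident model: 90% stated confidence, 50% actual accuracy.
conf = [0.9, 0.9, 0.9, 0.9]
hits = [1, 1, 0, 0]
print(expected_calibration_error(conf, hits))
```

A perfectly calibrated predictor (confidence matching accuracy in every bin) yields an ECE of zero; the overconfidence pattern the paper attributes to chain-of-thought would show up as a positive gap like the one above.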

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 19 | MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models | MDM-Prime-v2: binary encoding and index shuffling enable compute-optimal scaling of diffusion language models. | MDM | |

⬅️ Back to the cs.LG index · 🏠 Back to home