cs.LG (2026-01-12)

📊 23 papers total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (14 🔗2) · Pillar 9: Embodied Foundation Models (9 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (14 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Forward versus Backward: Comparing Reasoning Objectives in Direct Preference Optimization | Compares forward and backward reasoning objectives to make direct preference optimization more reliable on math problems | DPO, direct preference optimization, large language model | |
| 2 | Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training | Proposes Segmental Advantage Estimation (SAE) to improve PPO under sparse rewards in long-context LLM training | reinforcement learning, PPO, large language model | |
| 3 | d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation | Proposes d3LLM, which accelerates diffusion language models via pseudo-trajectory distillation, balancing accuracy and parallelism | distillation, large language model | |
| 4 | On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training | Shows that supervised fine-tuning and reinforcement learning in post-training cannot be decoupled without incurring a performance loss | reinforcement learning, large language model | |
| 5 | Stable On-Policy Distillation through Adaptive Target Reformulation | Proposes Veto: stable on-policy distillation via adaptive target reformulation | distillation, large language model | |
| 6 | Reinforcement Learning for Micro-Level Claims Reserving | Proposes an RL-based micro-level claims reserving method that improves the accuracy and stability of outstanding-claims liability prediction | reinforcement learning, reward design | |
| 7 | Stagewise Reinforcement Learning and the Geometry of the Regret Landscape | A theory of stagewise reinforcement learning grounded in the geometry of the regret landscape, revealing Bayesian phase transitions in policy evolution | reinforcement learning, deep reinforcement learning | |
| 8 | Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization | Proposes FLORA, which uses flow-based task inference and adaptive feature correction to address overgeneralization in offline meta-RL | reinforcement learning, offline RL | |
| 9 | Improving Domain Generalization in Contrastive Learning using Adaptive Temperature Control | Proposes contrastive learning with adaptive temperature control to improve domain generalization | contrastive learning | |
| 10 | TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning | Proposes the TFEC framework, improving multivariate time-series clustering via temporal-frequency enhanced contrastive learning | contrastive learning | |
| 11 | Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission | Proposes a flow matching-based generative decoder for wireless image transmission | flow matching | |
| 12 | Explaining Machine Learning Predictive Models through Conditional Expectation Methods | Proposes the MUCE method, which explains model predictions through conditional expectations to improve transparency and trustworthiness | predictive model | |
| 13 | Pseudodata-guided Invariant Representation Learning Boosts the Out-of-Distribution Generalization in Enzymatic Kinetic Parameter Prediction | O$^2$DENet improves OOD generalization in enzymatic kinetic parameter prediction via pseudodata-guided invariant representation learning | representation learning | |
| 14 | Reward-Preserving Attacks For Robust Reinforcement Learning | Proposes α-reward-preserving attacks to improve RL robustness in adversarial environments | reinforcement learning | |
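Several entries above (1, 4) build on direct preference optimization. As background only (not taken from any listed paper; the function and variable names are our own), the standard per-pair DPO loss can be sketched in plain Python:

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are the total log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model;
    beta scales the implicit KL penalty.
    """
    # Margin: how much more the policy prefers the chosen response
    # over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(beta * margin): shrinks toward 0 as the policy's
    # preference for the chosen response grows past the reference's.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin the loss is log 2 ≈ 0.693, and it decreases monotonically as the policy favors the chosen response more strongly; papers 1 and 4 study variations on what objective this pairwise comparison should encode.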

🔬 Pillar 9: Embodied Foundation Models (9 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 15 | SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis | SCALPEL: selective capability ablation via low-rank parameter editing for interpretability analysis of large language models | large language model | |
| 16 | CompNO: A Novel Foundation Model approach for solving Partial Differential Equations | CompNO: a novel compositional neural-operator foundation model for solving partial differential equations | foundation model | |
| 17 | OceanSAR-2: A Universal Feature Extractor for SAR Ocean Observation | OceanSAR-2: a universal feature extractor for SAR ocean observation that improves performance while reducing training cost | foundation model | |
| 18 | Are LLM Decisions Faithful to Verbal Confidence? | RiskEval exposes the disconnect between LLMs' verbal confidence and their decision behavior | large language model | |
| 19 | ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs | ARCQuant: boosting NVFP4 quantization of LLMs with augmented residual channels | large language model | |
| 20 | MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization | MAESTRO: meta-learned adaptive estimation of scalarization trade-offs for reward optimization, improving LLM performance on open-ended tasks | large language model | |
| 21 | Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment | Proposes the optimal-transport-based SOT framework to improve safety during LLM fine-tuning | large language model | |
| 22 | Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics | Proposes FASC: knowledge-aware LLM compression via Fisher-aligned subspace diagnostics | large language model | |
| 23 | PRPO: Aligning Process Reward with Outcome Reward in Policy Optimization | PRPO: aligning process reward with outcome reward to improve policy optimization | large language model | |
