| # | Title | Summary | Keywords | ✅ |
|---|---|---|---|---|
| 1 | Forward versus Backward: Comparing Reasoning Objectives in Direct Preference Optimization | Compares forward and backward reasoning objectives to improve the reliability of direct preference optimization on math problems. | DPO, direct preference optimization, large language model | |
| 2 | Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training | Proposes Segmental Advantage Estimation (SAE) to improve PPO performance when training long-context LLMs under sparse rewards. | reinforcement learning, PPO, large language model | |
| 3 | d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation | Proposes d3LLM, which accelerates diffusion language models via pseudo-trajectory distillation, balancing accuracy and parallelism. | distillation, large language model | ✅ |
| 4 | On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training | Shows that supervised fine-tuning and reinforcement learning cannot be decoupled in post-training without incurring a performance loss. | reinforcement learning, large language model | |
| 5 | Stable On-Policy Distillation through Adaptive Target Reformulation | Proposes Veto, which achieves stable on-policy distillation through adaptive target reformulation. | distillation, large language model | |
| 6 | Reinforcement Learning for Micro-Level Claims Reserving | Proposes a reinforcement learning approach to micro-level claims reserving, improving the accuracy and stability of outstanding-claims liability prediction. | reinforcement learning, reward design | |
| 7 | Stagewise Reinforcement Learning and the Geometry of the Regret Landscape | A theory of stagewise reinforcement learning grounded in the geometry of the regret landscape, revealing Bayesian phase transitions in policy evolution. | reinforcement learning, deep reinforcement learning | |
| 8 | Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization | Proposes FLORA, which tackles feature overgeneralization in offline meta-reinforcement learning via flow-based task inference and adaptive feature correction. | reinforcement learning, offline RL | |
| 9 | Improving Domain Generalization in Contrastive Learning using Adaptive Temperature Control | Proposes contrastive learning with adaptive temperature control to improve domain generalization. | contrastive learning | |
| 10 | TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning | Proposes the TFEC framework, which improves multivariate time-series clustering through temporal-frequency enhanced contrastive learning. | contrastive learning | ✅ |
| 11 | Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission | Proposes a flow matching-based generative decoder for wireless image transmission. | flow matching | |
| 12 | Explaining Machine Learning Predictive Models through Conditional Expectation Methods | Proposes the MUCE method, which explains machine learning model predictions through conditional expectations, improving transparency and trustworthiness. | predictive model | |
| 13 | Pseudodata-guided Invariant Representation Learning Boosts the Out-of-Distribution Generalization in Enzymatic Kinetic Parameter Prediction | O$^2$DENet improves out-of-distribution generalization in enzymatic kinetic parameter prediction via pseudodata-guided invariant representation learning. | representation learning | |
| 14 | Reward-Preserving Attacks For Robust Reinforcement Learning | Proposes α-reward-preserving attacks to improve the robustness of reinforcement learning in adversarial environments. | reinforcement learning | |