cs.LG (2026-01-28)

📊 28 papers in total | 🔗 3 with code

🎯 Navigation by interest area

Pillar 2: RL & Architecture (13 🔗1) | Pillar 9: Embodied Foundation Models (13 🔗2) | Pillar 8: Physics-based Animation (1) | Pillar 1: Robot Control (1)

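🔬 Pillar 2: RL & Architecture (13 papers)

#	Title	Key point	Tags	🔗	⭐
1	Reinforcement Learning via Self-Distillation	Proposes Self-Distillation Policy Optimization (SDPO), which uses feedback information to improve reinforcement learning.	reinforcement learning distillation large language model
2	PatchFormer: A Patch-Based Time Series Foundation Model with Hierarchical Masked Reconstruction and Cross-Domain Transfer Learning for Zero-Shot Multi-Horizon Forecasting	PatchFormer: a time series foundation model based on hierarchical masked reconstruction and cross-domain transfer learning, for zero-shot multi-horizon forecasting.	distillation foundation model
3	Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models	Proposes a PU-RL distillation method for reinforcement learning alignment of on-premise small models.	reinforcement learning direct preference optimization distillation
4	Less is More: Clustered Cross-Covariance Control for Offline RL	Proposes Clustered Cross-Covariance Control (C^4) to address distribution shift in offline reinforcement learning.	reinforcement learning policy learning offline RL
5	Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers	Proposes a forecast-driven DRL method for proactive SFC resource allocation in data centers.	reinforcement learning deep reinforcement learning DRL
6	GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning	Proposes GraphAllocBench, a flexible benchmark for preference-conditioned multi-objective policy learning.	reinforcement learning policy learning
7	Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning	Proposes failure-prefix conditioning to address stalled training of LLMs on saturated reasoning problems.	reinforcement learning large language model
8	Ranking-aware Reinforcement Learning for Ordinal Ranking	Proposes a Ranking-Aware Reinforcement Learning (RARL) framework to address the difficulty of modeling dependencies in ordinal ranking.	reinforcement learning
9	CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes	Proposes CCMamba, a selective state-space model for higher-order graph learning on combinatorial complexes.	Mamba
10	C2:Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding	C2: a decision transformer enhanced with a cross-learning module and a constraint-aware loss, to improve auto-bidding performance.	decision transformer	✅
11	Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning	Spark: strategic policy-aware exploration via dynamic branching, addressing resource allocation in long-horizon agentic learning.	reinforcement learning large language model
12	Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery	Proposes a meta-cognitive reinforcement learning framework based on self-doubt and recovery, improving robustness in environments with corrupted rewards.	reinforcement learning
13	Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning	Proposes a unified framework for self-supervised learning based on spectral analysis, improving the efficiency of representation learning.	representation learning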

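🔬 Pillar 9: Embodied Foundation Models (13 papers)

#	Title	Key point	Tags	🔗	⭐
14	A Foundation Model for Virtual Sensors	Proposes a foundation model for virtual sensors, addressing the high computational cost and weak generalization of existing methods.	foundation model
15	Reward Models Inherit Value Biases from Pretraining	Reward models inherit value biases from pretrained language models, which affects alignment.	large language model
16	VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring	VSCOUT: a hybrid variational autoencoder approach to outlier detection in high-dimensional retrospective monitoring.	multimodal
17	Context-Augmented Code Generation Using Programming Knowledge Graphs	Proposes a context-augmented code generation method based on programming knowledge graphs, improving the ability to solve complex problems.	large language model	✅
18	HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs	HESTIA: a Hessian-guided differentiable quantization-aware training framework for extremely low-bit LLMs.	large language model	✅
19	Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs	Detects LLM-generated references using embeddings and graph neural networks.	large language model
20	Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs	Proposes Concept Component Analysis (ConCA) for extracting interpretable concepts from LLMs.	large language model
21	LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning	Proposes LLM-AutoDP, which uses LLM agents to automate data processing and improve model fine-tuning.	large language model
22	Less is More: Benchmarking LLM Based Recommendation Agents	LLM recommendation agents: using less user history does not hurt prediction accuracy and lowers cost.	large language model
23	Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction	Proposes a peer-prediction-based method for evaluating and training LLMs, improving truthfulness under weak supervision.	large language model
24	Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle	Drawing on the encoding specificity principle, reveals the memory retrieval mechanism of attention layers in Transformers.	large language model
25	HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH	Proposes the HE-SNR metric, which uses entropy compression to guide LLM mid-training on SWE-BENCH and improve performance on software engineering tasks.	large language model
26	Efficient Evaluation of LLM Performance with Statistical Guarantees	Proposes the FAQ method, which evaluates LLM performance efficiently under a fixed query budget while guaranteeing statistical validity.	large language model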

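🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key point	Tags	🔗	⭐
27	A Learning-based Framework for Spatial Impulse Response Compensation in 3D Photoacoustic Computed Tomography	Proposes a learning-based spatial impulse response compensation framework that accelerates 3D photoacoustic computed tomography.	PULSE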

🔬 Pillar 1: Robot Control (1 paper)

#	Title	Key point	Tags	🔗	⭐
28	Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics -- Rotating Detonation Engines	Cheap2Rich: a multiscale data assimilation framework for system identification of rotating detonation engines.	sim2real sparse sensors
