cs.LG (2026-01-28)

📊 28 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (13 🔗1) · Pillar 9: Embodied Foundation Models (13 🔗2) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (13 papers)

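| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Reinforcement Learning via Self-Distillation | Proposes Self-Distillation Policy Optimization (SDPO), which uses feedback information to improve reinforcement learning. | reinforcement learning, distillation, large language model | |
| 2 | PatchFormer: A Patch-Based Time Series Foundation Model with Hierarchical Masked Reconstruction and Cross-Domain Transfer Learning for Zero-Shot Multi-Horizon Forecasting | PatchFormer: a time series foundation model based on hierarchical masked reconstruction and cross-domain transfer learning, for zero-shot multi-horizon forecasting. | distillation, foundation model | |
| 3 | Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models | Proposes a PU-RL distillation method for reinforcement learning alignment of on-premise small models. | reinforcement learning, direct preference optimization, distillation | |
| 4 | Less is More: Clustered Cross-Covariance Control for Offline RL | Proposes Clustered Cross-Covariance Control (C^4) to address distribution shift in offline reinforcement learning. | reinforcement learning, policy learning, offline RL | |
| 5 | Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers | Proposes a forecast-driven DRL method for proactive SFC resource allocation in data centers. | reinforcement learning, deep reinforcement learning, DRL | |
| 6 | GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning | Proposes GraphAllocBench, a flexible benchmark for preference-conditioned multi-objective policy learning. | reinforcement learning, policy learning | |
| 7 | Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning | Proposes failure-prefix conditioning to address training stagnation of LLMs on saturated reasoning problems. | reinforcement learning, large language model | |
| 8 | Ranking-aware Reinforcement Learning for Ordinal Ranking | Proposes the Ranking-aware Reinforcement Learning (RARL) framework to address the difficulty of modeling dependencies in ordinal ranking. | reinforcement learning | |
| 9 | CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes | Proposes CCMamba, a selective state-space model for higher-order graph learning on combinatorial complexes. | Mamba | |
| 10 | C2:Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding | C2: a decision transformer with a cross learning module and a constraint-aware loss, for improving auto-bidding performance. | decision transformer | |
| 11 | Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning | Spark: strategic policy-aware exploration via dynamic branching, addressing resource allocation in long-horizon agentic learning. | reinforcement learning, large language model | |
| 12 | Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery | Proposes a meta-cognitive reinforcement learning framework based on self-doubt and recovery, improving robustness in reward-corrupted environments. | reinforcement learning | |
| 13 | Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning | Proposes a unified framework for self-supervised learning based on spectral analysis, improving the efficiency of representation learning. | representation learning | |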

🔬 Pillar 9: Embodied Foundation Models (13 papers)

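| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | A Foundation Model for Virtual Sensors | Proposes a foundation model for virtual sensors, addressing the high computational cost and weak generalization of existing methods. | foundation model | |
| 15 | Reward Models Inherit Value Biases from Pretraining | Reward models inherit value biases from pretrained language models, which affects alignment. | large language model | |
| 16 | VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring | VSCOUT: a hybrid variational autoencoder approach to outlier detection in high-dimensional retrospective monitoring. | multimodal | |
| 17 | Context-Augmented Code Generation Using Programming Knowledge Graphs | Proposes a context-augmented code generation method based on programming knowledge graphs, improving the ability to solve complex problems. | large language model | |
| 18 | HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs | HESTIA: a Hessian-guided differentiable quantization-aware training framework for extremely low-bit LLMs. | large language model | |
| 19 | Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs | Detects LLM-generated references using embeddings and graph neural networks. | large language model | |
| 20 | Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs | Proposes Concept Component Analysis (ConCA) for extracting interpretable concepts from LLMs. | large language model | |
| 21 | LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning | Proposes LLM-AutoDP, which uses LLM agents to automate data processing for model fine-tuning. | large language model | |
| 22 | Less is More: Benchmarking LLM Based Recommendation Agents | LLM recommendation agents: using less user history does not hurt prediction accuracy and lowers cost. | large language model | |
| 23 | Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction | Proposes a peer-prediction-based method for evaluating and training LLMs, improving truthfulness under weak supervision. | large language model | |
| 24 | Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle | Drawing on the encoding specificity principle, reveals the memory retrieval mechanism of attention layers in Transformers. | large language model | |
| 25 | HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH | Proposes the HE-SNR metric, which uses entropy compression to guide LLM mid-training on SWE-BENCH, improving performance on software engineering tasks. | large language model | |
| 26 | Efficient Evaluation of LLM Performance with Statistical Guarantees | Proposes the FAQ method, which efficiently evaluates LLM performance under a fixed query budget with guaranteed statistical validity. | large language model | |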

🔬 Pillar 8: Physics-based Animation (1 paper)

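| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | A Learning-based Framework for Spatial Impulse Response Compensation in 3D Photoacoustic Computed Tomography | Proposes a learning-based spatial impulse response compensation framework to accelerate 3D photoacoustic computed tomography. | PULSE | |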

🔬 Pillar 1: Robot Control (1 paper)

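| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 28 | Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics -- Rotating Detonation Engines | Cheap2Rich: a multiscale data assimilation framework for system identification of rotating detonation engines. | sim2real, sparse sensors | |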
