cs.LG (2026-04-16)

📊 27 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (12) · Pillar 9: Embodied Foundation Models (12, 🔗 1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL Algorithms & Architecture (12 papers)

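# | Title | Key point (one line) | Tags | 🔗
1 | Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data | Evaluates how well masked autoencoder foundation models can predict downhole parameters from surface drilling data. | masked autoencoder, foundation model |
2 | Learning Ad Hoc Network Dynamics via Graph-Structured World Models | Proposes G-RSSM, which learns ad hoc network dynamics with a graph-structured world model to support size-agnostic node decision-making. | reinforcement learning, deep reinforcement learning, world model |
3 | DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models | DLink distills layer-wise and dominant knowledge from EEG foundation models to enable lightweight deployment. | teacher-student, distillation, foundation model |
4 | MambaSL: Exploring Single-Layer Mamba for Time Series Classification | MambaSL explores applying a single-layer Mamba model to time series classification. | Mamba, SSM, state space model |
5 | LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning | LongAct exploits intrinsic activation patterns to improve reinforcement learning performance on long-context tasks. | reinforcement learning, large language model |
6 | On the Expressive Power and Limitations of Multi-Layer SSMs | Reveals the limitations of multi-layer SSMs on compositional tasks and explores how online chain-of-thought (CoT) increases their expressive power. | SSM, chain-of-thought |
7 | RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning | Proposes the RL-STPA framework for systematic hazard analysis in safety-critical reinforcement learning. | reinforcement learning, reward shaping |
8 | Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization | Proposes a reinforcement learning framework formulated in Wasserstein space, treating policy optimization from an optimal transport perspective. | reinforcement learning |
9 | Beyond Importance Sampling: Rejection-Gated Policy Optimization | Proposes RGPO, which optimizes the policy through a learnable acceptance gate, improving the stability and performance of reinforcement learning. | PPO, RLHF |
10 | Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models | Proposes reward-weighted classifier-free guidance as a policy improvement method for autoregressive models. | reinforcement learning, classifier-free guidance |
11 | Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation | Proposes a multi-objective LLM unlearning framework built on unified domain representation and bidirectional logit distillation. | distillation, large language model |
12 | Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning | Proposes TeLAPA, which maintains a skill-aligned neighborhood of policies to preserve plasticity in continual reinforcement learning. | reinforcement learning |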

🔬 Pillar 9: Embodied Foundation Models (12 papers)

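# | Title | Key point (one line) | Tags | 🔗
13 | Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting | A comparative study of the performance-efficiency trade-off between foundation models and task-specific models in probabilistic electricity price forecasting. | foundation model |
14 | Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings | Uses large language model embeddings to predict post-traumatic epilepsy risk from clinical records. | large language model |
15 | Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits | Proposes calibration-gated LLM pseudo-observations to address the cold-start problem in online contextual bandit algorithms. | large language model |
16 | Improving Sparse Autoencoder with Dynamic Attention | Proposes a sparse autoencoder based on dynamic sparse attention, improving feature disentanglement and reconstruction. | foundation model |
17 | Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization | Proposes adaptive allocation of inference-time compute via constrained policy optimization, improving LLM performance under a limited budget. | large language model |
18 | Gating Enables Curvature: A Geometric Expressivity Gap in Attention | Reveals a geometric expressivity gap that gating opens in attention, enabling the modeling of non-flat manifolds. | large language model |
19 | ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding | Proposes ConfLayers, an adaptive confidence-based layer-skipping method that accelerates self-speculative decoding. | large language model |
20 | Generative Augmented Inference | Proposes the Generative Augmented Inference (GAI) framework, which uses AI-assisted data to improve the efficiency of estimation based on human-annotated data. | large language model |
21 | FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models | Proposes FineSteer for fine-grained, inference-time steering of large language model behavior, improving safety and truthfulness. | large language model, multimodal |
22 | StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models | Proposes StoSignSGD, which uses structural stochasticity to fix SignSGD's divergence on non-smooth objectives when training large models. | large language model, foundation model |
23 | ExoNet: Multimodal Deep Learning for TESS Exoplanet Candidate Identification via Phase-Folded Light Curves, Stellar Parameters, and Multi-Head Attention Fusion | ExoNet identifies TESS exoplanet candidates using multimodal deep learning and attention. | multimodal |
24 | Prompt-Driven Code Summarization: A Systematic Literature Review | A review of prompt-driven code summarization that systematically analyzes how prompting strategies affect LLM performance. | large language model, chain-of-thought |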

🔬 Pillar 1: Robot Control (1 paper)

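# | Title | Key point (one line) | Tags | 🔗
25 | LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking | Shows that large language models trained with RLVR exhibit reward hacking, learning logical rules through enumeration rather than induction. | manipulation, reinforcement learning |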

🔬 Pillar 8: Physics-based Animation (1 paper)

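# | Title | Key point (one line) | Tags | 🔗
26 | Material-Agnostic Zero-Shot Thermal Inference for Metal Additive Manufacturing via a Parametric PINN Framework | Proposes a parametric PINN framework for material-agnostic, zero-shot thermal inference in metal additive manufacturing. | spatiotemporal |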

🔬 Pillar 7: Motion Retargeting (1 paper)

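# | Title | Key point (one line) | Tags | 🔗
27 | $π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities | Proposes $π_{0.7}$, a generalist robotic foundation model that achieves zero-shot generalization and emergent capabilities through contextual steering. | cross-embodiment, foundation model, multimodal |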
