cs.LG (2026-04-16)

📊 19 papers in total

🎯 Browse by interest area

Pillar 2: RL & Architecture (9) · Pillar 9: Embodied Foundation Models (8) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (9 papers)

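# | Title | One-line summary | Tags
1 | Assessing the Potential of Masked Autoencoder Foundation Models in Predicting Downhole Metrics from Surface Drilling Data | Assesses the potential of masked autoencoder foundation models for predicting downhole metrics from surface drilling data. | masked autoencoder foundation model
2 | Learning Ad Hoc Network Dynamics via Graph-Structured World Models | Proposes G-RSSM, which learns ad hoc network dynamics through a graph-structured world model for size-agnostic per-node decision-making. | reinforcement learning deep reinforcement learning world model
3 | DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models | DLink: distills layer-wise and dominant knowledge from EEG foundation models for lightweight deployment. | teacher-student distillation foundation model
4 | MambaSL: Exploring Single-Layer Mamba for Time Series Classification | MambaSL: explores single-layer Mamba models for time series classification. | Mamba SSM state space model
5 | LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning | LongAct: leverages intrinsic activation patterns to improve long-context reinforcement learning performance. | reinforcement learning large language model
6 | On the Expressive Power and Limitations of Multi-Layer SSMs | Reveals the limitations of multi-layer SSMs on compositional tasks and explores how online CoT improves their expressive power. | SSM chain-of-thought
7 | RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning | Proposes the RL-STPA framework for systematic risk analysis in safety-critical reinforcement learning. | reinforcement learning reward shaping
8 | Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization | Proposes a reinforcement learning framework formulated in Wasserstein space for policy optimization. | reinforcement learning
9 | Beyond Importance Sampling: Rejection-Gated Policy Optimization | Proposes RGPO, which optimizes policies through a learnable acceptance gate, improving the stability and performance of reinforcement learning. | PPO RLHF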

🔬 Pillar 9: Embodied Foundation Models (8 papers)

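# | Title | One-line summary | Tags
10 | Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting | Comparative study of the performance-efficiency trade-off between foundation models and task-specific models in probabilistic electricity price forecasting. | foundation model
11 | Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings | Uses large language model embeddings to predict post-traumatic epilepsy risk from clinical records. | large language model
12 | Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits | Proposes calibration-gated LLM pseudo-observations to address the cold-start problem in online contextual bandit algorithms. | large language model
13 | Improving Sparse Autoencoder with Dynamic Attention | Proposes a sparse autoencoder based on dynamic sparse attention, improving feature disentanglement and reconstruction quality. | foundation model
14 | Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization | Proposes an adaptive inference-compute allocation method based on constrained policy optimization, improving LLM performance under a limited budget. | large language model
15 | Gating Enables Curvature: A Geometric Expressivity Gap in Attention | Reveals a geometric expressivity gap in attention: gating enables modeling of non-flat manifolds. | large language model
16 | ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding | Proposes ConfLayers, a confidence-based adaptive layer-skipping method that accelerates self-speculative decoding. | large language model
17 | Generative Augmented Inference | Proposes the Generative Augmented Inference (GAI) framework, which uses AI-assisted data to improve the estimation efficiency of models built on human annotations. | large language model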

🔬 Pillar 1: Robot Control (1 paper)

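# | Title | One-line summary | Tags
18 | LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking | LLMs trained with RLVR exhibit reward hacking, enumerating cases rather than inductively learning logical rules. | manipulation reinforcement learning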

🔬 Pillar 8: Physics-based Animation (1 paper)

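# | Title | One-line summary | Tags
19 | Material-Agnostic Zero-Shot Thermal Inference for Metal Additive Manufacturing via a Parametric PINN Framework | Proposes a parametric PINN framework for material-agnostic zero-shot thermal inference in metal additive manufacturing. | spatiotemporal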
