cs.LG (2026-02-23)

📊 24 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (12 🔗1) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 8: Physics-based Animation (3) · Pillar 1: Robot Control (3 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (12 papers)

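# | Title | One-line summary | Tags | 🔗
1 | Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing | Decision MetaMamba: heterogeneous sequence mixing enhances selective SSMs in offline reinforcement learning. | offline RL, Mamba, SSM |
2 | Uncertainty-Aware Rank-One MIMO Q Network Framework for Accelerated Offline Reinforcement Learning | Proposes an uncertainty-aware Rank-One MIMO Q network that accelerates offline reinforcement learning and improves performance. | reinforcement learning, offline RL, offline reinforcement learning |
3 | RAmmStein: Regime Adaptation in Mean-reverting Markets with Stein Thresholds -- Optimal Impulse Control in Concentrated AMMs | RAmmStein: regime adaptation for concentrated AMMs based on Stein thresholds and mean reversion. | reinforcement learning, deep reinforcement learning, PULSE |
4 | Advantage-based Temporal Attack in Reinforcement Learning | Proposes an advantage-based temporal adversarial Transformer to improve the robustness of reinforcement learning models. | reinforcement learning, deep reinforcement learning, DRL |
5 | LAD: Learning Advantage Distribution for Reasoning | Proposes LAD, which improves large-model reasoning and increases diversity by learning the advantage distribution. | reinforcement learning, multimodal |
6 | On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference | Reveals the equivalence of random network distillation, deep ensembles, and Bayesian inference, for efficient uncertainty quantification. | distillation |
7 | DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning | Proposes DSDR, a dual-scale diversity regularization framework that improves RL-based exploration in LLM reasoning. | reinforcement learning, large language model |
8 | Addressing Instrument-Outcome Confounding in Mendelian Randomization through Representation Learning | Proposes a representation-learning-based Mendelian randomization method to address instrument-outcome confounding. | representation learning |
9 | Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation | Proposes a method that enhances automatic chord recognition using pseudo-labeling and knowledge distillation. | distillation |
10 | SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning | Proposes SenTSR-Bench, which strengthens diagnostic reasoning over time-series data through knowledge injection. | reinforcement learning, large language model |
11 | Federated Causal Representation Learning in State-Space Systems for Decentralized Counterfactual Reasoning | Proposes a federated causal representation learning framework for decentralized counterfactual reasoning in industrial interconnected systems. | representation learning |
12 | Sparse Masked Attention Policies for Reliable Generalization | Proposes sparse masked attention policies to make the generalization of reinforcement learning policies more reliable. | reinforcement learning, PPO |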

🔬 Pillar 9: Embodied Foundation Models (6 papers)

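# | Title | One-line summary | Tags | 🔗
13 | Counterfactual Understanding via Retrieval-aware Multimodal Modeling for Time-to-Event Survival Prediction | Proposes the CURE framework for counterfactual time-to-event survival prediction. | multimodal |
14 | MACE-POLAR-1: A Polarisable Electrostatic Foundation Model for Molecular Chemistry | MACE-POLAR-1: a polarisable electrostatic foundation model for molecular chemistry. | foundation model |
15 | BarrierSteer: LLM Safety via Learning Barrier Steering | BarrierSteer: LLM safety through learned barrier steering. | large language model |
16 | A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs | Proposes the Replicate-and-Quantize framework to address load imbalance during inference of SMoE models. | large language model |
17 | Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models | Proposes LA-LoRA to address the performance degradation of LoRA fine-tuning of large models under differentially private federated learning. | large language model |
18 | Workflow-Level Design Principles for Trustworthy GenAI in Automotive System Engineering | Proposes design principles for trustworthy GenAI workflows in automotive system engineering, improving safety and traceability. | large language model |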

🔬 Pillar 8: Physics-based Animation (3 papers)

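# | Title | One-line summary | Tags | 🔗
19 | LEVDA: Latent Ensemble Variational Data Assimilation via Differentiable Dynamics | Proposes LEVDA, which performs ensemble variational data assimilation in latent space using differentiable dynamics, improving geophysical forecast accuracy. | spatiotemporal |
20 | Fully Convolutional Spatiotemporal Learning for Microstructure Evolution Prediction | Proposes a fully convolutional spatiotemporal learning model that accelerates prediction of material microstructure evolution. | spatiotemporal |
21 | NEXUS: A compact neural architecture for high-resolution spatiotemporal air quality forecasting in Delhi National Capital Region | NEXUS: a compact neural network architecture for high-resolution spatiotemporal air quality forecasting in the Delhi National Capital Region. | spatiotemporal |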

🔬 Pillar 1: Robot Control (3 papers)

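# | Title | One-line summary | Tags | 🔗
22 | Compositional Planning with Jumpy World Models | Proposes a compositional planning method based on Jumpy World Models that improves zero-shot performance on long-horizon tasks. | manipulation, world model, predictive model |
23 | Variational Trajectory Optimization of Anisotropic Diffusion Schedules | Proposes a variational framework for optimizing anisotropic diffusion models, improving image generation quality and efficiency. | trajectory optimization |
24 | RobPI: Robust Private Inference against Malicious Client | RobPI: proposes a robust private inference protocol against malicious clients. | manipulation |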
