cs.LG(2026-02-20)
📊 20 papers total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (10 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 1: Robot Control (2)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning | Proposes FINO, which uses noise-injected flow matching to improve sample efficiency in offline-to-online RL | reinforcement learning, offline RL, flow matching | | |
| 2 | Deep Reinforcement Learning for Optimizing Energy Consumption in Smart Grid Systems | Uses physics-informed neural networks to accelerate deep RL for energy optimization in smart grids | reinforcement learning, deep reinforcement learning, policy learning | | |
| 3 | Flow Actor-Critic for Offline Reinforcement Learning | Proposes Flow Actor-Critic, using flow models to handle the complex data distributions of offline RL | reinforcement learning, offline RL, offline reinforcement learning | | |
| 4 | Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning | Proposes memory-based advantage shaping to improve the sample efficiency of LLM-guided RL | reinforcement learning, large language model | | |
| 5 | MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance | MIRA: a memory-integrated RL agent with limited LLM guidance, addressing sparse rewards | reinforcement learning, large language model | ✅ | |
| 6 | Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards | Proposes gradient regularization to prevent reward hacking in RLHF and RLVR | reinforcement learning, RLHF | | |
| 7 | Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models | Proposes a bisimulation-based joint-embedding predictive world model, making planning robust to visual distractors | world model | | |
| 8 | On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction | Studies the semantic and syntactic information encoded in proto-tokens, exploring a non-autoregressive route to one-step text reconstruction | distillation, large language model | | |
| 9 | Balancing Symmetry and Efficiency in Graph Flow Matching | Proposes a controllable symmetry-modulation scheme that balances symmetry and efficiency in graph generative models | flow matching | | |
| 10 | Learning Optimal and Sample-Efficient Decision Policies with Guarantees | Proposes a sample-efficient RL policy-learning method with guarantees for high-stakes decision-making | reinforcement learning, imitation learning | | |
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory | Analyzes and improves chain-of-thought monitorability through information theory, enhancing LLM safety | chain-of-thought | | |
| 12 | Non-Interfering Weight Fields: Treating Model Parameters as a Continuously Extensible Function | Proposes Non-Interfering Weight Fields (NIWF) to address catastrophic forgetting in large models | large language model, instruction following | | |
| 13 | MapTab: Can MLLMs Master Constrained Route Planning? | MapTab: evaluates multimodal LLMs on route planning under constraints | large language model, multimodal | | |
| 14 | Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework | Proposes the Continual-NExT framework to tackle continual learning for multimodal LLMs | large language model, multimodal | | |
| 15 | Large Causal Models for Temporal Causal Discovery | Proposes large causal models for temporal causal discovery, improving generalization and scalability | foundation model | ✅ | |
| 16 | Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings | Proposes a quantum maximum-likelihood prediction framework based on Hilbert-space embeddings, treating classical and quantum LLMs uniformly | large language model | | |
| 17 | [Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games | Reproduces and extends an LLM negotiation benchmark, revealing objectivity challenges in model evaluation | large language model | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Whole-Brain Connectomic Graph Model Enables Whole-Body Locomotion Control in Fruit Fly | Proposes FlyGM: embodied locomotion control based on a whole-brain connectomic graph model of the fruit fly | locomotion, reinforcement learning | | |
| 19 | Online decoding of rat self-paced locomotion speed from EEG using recurrent neural networks | Uses recurrent neural networks to decode rat self-paced locomotion speed online from EEG signals | locomotion | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Stable Long-Horizon Spatiotemporal Prediction on Meshes Using Latent Multiscale Recurrent Graph Neural Networks | Proposes a stable long-horizon prediction method based on latent multiscale recurrent graph neural networks | spatiotemporal | | |