cs.LG (2026-03-23)

📊 27 papers in total | 🔗 3 with code


🔬 Pillar 9: Embodied Foundation Models (15 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|-------|-------------------|------|----|
| 1 | SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models | Proposes SSAM, training-free merging of multimodal large language models via singular subspace alignment | large language model, multimodal | |
| 2 | Riemannian Geometry Speaks Louder Than Words: From Graph Foundation Model to Next-Generation Graph Intelligence | Proposes the Riemannian Foundation Model (RFM), using Riemannian geometry to build next-generation graph intelligence | large language model, foundation model | |
| 3 | AdditiveLLM2: A Multi-modal Large Language Model for Additive Manufacturing | Proposes AdditiveLLM2, a multimodal large language model for additive manufacturing, specialized via domain-adaptive pretraining | large language model | |
| 4 | Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors | PW-FouCast: extends precipitation nowcasting horizons via spectral fusion of radar observations and weather foundation-model priors | foundation model | |
| 5 | Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction | Proposes the MindTS model, achieving multimodal time-series anomaly detection via semantic alignment and condensed interaction | multimodal | |
| 6 | ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention | Proposes ROM, mitigating overthinking in large reasoning models via streaming detection and intervention | large language model, chain-of-thought | |
| 7 | Data-Free Layer-Adaptive Merging via Fisher Information for Long-to-Short Reasoning LLMs | Proposes a Fisher-information-based layer-adaptive model merging method that improves long-chain-reasoning LLMs | large language model, chain-of-thought | |
| 8 | Noise Titration: Exact Distributional Benchmarking for Probabilistic Time Series Forecasting | Proposes noise titration to address evaluation problems in probabilistic time-series forecasting | foundation model | |
| 9 | SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection | SPA: a simple but highly effective baseline method for knowledge injection | large language model | |
| 10 | Revisiting Quantum Code Generation: Where Should Domain Knowledge Live? | Improves LLM performance on quantum code generation via inference-time augmentation, without domain fine-tuning | large language model | |
| 11 | Causal Evidence that Language Models use Confidence to Drive Behavior | Shows that large language models use confidence to drive behavioral decisions, laying groundwork for autonomous agents | large language model | |
| 12 | Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization | Proposes an MoE architecture optimization framework that finds the optimal configuration under any compute budget via joint constraints and reduced-dimension search | large language model | |
| 13 | Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization | Proposes depth-recurrent Transformers to address Transformers' limited computational depth in compositional generalization | chain-of-thought | |
| 14 | Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy | Proposes Kolmogorov-complexity-based theoretical bounds for LLM steganography and a perplexity-based detection proxy | large language model | |
| 15 | Generalization Limits of In-Context Operator Networks for Higher-Order Partial Differential Equations | Extends ICON models to higher-order partial differential equations while preserving solution dynamics | foundation model | |

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|-------|-------------------|------|----|
| 16 | Multimodal Survival Analysis with Locally Deployable Large Language Models | Proposes a multimodal survival analysis method based on locally deployed LLMs, improving predictive accuracy and privacy | teacher-student distillation, large language model | |
| 17 | CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter | Proposes CoRA, a correlation-aware adapter that improves multivariate forecasting with time-series foundation models | contrastive learning, foundation model | |
| 18 | Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe | Proposes a systematic reinforcement-learning recipe for long-horizon tool-using agents, significantly improving TravelPlanner performance | reinforcement learning, reward shaping, large language model | |
| 19 | Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors | Reveals the divergence between two interpretations of temporal-difference error in deep RL and its impact on algorithm performance | reinforcement learning, deep reinforcement learning | |
| 20 | Cluster-Specific Predictive Modeling: A Scalable Solution for Resource-Constrained Wi-Fi Controllers | Proposes cluster-specific predictive modeling for resource-constrained Wi-Fi controllers | predictive model, MAE | |
| 21 | What Do World Models Learn in RL? Probing Latent Representations in Learned Environment Simulators | Uses interpretability analysis to show that world models in RL learn linear representations of environment state | reinforcement learning, world model | |
| 22 | On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation | Proposes an update-direction-based RLVR method to improve large language model reasoning | reinforcement learning, large language model | |
| 23 | TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning | TREX: a trajectory-attribution interpretability framework for multi-objective reinforcement learning | reinforcement learning | |
| 24 | P^2O: Joint Policy and Prompt Optimization | Proposes the P^2O framework, jointly optimizing policy and prompts to improve LLM reasoning on hard samples | reinforcement learning, large language model | |
| 25 | Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective | Proposes GSB-PPO, a path-space proximal policy optimization method based on generalized Schrödinger bridges, for training generative policies | reinforcement learning, PPO | |

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|-------|-------------------|------|----|
| 26 | Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning | Proposes model predictive control with differentiable world models for offline reinforcement learning | locomotion, MPC, model predictive control | |
| 27 | Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration | Proposes an uncertainty-guided tree search that decouples exploration from policy optimization, addressing hard-exploration problems in reinforcement learning | manipulation, dexterous manipulation, reinforcement learning | |
