cs.LG (2025-09-28)

📊 44 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (21, 🔗 5) · Pillar 9: Embodied Foundation Models (21, 🔗 2) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (21 papers)

# | Title | One-line takeaway | Tags
1 | HyMaTE: A Hybrid Mamba and Transformer Model for EHR Representation Learning | Proposes HyMaTE, a hybrid Mamba-Transformer model that improves EHR representation learning. | Mamba, SSM, state space model
2 | Dynamic Policy Induction for Adaptive Prompt Optimization: Bridging the Efficiency-Accuracy Gap via Lightweight Reinforcement Learning | Proposes a Prompt Policy Network that adaptively optimizes LLM prompting strategies via lightweight reinforcement learning, improving efficiency while preserving accuracy. | reinforcement learning, PPO, large language model
3 | InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions | Proposes InfMasking, which strengthens synergistic information through contrastive multimodal interactions to improve multimodal representation learning. | representation learning, multimodal
4 | In-Context Compositional Q-Learning for Offline Reinforcement Learning | Proposes ICQL, which uses in-context learning for compositional Q-function estimation in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning
5 | A Weather Foundation Model for the Power Grid | A weather-forecasting foundation model tailored to the power grid, improving early warning of extreme weather events. | MAE, foundation model
6 | MemMamba: Rethinking Memory Patterns in State Space Model | MemMamba: improves long-sequence memory in state space models via state summarization and cross-layer attention. | Mamba, state space model
7 | Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression | Reveals the mechanism by which a trained Mamba emulates online gradient descent in in-context linear regression. | Mamba, SSM, foundation model
8 | Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm | Proposes the Explore-Execute Chain framework, decoupling planning from execution to improve LLM reasoning efficiency and interpretability. | reinforcement learning, large language model, chain-of-thought
9 | DRIK: Distribution-Robust Inductive Kriging without Information Leakage | DRIK: a distribution-robust inductive kriging method that avoids information leakage and improves generalization on spatiotemporal data. | MAE, sparse sensors, spatial relationship
10 | GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning | GPS-MTM: captures patterns of normalcy in GPS trajectories via self-supervised learning. | trajectory transformer, representation learning, foundation model
11 | Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts | Proposes a curriculum-guided reinforcement learning framework for synthesizing gas-efficient smart contracts for financial derivatives. | reinforcement learning, PPO
12 | Adversarial Diffusion for Robust Reinforcement Learning | Proposes AD-RRL, which uses adversarial diffusion models to make reinforcement learning more robust in uncertain environments. | reinforcement learning, model-based RL
13 | SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention | Proposes SLA, a fine-tunable sparse-linear attention mechanism that accelerates Diffusion Transformer models. | linear attention
14 | GeoFunFlow: Geometric Function Flow Matching for Inverse Operator Learning over Complex Geometries | Proposes GeoFunFlow to solve inverse problems over complex geometries. | flow matching
15 | Bridging On-Device and Cloud LLMs for Collaborative Reasoning: A Unified Methodology for Local Routing and Post-Training | Proposes a device-cloud collaborative reasoning approach that uses reinforcement learning to improve on-device LLM routing and reasoning. | reinforcement learning, large language model
16 | Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning | Proposes a multi-agent reinforcement learning method based on risk-seeking optimism, improving performance in cooperative games. | reinforcement learning
17 | Guide: Generalized-Prior and Data Encoders for DAG Estimation | GUIDE: a DAG-estimation framework that fuses LLM priors with data encoders. | reinforcement learning, large language model
18 | Space Group Conditional Flow Matching | Proposes a space-group conditional flow matching model for generating stable crystal structures with high symmetry. | flow matching
19 | An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms | Proposes Mode-Aware Batch Normalization (MA-BN) to improve the stability and performance of off-policy Actor-Critic algorithms. | reinforcement learning, deep reinforcement learning, DRL
20 | Why Alignment Must Precede Distillation: A Minimal Working Explanation | Argues that alignment should precede distillation, addressing the poor alignment of models after knowledge distillation. | distillation
21 | Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation | DART: decoupled training and adaptive data curation for more efficient multi-turn reinforcement learning of GUI agents. | reinforcement learning, policy learning
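
Several entries in this pillar (e.g. #2 and #11) carry the PPO tag. As generic background, assuming nothing about any specific paper's method, the standard PPO clipped surrogate objective can be sketched as:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s), per sampled action
    advantage: advantage estimate, per sampled action
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum of the two surrogates,
    # so the loss to minimize is its negated mean.
    return -np.mean(np.minimum(unclipped, clipped))

# Ratio inside [1-eps, 1+eps]: clipping inactive, loss = -ratio*adv
print(ppo_clip_loss(np.array([1.0]), np.array([2.0])))   # -2.0
# Large ratio, positive advantage: surrogate capped at (1+eps)*adv
print(ppo_clip_loss(np.array([3.0]), np.array([2.0])))   # -2.4
```

The clip keeps each update close to the behavior policy: once the probability ratio leaves [1 − eps, 1 + eps] in the direction the advantage favors, the gradient through that sample is cut off.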

🔬 Pillar 9: Embodied Foundation Models (21 papers)

# | Title | One-line takeaway | Tags
22 | Tequila: Trapping-free Ternary Quantization for Large Language Models | Tequila: a trapping-free ternary quantization method for accelerating large language model inference. | large language model
23 | Knowledge Homophily in Large Language Models | Explores knowledge homophily in large language models and proposes a graph-neural-network-based knowledge-assessment method. | large language model
24 | Estimating Time Series Foundation Model Transferability via In-Context Learning | TimeTic: an in-context-learning framework for estimating the transferability of pretrained time series models. | foundation model
25 | Large Language Models and Futures Price Factors in China | Uses large language models to build factor models for China's futures markets, significantly improving portfolio performance. | large language model
26 | Disentanglement of Variations with Multimodal Generative Modeling | Proposes an information-disentangled multimodal variational autoencoder to address generation-quality issues. | multimodal
27 | The Impossibility of Inverse Permutation Learning in Transformer Models | Proves that decoder-only Transformers cannot learn inverse permutations, and proposes two viable workarounds. | large language model, chain-of-thought
28 | Visual CoT Makes VLMs Smarter but More Fragile | Exposes the fragility of Visual CoT and proposes robustness enhancements that improve the noise resistance of VQA models. | multimodal, chain-of-thought
29 | AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring | AQUAIR: a high-resolution indoor environmental quality dataset for smart aquaculture monitoring. | TAMP
30 | Edge-FIT: Federated Instruction Tuning of Quantized LLMs for Privacy-Preserving Smart Home Environments | Edge-FIT: federated instruction tuning of quantized LLMs for privacy-preserving smart home environments. | large language model
31 | MACE: A Hybrid LLM Serving System with Colocated SLO-aware Continuous Retraining Alignment | MACE: a hybrid LLM serving system that keeps models aligned via colocated, SLO-aware continuous retraining. | large language model
32 | Brain-language fusion enables interactive neural readout and in-silico experimentation | CorText: brain-language fusion enables interactive neural readout and in-silico experimentation. | large language model
33 | HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models | HiViS: accelerates speculative decoding in vision-language models by hiding visual tokens from the drafter. | large language model
34 | Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings | Proposes Dynamic Orthogonal Continual fine-tuning (DOC) to mitigate catastrophic forgetting in LLM continual learning. | large language model
35 | Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know | Proposes a Bayesian MoE routing framework that improves LLM uncertainty awareness. | large language model
36 | IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting | IndexNet: timestamp- and variable-aware modeling for time series forecasting. | TAMP
37 | Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement | Proposes SAE-RSV, a sparse-autoencoder-based vector-refinement method that makes LLM steering vectors more effective in low-data settings. | large language model
38 | FraudTransformer: Time-Aware GPT for Transaction Fraud Detection | FraudTransformer: a time-aware GPT model for transaction fraud detection. | TAMP
39 | Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs | Proposes a cooperative-game-theoretic analysis of neuron coalitions in Transformer MLPs, revealing how features are encoded internally. | large language model
40 | Towards a Comprehensive Scaling Law of Mixture-of-Experts | Proposes a comprehensive scaling law for MoE models to guide model design and training. | large language model
41 | Improving constraint-based discovery with robust propagation and reliable LLM priors | MosaCD: improves constraint-based causal discovery by combining robust propagation with reliable LLM priors. | large language model
42 | Efficient Turing Machine Simulation with Transformers | Proposes an efficient method for simulating Turing machines with Transformers, significantly reducing the number of inference steps. | chain-of-thought
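
Entry #22 concerns ternary quantization of LLM weights. As a hedged illustration of the general technique (a TWN-style threshold-and-scale scheme, not Tequila's actual trapping-free method), per-tensor ternary quantization can be sketched as:

```python
import numpy as np

def ternary_quantize(w, delta_factor=0.7):
    """Generic ternary quantization: map weights to alpha * {-1, 0, +1}
    with a per-tensor threshold and scale. Illustrative only; this is
    the common TWN-style heuristic, not the Tequila scheme.
    """
    # Threshold below which weights are zeroed out
    delta = delta_factor * np.mean(np.abs(w))
    t = np.where(w > delta, 1.0, np.where(w < -delta, -1.0, 0.0))
    nonzero = t != 0
    # Least-squares-optimal scale: mean |w| over surviving weights
    alpha = np.mean(np.abs(w[nonzero])) if nonzero.any() else 0.0
    return alpha * t

w = np.array([0.9, -0.8, 0.05, -0.02])
print(ternary_quantize(w))   # ≈ [0.85, -0.85, 0, 0]
```

Each weight is then stored as 2 bits plus one shared float scale, which is what makes ternary LLM inference cheap; the "trapping" problem the paper names presumably arises in how such discrete levels interact with training, which this sketch does not model.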

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line takeaway | Tags
43 | STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning | STAIR: resolves stage misalignment in multi-stage tasks via temporally aligned preference reinforcement learning. | manipulation, reinforcement learning, policy learning
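
Entry #43 builds on preference reinforcement learning. As generic background (not STAIR's temporal-alignment contribution), preference-based RL typically fits a reward model to pairwise comparisons with a Bradley-Terry negative log-likelihood:

```python
import math

def bt_preference_loss(r_preferred, r_rejected):
    """Bradley-Terry negative log-likelihood that the preferred
    trajectory segment beats the rejected one, given scalar rewards:
    loss = -log sigmoid(r_preferred - r_rejected).
    """
    margin = r_preferred - r_rejected
    return math.log1p(math.exp(-margin))  # == -log sigmoid(margin)

# Equal rewards: the model is indifferent, loss = log 2
print(bt_preference_loss(1.0, 1.0))   # ≈ 0.693
# A clear margin in the right direction drives the loss toward 0
print(bt_preference_loss(3.0, 0.0))   # ≈ 0.049
```

Minimizing this loss over labeled preference pairs pushes the learned reward to rank preferred segments above rejected ones; the fitted reward then serves as the RL training signal.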

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line takeaway | Tags
44 | On the Separability of Information in Diffusion Models | Studies the separability of information in diffusion models, showing that image-reconstruction and class information are independent. | classifier-free guidance
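
Entry #44 is tagged classifier-free guidance. For context (standard background, not the paper's contribution), CFG combines the conditional and unconditional noise predictions at sampling time as eps_u + w * (eps_c - eps_u):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one with weight w.
    w = 0 recovers the unconditional model, w = 1 the conditional
    model, and w > 1 amplifies the class-conditional direction.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 1.0])
print(cfg_combine(eps_u, eps_c, 1.5))   # -> [1.5, 1.0]
```

Only the components where the two predictions disagree are scaled, which is precisely why a result about the separability of reconstruction and class information bears on how guidance behaves.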
