cs.LG (2025-09-28)

📊 44 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (21, 🔗 5) · Pillar 9: Embodied Foundation Models (21, 🔗 2) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (21 papers)

# | Title | One-line takeaway | Tags
1 | HyMaTE: A Hybrid Mamba and Transformer Model for EHR Representation Learning | Proposes HyMaTE, a hybrid Mamba-Transformer model that improves EHR representation learning. | Mamba, SSM, state space model
2 | Dynamic Policy Induction for Adaptive Prompt Optimization: Bridging the Efficiency-Accuracy Gap via Lightweight Reinforcement Learning | Proposes a Prompt Policy Network that adaptively optimizes LLM prompting strategies via lightweight reinforcement learning, improving efficiency while preserving accuracy. | reinforcement learning, PPO, large language model
3 | InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions | Proposes InfMasking, which strengthens synergistic information through contrastive multimodal interactions to improve multimodal representation learning. | representation learning, multimodal
4 | In-Context Compositional Q-Learning for Offline Reinforcement Learning | Proposes ICQL, which uses in-context learning for compositional Q-function estimation in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning
5 | A Weather Foundation Model for the Power Grid | A weather-forecasting foundation model tailored to the power grid, improving early warning of extreme weather events. | MAE, foundation model
6 | MemMamba: Rethinking Memory Patterns in State Space Model | MemMamba: improves long-sequence memory in state space models via state summarization and cross-layer attention. | Mamba, state space model
7 | Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression | Reveals the mechanism by which a trained Mamba emulates online gradient descent in in-context linear regression. | Mamba, SSM, foundation model
8 | Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm | Proposes the Explore-Execute Chain framework, decoupling planning from execution to improve LLM reasoning efficiency and interpretability. | reinforcement learning, large language model, chain-of-thought
9 | DRIK: Distribution-Robust Inductive Kriging without Information Leakage | DRIK: a distribution-robust inductive kriging method that avoids information leakage and improves generalization on spatiotemporal data. | MAE, sparse sensors, spatial relationship
10 | GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning | GPS-MTM: captures patterns of normalcy in GPS trajectories via self-supervised learning. | trajectory transformer, representation learning, foundation model
11 | Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts | Proposes a curriculum-guided reinforcement learning framework for synthesizing gas-efficient smart contracts for financial derivatives. | reinforcement learning, PPO
12 | Adversarial Diffusion for Robust Reinforcement Learning | Proposes AD-RRL, which uses adversarial diffusion models to make reinforcement learning more robust in uncertain environments. | reinforcement learning, model-based RL
13 | SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention | Proposes SLA, a fine-tunable sparse-linear attention mechanism that accelerates Diffusion Transformer models. | linear attention
14 | GeoFunFlow: Geometric Function Flow Matching for Inverse Operator Learning over Complex Geometries | Proposes GeoFunFlow to solve inverse problems over complex geometries. | flow matching
15 | Bridging On-Device and Cloud LLMs for Collaborative Reasoning: A Unified Methodology for Local Routing and Post-Training | Proposes a device-cloud collaborative reasoning approach that uses reinforcement learning to improve on-device LLM routing and reasoning. | reinforcement learning, large language model
16 | Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning | Proposes a multi-agent reinforcement learning method based on risk-seeking optimism, improving performance in cooperative games. | reinforcement learning
17 | Guide: Generalized-Prior and Data Encoders for DAG Estimation | GUIDE: a DAG-estimation framework that fuses LLM priors with data encoders. | reinforcement learning, large language model
18 | Space Group Conditional Flow Matching | Proposes a space-group conditional flow matching model for generating stable crystal structures with high symmetry. | flow matching
19 | An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms | Proposes Mode-Aware Batch Normalization (MA-BN) to improve the stability and performance of off-policy Actor-Critic algorithms. | reinforcement learning, deep reinforcement learning, DRL
20 | Why Alignment Must Precede Distillation: A Minimal Working Explanation | Argues that alignment should precede distillation, addressing the poor alignment of models after knowledge distillation. | distillation
21 | Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation | DART: decoupled training and adaptive data curation for more efficient multi-turn reinforcement learning of GUI agents. | reinforcement learning, policy learning
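
Several entries in this pillar (e.g. #2 and #11) carry the PPO tag. As generic background, assuming nothing about any specific paper's method, the standard PPO clipped surrogate objective can be sketched as:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (Schulman et al., 2017).

    ratio:     pi_new(a|s) / pi_old(a|s), per sampled action
    advantage: advantage estimate, per sampled action
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum of the two surrogates,
    # so the loss to minimize is its negated mean.
    return -np.mean(np.minimum(unclipped, clipped))

# Ratio inside [1-eps, 1+eps]: clipping inactive, loss = -ratio*adv
print(ppo_clip_loss(np.array([1.0]), np.array([2.0])))   # -2.0
# Large ratio, positive advantage: surrogate capped at (1+eps)*adv
print(ppo_clip_loss(np.array([3.0]), np.array([2.0])))   # -2.4
```

The clip keeps each update close to the behavior policy: once the probability ratio leaves [1 − eps, 1 + eps] in the direction the advantage favors, the gradient through that sample is cut off.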

🔬 Pillar 9: Embodied Foundation Models (21 papers)

# | Title | One-line takeaway | Tags
22 | Tequila: Trapping-free Ternary Quantization for Large Language Models | Tequila: a trapping-free ternary quantization method for accelerating large language model inference. | large language model
23 | Knowledge Homophily in Large Language Models | Explores knowledge homophily in large language models and proposes a graph-neural-network-based knowledge-assessment method. | large language model
24 | Estimating Time Series Foundation Model Transferability via In-Context Learning | TimeTic: an in-context-learning framework for estimating the transferability of pretrained time series models. | foundation model
25 | Large Language Models and Futures Price Factors in China | Uses large language models to build factor models for China's futures markets, significantly improving portfolio performance. | large language model
26 | Disentanglement of Variations with Multimodal Generative Modeling | Proposes an information-disentangled multimodal variational autoencoder to address generation-quality issues. | multimodal
27 | The Impossibility of Inverse Permutation Learning in Transformer Models | Proves that decoder-only Transformers cannot learn inverse permutations, and proposes two viable workarounds. | large language model, chain-of-thought
28 | Visual CoT Makes VLMs Smarter but More Fragile | Exposes the fragility of Visual CoT and proposes robustness enhancements that improve the noise resistance of VQA models. | multimodal, chain-of-thought
29 | AQUAIR: A High-Resolution Indoor Environmental Quality Dataset for Smart Aquaculture Monitoring | AQUAIR: a high-resolution indoor environmental quality dataset for smart aquaculture monitoring. | TAMP
30 | Edge-FIT: Federated Instruction Tuning of Quantized LLMs for Privacy-Preserving Smart Home Environments | Edge-FIT: federated instruction tuning of quantized LLMs for privacy-preserving smart home environments. | large language model
31 | MACE: A Hybrid LLM Serving System with Colocated SLO-aware Continuous Retraining Alignment | MACE: a hybrid LLM serving system that keeps models aligned via colocated, SLO-aware continuous retraining. | large language model
32 | Brain-language fusion enables interactive neural readout and in-silico experimentation | CorText: brain-language fusion enables interactive neural readout and in-silico experimentation. | large language model
33 | HiViS: Hiding Visual Tokens from the Drafter for Speculative Decoding in Vision-Language Models | HiViS: accelerates speculative decoding in vision-language models by hiding visual tokens from the drafter. | large language model
34 | Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings | Proposes Dynamic Orthogonal Continual fine-tuning (DOC) to mitigate catastrophic forgetting in LLM continual learning. | large language model
35 | Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know | Proposes a Bayesian MoE routing framework that improves LLM uncertainty awareness. | large language model
36 | IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting | IndexNet: timestamp- and variable-aware modeling for time series forecasting. | TAMP
37 | Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement | Proposes SAE-RSV, a sparse-autoencoder-based vector-refinement method that makes LLM steering vectors more effective in low-data settings. | large language model
38 | FraudTransformer: Time-Aware GPT for Transaction Fraud Detection | FraudTransformer: a time-aware GPT model for transaction fraud detection. | TAMP
39 | Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs | Proposes a cooperative-game-theoretic analysis of neuron coalitions in Transformer MLPs, revealing how features are encoded internally. | large language model
40 | Towards a Comprehensive Scaling Law of Mixture-of-Experts | Proposes a comprehensive scaling law for MoE models to guide model design and training. | large language model
41 | Improving constraint-based discovery with robust propagation and reliable LLM priors | MosaCD: improves constraint-based causal discovery by combining robust propagation with reliable LLM priors. | large language model
42 | Efficient Turing Machine Simulation with Transformers | Proposes an efficient method for simulating Turing machines with Transformers, significantly reducing the number of inference steps. | chain-of-thought
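
Entry #22 concerns ternary quantization of LLM weights. As a hedged illustration of the general technique (a TWN-style threshold-and-scale scheme, not Tequila's actual trapping-free method), per-tensor ternary quantization can be sketched as:

```python
import numpy as np

def ternary_quantize(w, delta_factor=0.7):
    """Generic ternary quantization: map weights to alpha * {-1, 0, +1}
    with a per-tensor threshold and scale. Illustrative only; this is
    the common TWN-style heuristic, not the Tequila scheme.
    """
    # Threshold below which weights are zeroed out
    delta = delta_factor * np.mean(np.abs(w))
    t = np.where(w > delta, 1.0, np.where(w < -delta, -1.0, 0.0))
    nonzero = t != 0
    # Least-squares-optimal scale: mean |w| over surviving weights
    alpha = np.mean(np.abs(w[nonzero])) if nonzero.any() else 0.0
    return alpha * t

w = np.array([0.9, -0.8, 0.05, -0.02])
print(ternary_quantize(w))   # ≈ [0.85, -0.85, 0, 0]
```

Each weight is then stored as 2 bits plus one shared float scale, which is what makes ternary LLM inference cheap; the "trapping" problem the paper names presumably arises in how such discrete levels interact with training, which this sketch does not model.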

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line takeaway | Tags
43 | STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning | STAIR: resolves stage misalignment in multi-stage tasks via temporally aligned preference reinforcement learning. | manipulation, reinforcement learning, policy learning
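
Entry #43 builds on preference reinforcement learning. As generic background (not STAIR's temporal-alignment contribution), preference-based RL typically fits a reward model to pairwise comparisons with a Bradley-Terry negative log-likelihood:

```python
import math

def bt_preference_loss(r_preferred, r_rejected):
    """Bradley-Terry negative log-likelihood that the preferred
    trajectory segment beats the rejected one, given scalar rewards:
    loss = -log sigmoid(r_preferred - r_rejected).
    """
    margin = r_preferred - r_rejected
    return math.log1p(math.exp(-margin))  # == -log sigmoid(margin)

# Equal rewards: the model is indifferent, loss = log 2
print(bt_preference_loss(1.0, 1.0))   # ≈ 0.693
# A clear margin in the right direction drives the loss toward 0
print(bt_preference_loss(3.0, 0.0))   # ≈ 0.049
```

Minimizing this loss over labeled preference pairs pushes the learned reward to rank preferred segments above rejected ones; the fitted reward then serves as the RL training signal.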

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line takeaway | Tags
44 | On the Separability of Information in Diffusion Models | Studies the separability of information in diffusion models, showing that image-reconstruction and class information are independent. | classifier-free guidance
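
Entry #44 is tagged classifier-free guidance. For context (standard background, not the paper's contribution), CFG combines the conditional and unconditional noise predictions at sampling time as eps_u + w * (eps_c - eps_u):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one with weight w.
    w = 0 recovers the unconditional model, w = 1 the conditional
    model, and w > 1 amplifies the class-conditional direction.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 1.0])
print(cfg_combine(eps_u, eps_c, 1.5))   # -> [1.5, 1.0]
```

Only the components where the two predictions disagree are scaled, which is precisely why a result about the separability of reconstruction and class information bears on how guidance behaves.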
