cs.LG (2025-10-07)

📊 46 papers in total | 🔗 2 with code

🎯 Interest-Area Navigation

Pillar 9: Embodied Foundation Models (21 🔗1) · Pillar 2: RL Algorithms & Architecture (20 🔗1) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (21 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data | Proposes the Relational Transformer, enabling zero-shot transfer learning on relational data. | foundation model, zero-shot, transfer | |
| 2 | Flexible Swarm Learning May Outpace Foundation Models in Essential Tasks | Proposes flexible swarm learning with small-agent networks (SANs) to tackle adaptive modeling of complex systems in dynamic environments. | foundation model | |
| 3 | LLM-FS-Agent: A Deliberative Role-based Large Language Model Architecture for Transparent Feature Selection | Proposes LLM-FS-Agent, a deliberative role-based LLM architecture for transparent feature selection. | large language model | |
| 4 | Influence Functions for Efficient Data Selection in Reasoning | Proposes influence-function-based selection of CoT data to improve LLM reasoning performance. | large language model, chain-of-thought | |
| 5 | Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density | Reveals that JEPAs secretly perform density estimation: Gaussian embeddings let them learn the data density. | multimodal | |
| 6 | Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models | Proposes GradFix, transporting task vectors across pre-trained models via gradient-sign masking. | foundation model | |
| 7 | Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin | Links attention sinks and compression valleys in LLMs, proposing a Mix-Compress-Refine theory of information flow. | large language model | |
| 8 | Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting | Proposes portfolios of pretrained models for time-series forecasting, improving test-time efficiency while preserving accuracy. | foundation model | |
| 9 | Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings | Proposes geometry-aware backdoor attacks that exploit the curvature of hyperbolic embeddings. | foundation model | |
| 10 | Training Dynamics Impact Post-Training Quantization Robustness | Shows that training dynamics affect post-training quantization robustness, and proposes hyperparameter interventions to improve quantization quality. | large language model | |
| 11 | LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams | Uses LLMs as policy-agnostic human proxies to address human-agent collaboration in heterogeneous agent teams. | large language model | |
| 12 | lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models | lm-Meter: exposes runtime inference-latency bottlenecks of on-device language models. | large language model | |
| 13 | BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining | Proposes BLISS, a lightweight bilevel influence-scoring method for data selection in language-model pretraining. | large language model | |
| 14 | Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs | Proposes a correctness-first decoding strategy for LLMs, improving performance on complex reasoning tasks. | large language model | |
| 15 | Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music | Proposes a three-step framework for transcribing the guitar track's rhythmic patterns from polyphonic music. | foundation model | |
| 16 | Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning | Compares the effectiveness of several membership inference attacks in deep transfer learning, guiding privacy-risk assessment. | foundation model | |
| 17 | From Principles to Practice: A Systematic Study of LLM Serving on Multi-core NPUs | Proposes systematic optimizations for LLM serving on multi-core NPUs, improving inference performance. | large language model | |
| 18 | (Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs | InfoRMIA: stronger token-level membership inference and memorization assessment for LLMs. | large language model | |
| 19 | ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization | ARMOR: high-performance semi-structured pruning via adaptive matrix factorization. | large language model | |
| 20 | NorMuon: Making Muon more efficient and scalable | Proposes the NorMuon optimizer, combining orthogonalization with neuron-wise adaptive learning rates to improve large-model training efficiency. | large language model | |
| 21 | AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning | Proposes AMAQ, adaptive mixed-bit activation quantization for collaborative parameter-efficient fine-tuning. | large language model | |
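
Two entries above (#4 and #13) select training data by influence scoring. As background, the classical influence of upweighting a training point z on a test loss is −∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the total training loss. A minimal NumPy sketch on a toy ridge-regression problem, where everything is exact (the data, names, and setup are illustrative only, not taken from either paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 1e-2

# Toy ridge regression: per-example loss 0.5*(w@x_i - y_i)^2, plus (lam/2)||w||^2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # exact minimizer

# Hessian of the total objective and per-example gradients at the optimum
H = X.T @ X + lam * np.eye(d)
residuals = X @ w - y
grads = X * residuals[:, None]  # grad of 0.5*(w@x_i - y_i)^2 w.r.t. w, row per example

# Influence of each training point on the loss at a (hypothetical) test point
x_test, y_test = rng.normal(size=d), 0.0
g_test = x_test * (w @ x_test - y_test)
influence = -grads @ np.linalg.solve(H, g_test)  # -g_test^T H^{-1} g_i for each i

helpful = np.argsort(influence)[:5]  # most negative: upweighting them lowers test loss
```

Data-selection methods in this spirit keep the examples whose influence on a held-out objective is most favorable; the papers' contributions lie in making such scores cheap at LLM scale, which this closed-form sketch sidesteps entirely.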

🔬 Pillar 2: RL Algorithms & Architecture (20 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment | Proposes the RLHF-COV and DPO-COV algorithms, simultaneously mitigating corruption, overoptimization, and verbosity in offline and online RLHF/DPO alignment. | reinforcement learning, RLHF, DPO | |
| 23 | EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models | EARL: an efficient agentic reinforcement-learning system for large language models. | reinforcement learning, large language model | |
| 24 | Multimodal Trajectory Representation Learning for Travel Time Estimation | Proposes the MDTI framework, fusing multimodal trajectory data to improve travel-time estimation. | representation learning, multimodal | |
| 25 | Primal-Dual Direct Preference Optimization for Constrained LLM Alignment | Proposes primal-dual DPO for constrained LLM alignment, improving safety and efficiency. | DPO, direct preference optimization, large language model | |
| 26 | Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing | Proposes SODA, a semantic-cohesive knowledge-distillation method for deep cross-modal hashing. | distillation, multimodal | |
| 27 | Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents | Proposes Stratified GRPO, handling structural heterogeneity in the RL training of LLM search agents. | reinforcement learning, large language model | |
| 28 | Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks | Proposes Lexical Policy Networks (LEXPOL), language-encoded gated policy networks for multi-task reinforcement learning. | reinforcement learning, language conditioned | |
| 29 | The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives | Proposes a Bayesian framework for verifying and refining large-language-model objectives. | reinforcement learning, inverse reinforcement learning, RLHF | |
| 30 | Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL | Proposes failure-aware IRL, improving LLM alignment by focusing on failure cases. | reinforcement learning, inverse reinforcement learning, RLHF | |
| 31 | GUIDE: Guided Initialization and Distillation of Embeddings | Proposes GUIDE, guided initialization and distillation of embeddings, improving student-model quality at no extra cost. | teacher-student, distillation | |
| 32 | From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning | Proposes H-DSAC for safe and efficient real-world autonomous driving via human-in-the-loop RL. | reinforcement learning, policy learning | |
| 33 | Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy | Proposes an RL-based expert-policy orchestration strategy for online matching. | reinforcement learning | |
| 34 | Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization | Achieves nearly instance-optimal parameter recovery from many trajectories via Hellinger localization. | linear attention, foundation model | |
| 35 | Edit-Based Flow Matching for Temporal Point Processes | Proposes an edit-operation-based flow-matching model, improving the generation efficiency and flexibility of temporal point processes. | flow matching | |
| 36 | Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods | Exposes the component-imbalance problem in hybrid linear-attention conversion methods and proposes a fix. | linear attention | |
| 37 | Deciphering Invariant Feature Decoupling in Source-free Time Series Forecasting with Proxy Denoising | Proposes TimePD, using proxy denoising to tackle invariant-feature decoupling in source-free time-series forecasting. | distillation, large language model | |
| 38 | Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection | Proposes FedCAPS, a federated-learning framework for robust, privacy-preserving feature selection. | representation learning | |
| 39 | Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation | Proposes Traj-Transformer, combining a Transformer with diffusion models to generate high-quality GPS trajectories. | trajectory transformer, spatiotemporal | |
| 40 | Monte Carlo Permutation Search | Proposes Monte Carlo Permutation Search (MCPS), improving general game-playing AI under limited compute. | reinforcement learning, deep reinforcement learning | |
| 41 | Implicit Updates for Average-Reward Temporal Difference Learning | Proposes an implicit average-reward TD(λ) algorithm, improving the numerical stability and efficiency of temporal-difference learning. | reinforcement learning, policy learning | |
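
Several entries above (#22, #25, #29, #30) build on preference-based alignment, where DPO is the common baseline. As background, the standard DPO loss on a preference pair (chosen y_w, rejected y_l) is −log σ(β[(log π(y_w) − log π(y_l)) − (log π_ref(y_w) − log π_ref(y_l))]). A minimal NumPy sketch of that per-pair loss (the toy log-probabilities are made up for illustration; this is the textbook objective, not any listed paper's variant):

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for a batch of preference pairs.

    logp_w/logp_l         : policy log-probs of the chosen / rejected responses
    ref_logp_w/ref_logp_l : the same log-probs under the frozen reference policy
    beta                  : strength of the implicit KL constraint to the reference
    """
    # Margin by which the policy prefers the chosen response more than the reference does
    margin = (logp_w - logp_l) - (ref_logp_w - ref_logp_l)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))  # -log sigmoid(beta * margin)

# Toy batch of two pairs where the policy already leans toward the chosen responses
logp_w = np.array([-4.0, -3.5])
logp_l = np.array([-6.0, -5.0])
ref_w  = np.array([-5.0, -4.0])
ref_l  = np.array([-5.5, -4.5])
losses = dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1)
```

With a positive margin the loss drops below log 2 (its value at zero margin), so gradient descent pushes the policy's preference gap beyond the reference's. The papers above modify this objective (e.g. adding constraints or robustness to corrupted pairs) rather than replacing it.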

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 42 | Differentiable Model Predictive Control on the GPU | Proposes GPU-accelerated differentiable model predictive control, speeding up reinforcement-learning and imitation-learning training. | MPC, model predictive control, reinforcement learning | |
| 43 | Reference Grounded Skill Discovery | Proposes RGSD, a reference-grounded skill-discovery algorithm for skill learning in high-DoF agents. | humanoid, locomotion, imitation learning | |

🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 44 | On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond | Proposes any-process generation, extending diffusion models to complex reasoning problems that autoregressive models struggle with. | MDM | |
| 45 | Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies | Proposes learning unmasking policies for discrete diffusion models via a KL-regularized MDP, significantly improving performance. | MDM | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 46 | BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression | BlockGPT: spatio-temporal rainfall modeling via frame-level autoregression, significantly improving forecast accuracy and speed. | spatiotemporal | |
