| 22 | Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment | Proposes the RLHF-COV and DPO-COV algorithms, which simultaneously mitigate data corruption, overoptimization, and verbosity in offline and online RLHF/DPO alignment. | reinforcement learning, RLHF, DPO |  |
| 23 | EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models | Presents EARL, an efficient agentic reinforcement learning system for large language models. | reinforcement learning, large language model |  |
| 24 | Multimodal Trajectory Representation Learning for Travel Time Estimation | Proposes the MDTI framework, which fuses multimodal trajectory data to improve travel time estimation accuracy. | representation learning, multimodal | ✅ |
| 25 | Primal-Dual Direct Preference Optimization for Constrained LLM Alignment | Proposes a primal-dual DPO method for constrained alignment of large language models, improving safety and efficiency. | DPO, direct preference optimization, large language model |  |
| 26 | Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing | Proposes SODA, a semantically cohesive knowledge distillation method for deep cross-modal hashing. | distillation, multimodal |  |
| 27 | Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents | Proposes Stratified GRPO to handle structural heterogeneity in reinforcement learning of LLM search agents. | reinforcement learning, large language model |  |
| 28 | Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks | Proposes Lexical Policy Networks (LEXPOL), which use language-encoded gated policy networks for multi-task reinforcement learning. | reinforcement learning, language conditioned |  |
| 29 | The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives | Proposes a Bayesian framework for verifying and refining large language model objectives. | reinforcement learning, inverse reinforcement learning, RLHF |  |
| 30 | Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL | Proposes failure-aware IRL, which improves LLM alignment by focusing on failure cases. | reinforcement learning, inverse reinforcement learning, RLHF |  |
| 31 | GUIDE: Guided Initialization and Distillation of Embeddings | Proposes GUIDE, guided initialization and distillation of embeddings that improves student model quality with no additional overhead. | teacher-student distillation |  |
| 32 | From Learning to Mastery: Achieving Safe and Efficient Real-World Autonomous Driving with Human-In-The-Loop Reinforcement Learning | Proposes H-DSAC to achieve safe and efficient real-world autonomous driving. | reinforcement learning, policy learning |  |
| 33 | Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy | Proposes a reinforcement learning based expert policy orchestration strategy for online matching. | reinforcement learning |  |
| 34 | Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization | Achieves nearly instance-optimal parameter recovery from many trajectories via Hellinger localization. | linear attention, foundation model |  |
| 35 | Edit-Based Flow Matching for Temporal Point Processes | Proposes an edit-based flow matching model that improves the generation efficiency and flexibility of temporal point processes. | flow matching |  |
| 36 | Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods | Reveals the component imbalance problem in hybrid linear attention conversion methods and proposes a remedy. | linear attention |  |
| 37 | Deciphering Invariant Feature Decoupling in Source-free Time Series Forecasting with Proxy Denoising | Proposes TimePD, which uses proxy denoising to address invariant feature decoupling in source-free time series forecasting. | distillation, large language model |  |
| 38 | Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection | Proposes FedCAPS, a federated learning framework for robust and privacy-preserving feature selection. | representation learning |  |
| 39 | Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation | Proposes Traj-Transformer, which combines a Transformer with diffusion models to generate high-quality GPS trajectories. | trajectory transformer, spatiotemporal |  |
| 40 | Monte Carlo Permutation Search | Proposes Monte Carlo Permutation Search (MCPS), improving general game-playing AI performance under limited compute budgets. | reinforcement learning, deep reinforcement learning |  |
| 41 | Implicit Updates for Average-Reward Temporal Difference Learning | Proposes an implicit average-reward TD(λ) algorithm, improving the numerical stability and efficiency of temporal difference learning. | reinforcement learning, policy learning |  |