cs.LG (2025-09-30)

📊 70 papers in total | 🔗 15 with code

🎯 Topic Navigation

Pillar 9: Embodied Foundation Models (37 🔗10) · Pillar 2: RL & Architecture (27 🔗4) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (37 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Massively Multimodal Foundation Models: A Framework for Capturing Dependencies with Specialized Mixture-of-Experts | Proposes a massively multimodal framework based on specialized mixture-of-experts, using temporal dependencies to guide routing. | foundation model, multimodal | |
| 2 | DecepChain: Inducing Deceptive Reasoning in Large Language Models | DecepChain: induces deceptive reasoning chains in large language models. | large language model, chain-of-thought | |
| 3 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Proposes MultiFair, achieving balanced, fairness-aware multimodal medical classification via dual-level gradient modulation. | multimodal | |
| 4 | Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models | Proposes the FreeDave algorithm, a lossless parallel-decoding speedup for diffusion large language models. | large language model | |
| 5 | Large Language Models Inference Engines based on Spiking Neural Networks | Proposes NeurTransformer, a design approach for LLM inference engines based on spiking neural networks. | large language model | |
| 6 | AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond | AccidentBench: a large-scale multimodal benchmark for understanding and reasoning about vehicle accidents and other safety scenarios. | multimodal | |
| 7 | Memory-Driven Self-Improvement for Decision Making with Large Language Models | Proposes a memory-driven self-improvement framework that boosts LLM performance on sequential decision-making tasks. | large language model | |
| 8 | NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training | Proposes NeuroTTT, bridging the misalignment between EEG foundation-model pretraining and downstream tasks via test-time training. | foundation model | |
| 9 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Proposes MIDAS, tackling imbalanced multimodal learning with misalignment-based data augmentation. | multimodal | |
| 10 | Kairos: Towards Adaptive and Generalizable Time Series Foundation Models | Kairos: a dynamic foundation model for adaptive, generalizable time series. | foundation model | |
| 11 | Layer-wise dynamic rank for compressing large language models | Proposes D-Rank, an LLM compression framework with layer-wise dynamic rank allocation for stronger compression. | large language model | |
| 12 | Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training | Demystifies LLM visual priors: visual perception and reasoning abilities learned from language pretraining alone. | large language model, multimodal | |
| 13 | ACT: Agentic Classification Tree | Proposes the Agentic Classification Tree (ACT), using LLMs to build interpretable decision trees for unstructured data. | large language model, chain-of-thought | |
| 14 | Attribution-Guided Decoding | Proposes Attribution-Guided Decoding (AGD), improving LLM instruction following and factual accuracy. | large language model, instruction following | |
| 15 | Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | Proposes Expert Merging, model merging via unsupervised expert alignment and importance-guided layer chunking. | large language model, multimodal | |
| 16 | Adaptive and Resource-efficient Agentic AI Systems for Mobile and Embedded Devices: A Survey | A survey of adaptive, resource-efficient agentic AI systems for mobile and embedded devices. | foundation model, multimodal | |
| 17 | LLM-Generated Samples for Android Malware Detection | Uses LLM-generated samples to augment Android malware detection, improving performance on sparse datasets. | large language model | |
| 18 | In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks | Proposes an in-context curiosity mechanism that improves generalization of decision-pretrained Transformers on bandit tasks. | large language model | |
| 19 | Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval? | Studies how programming language and model choice affect LLM-as-a-judge for code retrieval, and proposes a transfer-learning method. | large language model | |
| 20 | From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization | T2L-Agent: line-level localization of open-source software vulnerabilities using LLMs and runtime information. | large language model | |
| 21 | DiSC-AMC: Token- and Parameter-Efficient Discretized Statistics In-Context Automatic Modulation Classification | DiSC-AMC: token- and parameter-efficient in-context automatic modulation classification over discretized statistics. | large language model | |
| 22 | Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT | Proposes ACT-ViT, detecting hallucinations in large language models from activation tensors. | large language model | |
| 23 | Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management | Evaluates LLMs as human-behavior simulators in operations management, on effect prediction and distribution alignment. | chain-of-thought | |
| 24 | The Pitfalls of KV Cache Compression | Exposes the pitfalls of KV-cache compression in multi-instruction settings and proposes improvements. | instruction following | |
| 25 | Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space | Proposes Thoughtbubbles, an unsupervised Transformer method for parallel adaptive computation in latent space. | chain-of-thought | |
| 26 | LoRAFusion: Efficient LoRA Fine-Tuning for LLMs | LoRAFusion: an efficient LoRA fine-tuning system for LLMs that accelerates both single-task and multi-task fine-tuning. | large language model | |
| 27 | GRPO-$λ$: Credit Assignment improves LLM Reasoning | GRPO-λ: improves LLM reasoning through better credit assignment. | large language model | |
| 28 | PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning | PrunedLoRA: robust gradient-based structured pruning for low-rank adaptation during fine-tuning. | large language model | |
| 29 | Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls | Why Transformers struggle to learn multiplication: reverse-engineering reveals long-range dependency pitfalls. | chain-of-thought | |
| 30 | Estimating Dimensionality of Neural Representations from Finite Samples | Proposes a bias-corrected dimensionality estimator that removes the sample-size dependence in estimating the dimensionality of neural representations. | large language model | |
| 31 | TASP: Topology-aware Sequence Parallelism | Proposes TASP, topology-aware sequence parallelism that accelerates long-context LLM training. | large language model | |
| 32 | AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size | AdaBlock-dLLM: semantic-aware diffusion-LLM inference via adaptive block sizes. | large language model | |
| 33 | Are neural scaling laws leading quantum chemistry astray? | Challenges neural scaling laws in quantum chemistry: simply scaling up models and data does not guarantee reliability. | foundation model | |
| 34 | Beyond Linear Probes: Dynamic Safety Monitoring for Language Models | Proposes truncated polynomial classifiers for dynamic LLM safety monitoring, balancing compute cost against safety. | large language model | |
| 35 | Muon Outperforms Adam in Tail-End Associative Memory Learning | The Muon optimizer outperforms Adam on tail-end associative memory learning, improving performance on tail classes. | large language model | |
| 36 | Better Privilege Separation for Agents by Restricting Data Types | Proposes type-restricted privilege separation, a systematic defense against prompt injection in AI agents. | large language model | |
| 37 | Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space | Proposes Rotation Control Unlearning (RCU), addressing catastrophic utility loss in continual LLM unlearning. | large language model | |
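Entries 26 (LoRAFusion) and 28 (PrunedLoRA) both build on LoRA fine-tuning. As shared background, here is a minimal NumPy sketch of the standard LoRA forward pass, not any of these papers' specific methods; all variable names and sizes are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; only the low-rank factors
    A (r x d) and B (d x r) are trained, adding 2*r*d parameters
    instead of updating all d*d entries of W.
    """
    r = A.shape[0]  # low-rank bottleneck dimension
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero init
x = rng.standard_normal(d)

# Zero-initializing B makes the adapter a no-op at the start of training,
# so fine-tuning begins exactly at the pretrained model.
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

The zero-initialized `B` is the detail that makes LoRA safe to bolt onto a pretrained layer: the adapted and original layers agree until the optimizer moves `B` away from zero.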

🔬 Pillar 2: RL & Architecture (27 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 38 | Distillation of Large Language Models via Concrete Score Matching | Proposes Concrete Score Distillation, addressing logit information loss and solution-space restrictions in LLM distillation. | distillation, large language model, instruction following | |
| 39 | Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models | Shows how the clipping bounds in PPO/GRPO affect entropy in LLM reinforcement learning; clip-low increases exploration. | reinforcement learning, PPO, large language model | |
| 40 | OPPO: Accelerating PPO-based RLHF via Pipeline Overlap | OPPO: accelerates PPO-based RLHF training via pipeline overlap. | reinforcement learning, PPO, RLHF | |
| 41 | Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval | Proposes lightweight contrastive-learning bridges for target-specific drug-text alignment and retrieval. | contrastive learning, foundation model, multimodal | |
| 42 | Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models | Proposes Recursive Self-Aggregation (RSA), deepening LLM thinking at inference time. | reinforcement learning, large language model | |
| 43 | TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning | Proposes TAP, two-stage adaptive personalization of multi-task, multimodal foundation models in federated learning. | distillation, foundation model | |
| 44 | CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models | CAST: a continuous, differentiable semi-structured sparsity-aware training framework for LLMs. | distillation, large language model | |
| 45 | Data-to-Energy Stochastic Dynamics | Proposes data-to-energy stochastic dynamics, solving Schrödinger bridge problems when no data samples are available. | reinforcement learning, flow matching, multimodal | |
| 46 | Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners | Proposes TFPI, accelerating RLVR training and improving the efficiency and performance of reasoning models. | reinforcement learning, distillation, chain-of-thought | |
| 47 | Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space | Proposes ExploRLer, improving on-policy RL efficiency by exploring a sparse parameter space. | reinforcement learning, PPO | |
| 48 | Less is More: Towards Simple Graph Contrastive Learning | Proposes a simplified graph contrastive learning method that handles representation learning on heterophilous graphs. | representation learning, contrastive learning | |
| 49 | Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Proposes the B2R framework, handling safety constraints in offline safe RL via asymmetric conditioning. | reinforcement learning | |
| 50 | Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces | Proposes adaptive-clarification reinforcement learning, addressing missing information in vision-language interfaces. | reinforcement learning | |
| 51 | Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation | Proposes UOT-RFM, correcting bias in label-free long-tailed generation. | flow matching | |
| 52 | Debunk the Myth of SFT Generalization | With prompt diversity and CoT, SFT matches RL's generalization on decision-making tasks. | reinforcement learning, chain-of-thought | |
| 53 | Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation | Proposes Directed-MAML, accelerating meta-RL convergence and cutting compute via task-directed approximation. | reinforcement learning | |
| 54 | Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models | AttnRL: an attention-based RL framework improving the exploration efficiency of process supervision in reasoning models. | reinforcement learning, large language model | |
| 55 | Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning | Proposes Conditional Reward Modeling (CRM) to improve LLM reasoning, addressing limitations of process reward models. | reinforcement learning, large language model | |
| 56 | Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning | Extends the Robbins-Siegmund theorem to handle convergence with non-summable zeroth-order terms in RL. | reinforcement learning | |
| 57 | Alignment-Aware Decoding | Proposes Alignment-Aware Decoding (AAD), improving LLM alignment at inference time. | DPO, large language model | |
| 58 | RL-Guided Data Selection for Language Model Finetuning | Proposes RL-guided data selection, improving the efficiency and performance of LLM fine-tuning. | reinforcement learning, large language model | |
| 59 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation | Knapsack RL: unlocks LLM exploration by optimizing budget allocation. | reinforcement learning, large language model | |
| 60 | Learning to Reason as Action Abstractions with Scalable Mid-Training RL | Proposes the RA3 algorithm, improving code-generation performance via scalable mid-training RL. | reinforcement learning, large language model | |
| 61 | Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse | Proposes AR3PO, improving sampling efficiency in RLVR. | reinforcement learning, large language model | |
| 62 | Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling | Proposes the Differentiable Autoencoding Neural Operator (DIANO) for interpretable, integrable latent-space modeling. | latent dynamics, spatiotemporal | |
| 63 | Accelerating Transformers in Online RL | Proposes an accelerator-policy-based method for online RL with Transformers, improving training stability and speed. | reinforcement learning, behavior cloning | |
| 64 | Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access | Proposes Informed Asymmetric Actor-Critic, leveraging privileged signals for RL in partially observable environments. | reinforcement learning, privileged information | |
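Entry 39 studies the two clipping thresholds of the PPO/GRPO surrogate separately. For reference, here is a minimal NumPy sketch of the standard clipped objective with independent lower and upper bounds; it illustrates the mechanism only, not the paper's own algorithmic changes:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, clip_low=0.2, clip_high=0.2):
    """Per-token PPO surrogate with separate clip bounds.

    ratio = pi_new(a|s) / pi_old(a|s). The lower bound (1 - clip_low)
    limits how far a token's probability can be pushed down; loosening
    clip_low permits stronger down-weighting, which entry 39 links to
    higher policy entropy and more exploration. The upper bound
    (1 + clip_high) caps how much a token can be up-weighted.
    """
    clipped = np.clip(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Pessimistic min: outside the trust region the gradient vanishes.
    return np.minimum(ratio * advantage, clipped * advantage)

# A positive-advantage token whose ratio grew past the upper bound
# has its objective capped at (1 + clip_high) * advantage.
print(clipped_surrogate(1.5, 1.0))                 # capped at 1.2
print(clipped_surrogate(1.5, 1.0, clip_high=0.6))  # wider bound: uncapped, 1.5
```

With symmetric defaults (`clip_low = clip_high = 0.2`) this reduces to the familiar PPO objective; the asymmetric signature makes the two effects entry 39 analyzes independently tunable.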

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 65 | Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models | Proposes RevAm, RL-based trajectory optimization that resurrects erased concepts in diffusion models. | manipulation, trajectory optimization | |
| 66 | Noise-Guided Transport for Imitation Learning | Proposes Noise-Guided Transport (NGT), learning expert policies in low-data imitation learning. | humanoid, imitation learning | |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 67 | Parametric Neural Amp Modeling with Active Learning | Proposes Panama, an active-learning framework for parametric neural guitar-amp modeling. | AMP | |
| 68 | Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN | Proposes T-BiGAN for unsupervised detection of spatiotemporal anomalies in power-system PMU data. | spatiotemporal | |

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 69 | Physics-Informed Learning for Human Whole-Body Kinematics Prediction via Sparse IMUs | Proposes a physics-informed method for whole-body human kinematics prediction from sparse IMUs, aimed at human-robot collaboration. | human motion, human motion prediction, motion prediction | |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 70 | DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick | Proposes DiVeQ, differentiable vector quantization via the reparameterization trick, improving VQ-VAE and VQGAN performance. | VQ-VAE | |
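Entry 70 targets the non-differentiable step in VQ-VAE-style quantizers: the nearest-neighbour codebook lookup. A minimal NumPy sketch of that standard forward pass (the baseline DiVeQ aims to improve; the paper's reparameterization itself is not reproduced here, and all names are illustrative):

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization (the VQ-VAE forward pass).

    The argmin below is non-differentiable; VQ-VAE trains through it
    with the straight-through estimator (copying the decoder's gradient
    from the chosen code vector back onto z). That heuristic is the step
    a reparameterized, fully differentiable quantizer seeks to replace.
    """
    dists = np.sum((codebook - z) ** 2, axis=1)  # squared L2 to each code
    idx = int(np.argmin(dists))
    return codebook[idx], idx

# Three 2-D code vectors; the encoder output snaps to the closest one.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
q, idx = quantize(np.array([0.9, 0.8]), codebook)
assert idx == 1 and np.allclose(q, [1.0, 1.0])
```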
