cs.LG (2025-09-30)

📊 70 papers in total | 🔗 15 with code

🎯 Topic Navigation

Pillar 9: Embodied Foundation Models (37 🔗10) · Pillar 2: RL & Architecture (27 🔗4) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (37 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Massively Multimodal Foundation Models: A Framework for Capturing Dependencies with Specialized Mixture-of-Experts | Proposes a massively multimodal framework based on specialized mixture-of-experts, using temporal dependencies to guide routing. | foundation model, multimodal | |
| 2 | DecepChain: Inducing Deceptive Reasoning in Large Language Models | DecepChain: induces deceptive reasoning chains in large language models. | large language model, chain-of-thought | |
| 3 | MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation | Proposes MultiFair, achieving balanced, fairness-aware multimodal medical classification via dual-level gradient modulation. | multimodal | |
| 4 | Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models | Proposes the FreeDave algorithm, a lossless parallel-decoding speedup for diffusion large language models. | large language model | |
| 5 | Large Language Models Inference Engines based on Spiking Neural Networks | Proposes NeurTransformer, a design approach for LLM inference engines based on spiking neural networks. | large language model | |
| 6 | AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond | AccidentBench: a large-scale multimodal benchmark for understanding and reasoning about vehicle accidents and other safety scenarios. | multimodal | |
| 7 | Memory-Driven Self-Improvement for Decision Making with Large Language Models | Proposes a memory-driven self-improvement framework that boosts LLM performance on sequential decision-making tasks. | large language model | |
| 8 | NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training | Proposes NeuroTTT, bridging the misalignment between EEG foundation-model pretraining and downstream tasks via test-time training. | foundation model | |
| 9 | MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning | Proposes MIDAS, tackling imbalanced multimodal learning with misalignment-based data augmentation. | multimodal | |
| 10 | Kairos: Towards Adaptive and Generalizable Time Series Foundation Models | Kairos: a dynamic foundation model for adaptive, generalizable time series. | foundation model | |
| 11 | Layer-wise dynamic rank for compressing large language models | Proposes D-Rank, an LLM compression framework with layer-wise dynamic rank allocation for stronger compression. | large language model | |
| 12 | Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training | Demystifies LLM visual priors: visual perception and reasoning abilities learned from language pretraining alone. | large language model, multimodal | |
| 13 | ACT: Agentic Classification Tree | Proposes the Agentic Classification Tree (ACT), using LLMs to build interpretable decision trees for unstructured data. | large language model, chain-of-thought | |
| 14 | Attribution-Guided Decoding | Proposes Attribution-Guided Decoding (AGD), improving LLM instruction following and factual accuracy. | large language model, instruction following | |
| 15 | Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | Proposes Expert Merging, model merging via unsupervised expert alignment and importance-guided layer chunking. | large language model, multimodal | |
| 16 | Adaptive and Resource-efficient Agentic AI Systems for Mobile and Embedded Devices: A Survey | A survey of adaptive, resource-efficient agentic AI systems for mobile and embedded devices. | foundation model, multimodal | |
| 17 | LLM-Generated Samples for Android Malware Detection | Uses LLM-generated samples to augment Android malware detection, improving performance on sparse datasets. | large language model | |
| 18 | In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks | Proposes an in-context curiosity mechanism that improves generalization of decision-pretrained Transformers on bandit tasks. | large language model | |
| 19 | Which Programming Language and Model Work Best With LLM-as-a-Judge For Code Retrieval? | Studies how programming language and model choice affect LLM-as-a-judge for code retrieval, and proposes a transfer-learning method. | large language model | |
| 20 | From Trace to Line: LLM Agent for Real-World OSS Vulnerability Localization | T2L-Agent: line-level localization of open-source software vulnerabilities using LLMs and runtime information. | large language model | |
| 21 | DiSC-AMC: Token- and Parameter-Efficient Discretized Statistics In-Context Automatic Modulation Classification | DiSC-AMC: token- and parameter-efficient in-context automatic modulation classification over discretized statistics. | large language model | |
| 22 | Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT | Proposes ACT-ViT, detecting hallucinations in large language models from activation tensors. | large language model | |
| 23 | Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management | Evaluates LLMs as human-behavior simulators in operations management, on effect prediction and distribution alignment. | chain-of-thought | |
| 24 | The Pitfalls of KV Cache Compression | Exposes the pitfalls of KV-cache compression in multi-instruction settings and proposes improvements. | instruction following | |
| 25 | Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space | Proposes Thoughtbubbles, an unsupervised Transformer method for parallel adaptive computation in latent space. | chain-of-thought | |
| 26 | LoRAFusion: Efficient LoRA Fine-Tuning for LLMs | LoRAFusion: an efficient LoRA fine-tuning system for LLMs that accelerates both single-task and multi-task fine-tuning. | large language model | |
| 27 | GRPO-$λ$: Credit Assignment improves LLM Reasoning | GRPO-λ: improves LLM reasoning through better credit assignment. | large language model | |
| 28 | PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning | PrunedLoRA: robust gradient-based structured pruning for low-rank adaptation during fine-tuning. | large language model | |
| 29 | Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls | Why Transformers struggle to learn multiplication: reverse-engineering reveals long-range dependency pitfalls. | chain-of-thought | |
| 30 | Estimating Dimensionality of Neural Representations from Finite Samples | Proposes a bias-corrected dimensionality estimator that removes the sample-size dependence in estimating the dimensionality of neural representations. | large language model | |
| 31 | TASP: Topology-aware Sequence Parallelism | Proposes TASP, topology-aware sequence parallelism that accelerates long-context LLM training. | large language model | |
| 32 | AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size | AdaBlock-dLLM: semantic-aware diffusion-LLM inference via adaptive block sizes. | large language model | |
| 33 | Are neural scaling laws leading quantum chemistry astray? | Challenges neural scaling laws in quantum chemistry: simply scaling up models and data does not guarantee reliability. | foundation model | |
| 34 | Beyond Linear Probes: Dynamic Safety Monitoring for Language Models | Proposes truncated polynomial classifiers for dynamic LLM safety monitoring, balancing compute cost against safety. | large language model | |
| 35 | Muon Outperforms Adam in Tail-End Associative Memory Learning | The Muon optimizer outperforms Adam on tail-end associative memory learning, improving performance on tail classes. | large language model | |
| 36 | Better Privilege Separation for Agents by Restricting Data Types | Proposes type-restricted privilege separation, a systematic defense against prompt injection in AI agents. | large language model | |
| 37 | Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space | Proposes Rotation Control Unlearning (RCU), addressing catastrophic utility loss in continual LLM unlearning. | large language model | |
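Entries 26 (LoRAFusion) and 28 (PrunedLoRA) both build on LoRA fine-tuning. As shared background, here is a minimal NumPy sketch of the standard LoRA forward pass, not any of these papers' specific methods; all variable names and sizes are illustrative:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; only the low-rank factors
    A (r x d) and B (d x r) are trained, adding 2*r*d parameters
    instead of updating all d*d entries of W.
    """
    r = A.shape[0]  # low-rank bottleneck dimension
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero init
x = rng.standard_normal(d)

# Zero-initializing B makes the adapter a no-op at the start of training,
# so fine-tuning begins exactly at the pretrained model.
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

The zero-initialized `B` is the detail that makes LoRA safe to bolt onto a pretrained layer: the adapted and original layers agree until the optimizer moves `B` away from zero.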

🔬 Pillar 2: RL & Architecture (27 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 38 | Distillation of Large Language Models via Concrete Score Matching | Proposes Concrete Score Distillation, addressing logit information loss and solution-space restrictions in LLM distillation. | distillation, large language model, instruction following | |
| 39 | Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models | Shows how the clipping bounds in PPO/GRPO affect entropy in LLM reinforcement learning; clip-low increases exploration. | reinforcement learning, PPO, large language model | |
| 40 | OPPO: Accelerating PPO-based RLHF via Pipeline Overlap | OPPO: accelerates PPO-based RLHF training via pipeline overlap. | reinforcement learning, PPO, RLHF | |
| 41 | Thin Bridges for Drug Text Alignment: Lightweight Contrastive Learning for Target Specific Drug Retrieval | Proposes lightweight contrastive-learning bridges for target-specific drug-text alignment and retrieval. | contrastive learning, foundation model, multimodal | |
| 42 | Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models | Proposes Recursive Self-Aggregation (RSA), deepening LLM thinking at inference time. | reinforcement learning, large language model | |
| 43 | TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning | Proposes TAP, two-stage adaptive personalization of multi-task, multimodal foundation models in federated learning. | distillation, foundation model | |
| 44 | CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models | CAST: a continuous, differentiable semi-structured sparsity-aware training framework for LLMs. | distillation, large language model | |
| 45 | Data-to-Energy Stochastic Dynamics | Proposes data-to-energy stochastic dynamics, solving Schrödinger bridge problems when no data samples are available. | reinforcement learning, flow matching, multimodal | |
| 46 | Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners | Proposes TFPI, accelerating RLVR training and improving the efficiency and performance of reasoning models. | reinforcement learning, distillation, chain-of-thought | |
| 47 | Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space | Proposes ExploRLer, improving on-policy RL efficiency by exploring a sparse parameter space. | reinforcement learning, PPO | |
| 48 | Less is More: Towards Simple Graph Contrastive Learning | Proposes a simplified graph contrastive learning method that handles representation learning on heterophilous graphs. | representation learning, contrastive learning | |
| 49 | Boundary-to-Region Supervision for Offline Safe Reinforcement Learning | Proposes the B2R framework, handling safety constraints in offline safe RL via asymmetric conditioning. | reinforcement learning | |
| 50 | Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces | Proposes adaptive-clarification reinforcement learning, addressing missing information in vision-language interfaces. | reinforcement learning | |
| 51 | Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation | Proposes UOT-RFM, correcting bias in label-free long-tailed generation. | flow matching | |
| 52 | Debunk the Myth of SFT Generalization | With prompt diversity and CoT, SFT matches RL's generalization on decision-making tasks. | reinforcement learning, chain-of-thought | |
| 53 | Directed-MAML: Meta Reinforcement Learning Algorithm with Task-directed Approximation | Proposes Directed-MAML, accelerating meta-RL convergence and cutting compute via task-directed approximation. | reinforcement learning | |
| 54 | Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models | AttnRL: an attention-based RL framework improving the exploration efficiency of process supervision in reasoning models. | reinforcement learning, large language model | |
| 55 | Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning | Proposes Conditional Reward Modeling (CRM) to improve LLM reasoning, addressing limitations of process reward models. | reinforcement learning, large language model | |
| 56 | Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning | Extends the Robbins-Siegmund theorem to handle convergence with non-summable zeroth-order terms in RL. | reinforcement learning | |
| 57 | Alignment-Aware Decoding | Proposes Alignment-Aware Decoding (AAD), improving LLM alignment at inference time. | DPO, large language model | |
| 58 | RL-Guided Data Selection for Language Model Finetuning | Proposes RL-guided data selection, improving the efficiency and performance of LLM fine-tuning. | reinforcement learning, large language model | |
| 59 | Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation | Knapsack RL: unlocks LLM exploration by optimizing budget allocation. | reinforcement learning, large language model | |
| 60 | Learning to Reason as Action Abstractions with Scalable Mid-Training RL | Proposes the RA3 algorithm, improving code-generation performance via scalable mid-training RL. | reinforcement learning, large language model | |
| 61 | Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse | Proposes AR3PO, improving sampling efficiency in RLVR. | reinforcement learning, large language model | |
| 62 | Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling | Proposes the Differentiable Autoencoding Neural Operator (DIANO) for interpretable, integrable latent-space modeling. | latent dynamics, spatiotemporal | |
| 63 | Accelerating Transformers in Online RL | Proposes an accelerator-policy-based method for online RL with Transformers, improving training stability and speed. | reinforcement learning, behavior cloning | |
| 64 | Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access | Proposes Informed Asymmetric Actor-Critic, leveraging privileged signals for RL in partially observable environments. | reinforcement learning, privileged information | |
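Entry 39 studies the two clipping thresholds of the PPO/GRPO surrogate separately. For reference, here is a minimal NumPy sketch of the standard clipped objective with independent lower and upper bounds; it illustrates the mechanism only, not the paper's own algorithmic changes:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, clip_low=0.2, clip_high=0.2):
    """Per-token PPO surrogate with separate clip bounds.

    ratio = pi_new(a|s) / pi_old(a|s). The lower bound (1 - clip_low)
    limits how far a token's probability can be pushed down; loosening
    clip_low permits stronger down-weighting, which entry 39 links to
    higher policy entropy and more exploration. The upper bound
    (1 + clip_high) caps how much a token can be up-weighted.
    """
    clipped = np.clip(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Pessimistic min: outside the trust region the gradient vanishes.
    return np.minimum(ratio * advantage, clipped * advantage)

# A positive-advantage token whose ratio grew past the upper bound
# has its objective capped at (1 + clip_high) * advantage.
print(clipped_surrogate(1.5, 1.0))                 # capped at 1.2
print(clipped_surrogate(1.5, 1.0, clip_high=0.6))  # wider bound: uncapped, 1.5
```

With symmetric defaults (`clip_low = clip_high = 0.2`) this reduces to the familiar PPO objective; the asymmetric signature makes the two effects entry 39 analyzes independently tunable.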

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 65 | Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models | Proposes RevAm, RL-based trajectory optimization that resurrects erased concepts in diffusion models. | manipulation, trajectory optimization | |
| 66 | Noise-Guided Transport for Imitation Learning | Proposes Noise-Guided Transport (NGT), learning expert policies in low-data imitation learning. | humanoid, imitation learning | |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 67 | Parametric Neural Amp Modeling with Active Learning | Proposes Panama, an active-learning framework for parametric neural guitar-amp modeling. | AMP | |
| 68 | Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN | Proposes T-BiGAN for unsupervised detection of spatiotemporal anomalies in power-system PMU data. | spatiotemporal | |

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 69 | Physics-Informed Learning for Human Whole-Body Kinematics Prediction via Sparse IMUs | Proposes a physics-informed method for whole-body human kinematics prediction from sparse IMUs, aimed at human-robot collaboration. | human motion, human motion prediction, motion prediction | |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 70 | DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick | Proposes DiVeQ, differentiable vector quantization via the reparameterization trick, improving VQ-VAE and VQGAN performance. | VQ-VAE | |
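Entry 70 targets the non-differentiable step in VQ-VAE-style quantizers: the nearest-neighbour codebook lookup. A minimal NumPy sketch of that standard forward pass (the baseline DiVeQ aims to improve; the paper's reparameterization itself is not reproduced here, and all names are illustrative):

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization (the VQ-VAE forward pass).

    The argmin below is non-differentiable; VQ-VAE trains through it
    with the straight-through estimator (copying the decoder's gradient
    from the chosen code vector back onto z). That heuristic is the step
    a reparameterized, fully differentiable quantizer seeks to replace.
    """
    dists = np.sum((codebook - z) ** 2, axis=1)  # squared L2 to each code
    idx = int(np.argmin(dists))
    return codebook[idx], idx

# Three 2-D code vectors; the encoder output snaps to the closest one.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
q, idx = quantize(np.array([0.9, 0.8]), codebook)
assert idx == 1 and np.allclose(q, [1.0, 1.0])
```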
