cs.LG (2026-02-06)

📊 44 papers in total

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (24) · Pillar 2: RL & Architecture (12) · Pillar 1: Robot Control (5) · Pillar 8: Physics-based Animation (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (24 papers)

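# | Title | One-line summary | Tags
1 | Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models | Revisits the generic Transformer, deconstructing a strong baseline for time series foundation models. | foundation model
2 | On the Non-Identifiability of Steering Vectors in Large Language Models | Shows that steering vectors in large language models are non-unique, challenging existing interpretability methods. | large language model
3 | Rare Event Analysis of Large Language Models | Proposes a rare-event analysis framework for LLMs to identify and analyze notable behaviors not previously observed in model deployment. | large language model
4 | NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models | NanoQuant: the first post-training quantization method to achieve efficient sub-1-bit quantization of large language models. | large language model
5 | AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models | Proposes AsynDBT, an asynchronous distributed bilevel tuning algorithm that efficiently addresses in-context learning with large language models. | large language model
6 | DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters | Proposes DiTS, a time series forecasting model based on multimodal diffusion Transformers that significantly improves forecasting accuracy. | multimodal
7 | Live Knowledge Tracing: Real-Time Adaptation using Tabular Foundation Models | Proposes a real-time knowledge tracing method based on tabular foundation models that speeds up prediction and avoids overfitting. | foundation model
8 | On the Plasticity and Stability for Post-Training Large Language Models | Proposes the Probabilistic Conflict Resolution (PCR) framework to improve the stability and plasticity of post-trained large language models. | large language model
9 | Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used | Adaptive retrieval improves LLM reasoning, but works better when retrieval is not used than when it is. | large language model, chain-of-thought
10 | XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference | XShare: collaborative in-batch expert sharing to accelerate MoE model inference. | large language model
11 | tLoRA: Efficient Multi-LoRA Training with Elastic Shared Super-Models | tLoRA: efficient multi-LoRA training via elastic shared super-models. | large language model
12 | SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding | SpecAttn: accelerates self-speculative decoding for long-context LLMs via self-verification-guided sparse attention. | large language model
13 | Collaborative and Efficient Fine-tuning: Leveraging Task Similarity | Proposes CoLoRA, which leverages task similarity to fine-tune personalized large models collaboratively and efficiently. | foundation model
14 | Discrete Adjoint Matching | Proposes the Discrete Adjoint Matching (DAM) algorithm for fine-tuning discrete generative models based on continuous-time Markov chains. | large language model
15 | ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs | ScaleBITS: scalable bitwidth search for hardware-friendly mixed-precision LLMs. | large language model
16 | T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility | T-STAR: a context-aware Transformer framework for short-term probabilistic demand forecasting in shared micro-mobility. | multimodal
17 | Can LLM Safety Be Ensured by Constraining Parameter Regions? | Evaluates how effectively parameter-constraint methods ensure LLM safety, finding that existing methods struggle to reliably identify safety regions. | large language model
18 | Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay | Proposes optimal learning-rate schedules to address the problem of modeling loss dynamics. | large language model
19 | EXACT: Explicit Attribute-Guided Decoding-Time Personalization | Proposes EXACT to address preference representation in personalized generation. | large language model
20 | Confundo: Learning to Generate Robust Poison for Practical RAG Systems | Confundo: learns to generate robust poison for RAG systems, improving attack effectiveness in practice. | large language model
21 | Evolutionary Generation of Multi-Agent Systems | EvoMAS: a framework for automatically generating multi-agent systems based on evolutionary algorithms, improving performance on complex tasks. | large language model
22 | Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning | Proposes an LLM reasoning method based on group causal counterfactual policy optimization to improve reasoning generalization. | large language model
23 | Principle-Evolvable Scientific Discovery via Uncertainty Minimization | PiEvo: principle-evolvable scientific discovery via uncertainty minimization. | large language model
24 | Uniform Spectral Growth and Convergence of Muon in LoRA-Style Matrix Factorization | Proposes a uniform spectral growth and convergence mechanism to optimize LoRA-style matrix factorization. | large language model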

🔬 Pillar 2: RL & Architecture (12 papers)

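# | Title | One-line summary | Tags
25 | FlowDA: Accurate, Low-Latency Weather Data Assimilation via Flow Matching | FlowDA: an accurate, low-latency weather data assimilation framework based on flow matching. | flow matching, foundation model
26 | Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences | Proposes DPO extensions based on nonconvex f-divergences, improving the reward model's resistance to probability displacement. | RLHF, DPO
27 | Continuous-time reinforcement learning: ellipticity enables model-free value function approximation | Uses ellipticity to propose a model-free value function approximation method for continuous-time reinforcement learning. | reinforcement learning
28 | From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers | With inductive biases introduced, Transformers can learn Kepler's laws and discover Newtonian mechanics. | world model
29 | A first realization of reinforcement learning-based closed-loop EEG-TMS | First realization of a reinforcement learning-based closed-loop EEG-TMS system for personalized neuromodulation. | reinforcement learning
30 | Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters | Uses variability modeling to tune LLM inference hyperparameters, improving efficiency and sustainability. | predictive model, large language model
31 | Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities | Proposes soft forward-backward representations for zero-shot reinforcement learning with general utility functions. | reinforcement learning
32 | Memory-Conditioned Flow-Matching for Stable Autoregressive PDE Rollouts | Proposes a memory-conditioned flow-matching method that improves the long-term stability of autoregressive PDE solvers. | flow matching
33 | Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms | Proposes a reinforcement learning-based dynamic management framework to optimize parallel farm skeletons on serverless platforms. | reinforcement learning
34 | Adaptive Uncertainty-Aware Tree Search for Robust Reasoning | Proposes Uncertainty-Aware Tree Search (UATS) to improve LLM robustness in complex reasoning. | reinforcement learning, large language model
35 | The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL | Proposes the Optimal Token Baseline (OTB), which reduces gradient variance in long-horizon LLM-RL training and improves training stability. | reinforcement learning, large language model
36 | Risk-Sensitive Exponential Actor Critic | Proposes a risk-sensitive exponential actor-critic algorithm that addresses numerical instability when learning risk-averse policies in reinforcement learning. | reinforcement learning, deep reinforcement learning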

🔬 Pillar 1: Robot Control (5 papers)

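# | Title | One-line summary | Tags
37 | The hidden risks of temporal resampling in clinical reinforcement learning | Reveals the hidden risks of temporal resampling in clinical reinforcement learning and stresses the importance of handling irregular timing. | manipulation, reinforcement learning, offline reinforcement learning
38 | Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics | Proposes online adaptive reinforcement learning based on echo state networks for control under non-stationary dynamics. | domain randomization, reinforcement learning, privileged information
39 | Endogenous Resistance to Activation Steering in Language Models | Large language models exhibit endogenous resistance to task-misaligned activation steering, which improves generation quality. | manipulation, large language model
40 | Cerebellar-Inspired Residual Control for Fault Recovery: From Inference-Time Adaptation to Structural Consolidation | Proposes a cerebellum-inspired residual control framework for robot fault recovery and online adaptation. | humanoid, reinforcement learning
41 | Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion | Proposes the DMA*-SH framework, which solves zero-shot actuator inversion via dynamics-aligned shared hypernetworks. | domain randomization, reinforcement learning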

🔬 Pillar 8: Physics-based Animation (2 papers)

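# | Title | One-line summary | Tags
42 | Calibrating Generative AI to Produce Realistic Essays for Data Augmentation | Calibrates generative AI to produce realistic essays for data augmentation, improving automated scoring engine performance. | ASE, large language model
43 | Reciprocal Latent Fields for Precomputed Sound Propagation | Proposes Reciprocal Latent Fields (RLF) for precomputed sound propagation, significantly reducing memory footprint. | PULSE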

🔬 Pillar 4: Generative Motion (1 paper)

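# | Title | One-line summary | Tags
44 | Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit | JackZebra framework: hijacks the driving route of autonomous vehicles through physical adversarial attacks. | physically plausible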
