cs.LG (2026-04-20)

📊 35 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (19 🔗4) · Pillar 2: RL & Architecture (13 🔗1) · Pillar 1: Robot Control (3)

🔬 Pillar 9: Embodied Foundation Models (19 papers)

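| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | A multimodal and temporal foundation model for virtual patient representations at healthcare system scale | Apollo: builds a healthcare-system-scale multimodal temporal foundation model for virtual patient representations. | foundation model, multimodal | |
| 2 | Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective | Proposes the CmIR framework, which learns modality-invariant representations through causal inference to improve the robustness of multimodal affective computing. | multimodal | |
| 3 | Towards a Foundation-Model Paradigm for Aerodynamic Prediction in Three-dimensional Design | Proposes AeroTransformer, which improves 3D aerodynamic prediction accuracy through a pretrain-finetune paradigm. | foundation model | |
| 4 | LoReC: Rethinking Large Language Models for Graph Data Analysis | Proposes LoReC to strengthen large language models on graph data analysis, outperforming traditional graph neural networks. | large language model | |
| 5 | LLM-AUG: Robust Wireless Data Augmentation with In-Context Learning in Large Language Models | Proposes LLM-AUG to address data scarcity in wireless communications. | large language model | |
| 6 | SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models | SafeAnchor: prevents cumulative safety erosion during continual domain adaptation of large language models. | large language model | |
| 7 | CAARL: In-Context Learning for Interpretable Co-Evolving Time Series Forecasting | Proposes CAARL, which uses in-context learning for interpretable forecasting of co-evolving time series. | large language model, chain-of-thought | |
| 8 | Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering | Proposes Latent Phase-Shift Rollback (LPSR), which corrects errors at inference time by monitoring the residual stream and steering the KV cache. | large language model | |
| 9 | Barrier-enforced multi-objective optimization for direct point and sharp interval forecasting | Proposes a barrier-function-based adaptive multi-objective optimization method for direct point forecasts and sharp interval forecasts. | foundation model | |
| 10 | Semantic Step Prediction: Multi-Step Latent Forecasting in LLM Reasoning Trajectories via Step Sampling | Proposes semantic step prediction to improve the accuracy of multi-step reasoning. | large language model | |
| 11 | Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling | Proposes AdaLeZO, which improves zeroth-order optimization through adaptive layer-wise sampling to accelerate large language model fine-tuning. | large language model | |
| 12 | Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols | Proposes a two-rate view for auditing error flow in large language model protocols. | large language model | |
| 13 | Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement | Proposes a reward calibration method that addresses likelihood displacement in preference optimization and improves large language model alignment. | large language model | |
| 14 | mlr3torch: A Deep Learning Framework in R based on mlr3 and torch | mlr3torch: a deep learning framework for R built on mlr3 and torch that simplifies model definition, training, and evaluation. | multimodal | |
| 15 | Predicting LLM Compression Degradation from Spectral Statistics | Proposes a pre-compression performance prediction method to optimize low-rank compression of large language models. | large language model | |
| 16 | Towards Real-Time ECG and EMG Modeling on μNPUs | Proposes PhysioLite for real-time ECG/EMG signal modeling on micro NPUs. | foundation model | |
| 17 | TeleEmbedBench: A Multi-Corpus Embedding Benchmark for RAG in Telecommunications | Proposes TeleEmbedBench for evaluating embedding models for RAG in the telecommunications domain, addressing the shortcomings of general-purpose benchmarks. | large language model | |
| 18 | HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation | HiP-LoRA: robust low-rank adaptation through budget-controlled spectral plasticity. | foundation model | |
| 19 | Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics | GLMTest: targeted software testing with a program-structure-aware language model. | large language model | |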

🔬 Pillar 2: RL & Architecture (13 papers)

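| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 20 | Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought | Proposes CAL-GRPO, a calibrated reinforcement learning algorithm that addresses gradient bias in multi-step CoT reasoning. | reinforcement learning, chain-of-thought | |
| 21 | Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity | Sonata: a hybrid world model for inertial kinematics under clinical data scarcity. | world model, world models, representation learning | |
| 22 | Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning | Proposes a dynamic abstention framework based on regularized reinforcement learning to improve the efficiency and accuracy of LLM reasoning. | reinforcement learning, large language model, chain-of-thought | |
| 23 | Fisher Decorator: Refining Flow Policy via A Local Transport Map | Fisher Decorator: refines flow-based offline reinforcement learning policies via a local transport map. | reinforcement learning, offline RL, offline reinforcement learning | |
| 24 | Efficient Federated RLHF via Zeroth-Order Policy Optimization | Proposes the Par-S²ZPO algorithm to address efficiency for resource-constrained agents in federated RLHF. | reinforcement learning, RLHF | |
| 25 | When Can LLMs Learn to Reason with Weak Supervision? | Proposes a method for learning to reason under weak supervision to improve LLM performance. | reinforcement learning, large language model | |
| 26 | Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data | Proposes the CUTS and Mixed-CUTS frameworks to address policy degradation caused by saturated reasoning data in reinforcement learning. | reinforcement learning | |
| 27 | Neural Garbage Collection: Learning to Forget while Learning to Reason | Proposes Neural Garbage Collection (NGC), in which a language model learns end to end to forget autonomously during reasoning, improving efficiency. | reinforcement learning, chain-of-thought | |
| 28 | HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment | Proposes the HEAL framework, which enhances exploration in few-shot RLVR through hybrid-domain entropy dynamics alignment. | reinforcement learning, large language model | |
| 29 | LEPO: Latent Reasoning Policy Optimization for Large Language Models | Proposes LEPO, which improves the reasoning ability of large language models by applying reinforcement learning in latent space. | reinforcement learning, large language model | |
| 30 | The Umwelt Representation Hypothesis: Rethinking Universality | Proposes the Umwelt Representation Hypothesis, which questions universal representations and emphasizes the influence of ecological constraints on representations. | world model, world models | |
| 31 | Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer | Proposes Agentic Consensus, a governable consensus layer that improves the controllability and auditability of human-AI collaborative programming. | world model, world models | |
| 32 | Tool Learning Needs Nothing More Than a Free 8B Language Model | Proposes TRUSTEE, which trains tool-calling agents with an open-source 8B language model and no additional data. | reinforcement learning, curriculum learning | |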

🔬 Pillar 1: Robot Control (3 papers)

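| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 33 | Can Explicit Physical Feasibility Benefit VLA Learning? An Empirical Study | Improves VLA model learning through explicit physical feasibility constraints, addressing reliability problems in robot manipulation. | manipulation, imitation learning, vision-language-action | |
| 34 | Bounded Ratio Reinforcement Learning | Proposes the Bounded Ratio Reinforcement Learning (BRRL) framework, bridging the gap between trust-region methods and PPO's heuristic clipped objective. | humanoid, humanoid locomotion, locomotion | |
| 35 | Ranking Abuse via Strategic Pairwise Data Perturbations | Proposes the Adaptive Subset Selection Attack (ASSA) to study the vulnerability of MLE-based ranking systems under adversarial perturbations. | manipulation | |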
