cs.LG (2025-05-28)

📊 44 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (22 🔗4) · Pillar 2: RL & Architecture (20 🔗3) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (22 papers)

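| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Defining Foundation Models for Computational Science: A Call for Clarity and Rigor | Defines foundation models for computational science and calls for clarity and rigor. | foundation model | |
| 2 | On Learning Verifiers for Chain-of-Thought Reasoning | Proposes a framework for learning reliable verifiers that check the correctness of natural-language chain-of-thought reasoning. | chain-of-thought | |
| 3 | DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models | DES-LOC: desynced, low-communication adaptive optimizers for large-scale model training. | foundation model | |
| 4 | EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles | EnsemW2S: improves weak-to-strong generalization using ensembles of large language models. | large language model | |
| 5 | SlimLLM: Accurate Structured Pruning for Large Language Models | SlimLLM: an accurate structured pruning method for large language models. | large language model | |
| 6 | Revisiting Bayesian Model Averaging in the Era of Foundation Models | Proposes ensemble methods based on Bayesian model averaging (BMA) and optimizable model averaging (OMA) to improve performance on image and text classification tasks. | foundation model | |
| 7 | Investigating the effectiveness of multimodal data in forecasting SARS-COV-2 case surges | Uses multimodal data to forecast SARS-COV-2 case surges, revealing heterogeneity across countries and phases. | multimodal | |
| 8 | NOCL: Node-Oriented Conceptualization LLM for Graph Tasks without Message Passing | Proposes NOCL, a node-oriented conceptualization large language model for graph tasks that requires no message passing. | large language model, foundation model | |
| 9 | SimuGen: Multi-modal Agentic Framework for Constructing Block Diagram-Based Simulation Models | SimuGen: a multimodal agentic framework for building block-diagram-based Simulink simulation models. | large language model, multimodal | |
| 10 | Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking | For LLM pretraining, proposes weight refactorization and momentum reset techniques to improve parameter efficiency and reduce memory requirements. | large language model | |
| 11 | BLUR: A Benchmark for LLM Unlearning Robust to Forget-Retain Overlap | BLUR: a benchmark for LLM unlearning that is robust to forget-retain overlap. | large language model | |
| 12 | Highly Efficient and Effective LLMs with Multi-Boolean Architectures | Proposes an efficient LLM fine-tuning method based on a multi-kernel Boolean architecture that requires no full-precision latent weights. | large language model | |
| 13 | Navigating the Latent Space Dynamics of Neural Models | Proposes a neural network analysis method based on latent-space dynamical systems, used to analyze model generalization and extract prior knowledge. | foundation model | |
| 14 | Multivariate de Bruijn Graphs: A Symbolic Graph Framework for Time Series Forecasting | Proposes DRAGON, which uses multivariate de Bruijn graphs to address the lack of symbolic structure in time series forecasting. | foundation model | |
| 15 | FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference | FlashFormer: a whole-model fused kernel for efficient low-batch inference. | large language model | |
| 16 | Update Your Transformer to the Latest Release: Re-Basin of Task Vectors | Proposes TransFusion, which transfers the knowledge from Transformer fine-tuning without data by reconstructing task vectors. | foundation model | |
| 17 | Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning | Proposes OA-Adapter to address budget allocation in continual learning for LLMs. | large language model | |
| 18 | MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning | Proposes MoRE, a mixture of low-rank experts for adaptive multi-task learning. | large language model | |
| 19 | Detecting Undesired Process Behavior by Means of Retrieval Augmented Generation | Proposes a retrieval-augmented generation (RAG) approach that detects undesired process behavior without fine-tuning. | large language model | |
| 20 | ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning | ACE: explores activation cosine similarity and variance for accurate, calibration-efficient LLM pruning. | large language model | |
| 21 | LLMs Judging LLMs: A Simplex Perspective | Proposes a geometric Bayesian approach to evaluating the output quality of large language models. | large language model | |
| 22 | FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design | FALCON: a machine learning framework for fully automated, layout-constrained analog circuit design. | foundation model | |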

🔬 Pillar 2: RL & Architecture (20 papers)

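| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 23 | LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning | Proposes the LLM-ODDR framework, which uses large language models to jointly optimize ride-hailing order dispatching and driver repositioning. | reinforcement learning, spatiotemporal, large language model | |
| 24 | A Closer Look at Multimodal Representation Collapse | Reveals the mechanism behind multimodal representation collapse and proposes an explicit basis reallocation algorithm to improve multimodal fusion performance. | distillation, multimodal | |
| 25 | Scaling Offline RL via Efficient and Expressive Shortcut Models | Proposes the SORL algorithm, which scales offline reinforcement learning with efficient and expressive shortcut models. | reinforcement learning, offline RL, offline reinforcement learning | |
| 26 | SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning | Proposes SOReL and TOReL to address hyperparameter tuning and performance evaluation in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | |
| 27 | Estimating the Effects of Sample Training Orders for Large Language Models without Retraining | Proposes a retraining-free framework for estimating the effect of training sample order on large language models. | curriculum learning, large language model | |
| 28 | Preference Learning with Response Time: Robust Losses and Guarantees | Proposes a preference learning method that uses response time, improving the sample efficiency and theoretical guarantees of reward model learning. | preference learning, foundation model | |
| 29 | Skywork Open Reasoner 1 Technical Report | Skywork-OR1: improves the reasoning ability of long-CoT models through reinforcement learning, significantly outperforming models of the same scale. | reinforcement learning, large language model, chain-of-thought | |
| 30 | Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding | Proposes DRG-Sapphire, which uses reinforcement learning to tackle out-of-distribution reasoning by LLMs in DRG coding. | reinforcement learning, large language model | |
| 31 | SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training | Proposes SDPO to address bias and instability in diffusion model training. | preference learning, DPO, direct preference optimization | |
| 32 | Scaling Reasoning without Attention | Proposes an attention-free language model to address inefficient reasoning. | Mamba, state space model, large language model | |
| 33 | Two-Stage Feature Generation with Transformer and Reinforcement Learning | Proposes a two-stage feature generation framework based on a Transformer and reinforcement learning, improving model performance and adaptability. | reinforcement learning, PPO | |
| 34 | A Provable Approach for End-to-End Safe Reinforcement Learning | Proposes PLS, a provable end-to-end safe reinforcement learning method that ensures safety throughout learning and deployment. | reinforcement learning | |
| 35 | Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking | Proposes the Contraction Actor-Critic algorithm for robust path tracking under unknown dynamics. | reinforcement learning | |
| 36 | The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models | For reasoning language models, proposes reinforcement learning methods based on the entropy mechanism to address policy entropy collapse. | reinforcement learning | |
| 37 | Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation | Proposes a physics-informed distillation method to address PDE constraints in diffusion models. | distillation | |
| 38 | When Does Neuroevolution Outcompete Reinforcement Learning in Transfer Learning Tasks? | Examines when neuroevolution can outperform reinforcement learning in transfer learning tasks. | reinforcement learning | |
| 39 | An Augmentation-Aware Theory for Self-Supervised Contrastive Learning | Proposes an augmentation-aware theoretical framework for self-supervised contrastive learning that explicitly models the effect of data augmentation. | contrastive learning | |
| 40 | Weakly-Supervised Contrastive Learning for Imprecise Class Labels | Proposes a graph-based weakly supervised contrastive learning framework for representation learning with imprecise labels. | contrastive learning | |
| 41 | FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators | FNOPE: simulation-based inference on function spaces with Fourier neural operators, improving the efficiency of modeling spatiotemporal processes. | flow matching, spatiotemporal | |
| 42 | Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training | Improves Group Relative Policy Optimization and explores its use in on-policy and off-policy training. | reinforcement learning, PPO | |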

🔬 Pillar 8: Physics-based Animation (1 paper)

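| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 43 | Forecasting Multivariate Urban Data via Decomposition and Spatio-Temporal Graph Analysis | Proposes the DST model, which forecasts multivariate urban data through decomposition and spatio-temporal graph analysis. | spatiotemporal | |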

🔬 Pillar 1: Robot Control (1 paper)

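| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 44 | Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection | Proposes an adversarial attack method against stochastic bandit algorithms based on fake data injection. | manipulation | |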
