cs.LG (2025-10-21)

📊 31 papers total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (19) · Pillar 9: Embodied Foundation Models (11) · Pillar 5: Interaction & Reaction (1, 🔗 1)

🔬 Pillar 2: RL Algorithms & Architecture (19 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping | BAPO stabilizes off-policy RL for LLMs through balanced policy optimization with adaptive clipping. | reinforcement learning, PPO, large language model | |
| 2 | Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options | Proposes the M-AUPO algorithm to improve sample efficiency in preference-based RL. | reinforcement learning, large language model | |
| 3 | From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation | Proposes a customized GRPO to resolve the fidelity–editability trade-off in subject-driven image generation. | reinforcement learning, reward shaping | |
| 4 | Towards Universal Solvers: Using PGD Attack in Active Learning to Increase Generalizability of Neural Operators as Knowledge Distillation from Numerical PDE Solvers | Proposes a PGD-attack-based active learning framework to improve the generalizability of neural operators for PDE solving. | teacher-student distillation | |
| 5 | Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach | Proposes a Bellman–Jensen value-function approach to RL with imperfect transition predictions. | reinforcement learning, model-based RL | |
| 6 | Learning to Navigate Under Imperfect Perception: Conformalised Segmentation for Safe Reinforcement Learning | Proposes COPPOL, combining conformal prediction with RL for safe navigation. | reinforcement learning, policy learning | |
| 7 | ADPO: Anchored Direct Preference Optimization | ADPO improves policy alignment by decoupling response quality from prior popularity. | reinforcement learning, direct preference optimization | |
| 8 | Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task | Shows that higher embedding dimensions give a Transformer a stronger world model on a simple sorting task. | reinforcement learning, world model | |
| 9 | Towards Identifiability of Hierarchical Temporal Causal Representation Learning | Proposes the CHiLD framework to address identifiability of hierarchical latent causal representation learning in time-series data. | latent dynamics, representation learning | |
| 10 | What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning | Disentangles the effects of data ordering on LLM mathematical reasoning and investigates effective curriculum strategies. | curriculum learning, large language model | |
| 11 | POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning | POLAR: a policy-gradient RL method for stealthy backdoor attacks in federated learning. | reinforcement learning | |
| 12 | Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting | Uses on-policy data to mitigate catastrophic forgetting during language-model fine-tuning. | reinforcement learning, instruction following | |
| 13 | Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation | Proposes T-MTB, a transferable LLM backdoor attack that survives distillation, exposing security risks in distillation pipelines. | distillation | |
| 14 | Simple and Efficient Heterogeneous Temporal Graph Neural Network | Proposes SE-HTGNN, which learns heterogeneous temporal graph representations efficiently via dynamic attention and LLM prompting. | representation learning, large language model | |
| 15 | Condition-Invariant fMRI Decoding of Speech Intelligibility with Deep State Space Model | Proposes a deep state-space-model approach for condition-invariant fMRI decoding of speech intelligibility. | state space model | |
| 16 | Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs | Provides a convergence analysis of policy gradient algorithms for undiscounted total-reward MDPs. | reinforcement learning, large language model | |
| 17 | Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching | Proposes Time-Conditioned Contraction Matching (TCCM) for scalable, explainable, and robust anomaly detection on tabular data. | flow matching | |
| 18 | RESCUE: Retrieval Augmented Secure Code Generation | RESCUE: a retrieval-augmented secure code generation framework that improves the security of LLM-generated code. | distillation, large language model | |
| 19 | Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients | Proposes a noise-corrected GRPO framework to remove the policy-optimization bias caused by noisy rewards in RLHF. | reinforcement learning, RLHF | |
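Several entries above (BAPO, the GRPO variants) build on PPO-style ratio clipping. As a point of reference, here is a minimal sketch of the standard clipped surrogate objective — a generic illustration, not any of the listed papers' methods; the function name and `eps` default are my own choices:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized):
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum caps the incentive to push the
    # policy ratio far from 1, which is what stabilizes off-policy updates.
    return np.minimum(unclipped, clipped)

# A ratio of 1.5 with positive advantage is capped at 1.2 (= 1 + eps):
print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))  # → [1.2]
```

Adaptive-clipping methods such as BAPO replace the fixed `eps` with a dynamically adjusted bound.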

🔬 Pillar 9: Embodied Foundation Models (11 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records | Proposes prior-informed bandit algorithms for treatment recommendation, with priors extracted from LLM-processed historical records. | large language model | |
| 21 | QKCV Attention: Enhancing Time Series Forecasting with Static Categorical Embeddings for Both Lightweight and Pre-trained Foundation Models | Proposes QKCV attention, which uses static categorical embeddings to improve time-series forecasting performance. | foundation model | |
| 22 | Large Connectome Model: An fMRI Foundation Model of Brain Connectomes Empowered by Brain-Environment Interaction in Multitask Learning Landscape | Proposes an fMRI foundation model of brain connectomes built on brain–environment interaction in multitask learning, improving clinical performance. | foundation model | |
| 23 | Reasoning Language Model Inference Serving Unveiled: An Empirical Study | An empirical study of the characteristics and optimization of inference serving for reasoning language models. | large language model | |
| 24 | Benchmarking On-Device Machine Learning on Apple Silicon with MLX | Uses the MLX framework to deploy Transformer models efficiently on Apple Silicon, accelerating on-device machine learning. | large language model | |
| 25 | Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications | Proposes physics-informed parametric bandit algorithms (pretc and prgreedy) for beam alignment in mmWave communications. | multimodal | |
| 26 | Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs | Proposes conditional scaling laws and a search framework to optimize LLM architectures for inference efficiency. | large language model | |
| 27 | ACTG-ARL: Differentially Private Conditional Text Generation with RL-Boosted Control | Proposes the ACTG-ARL framework, using RL to improve the quality and controllability of differentially private conditional text generation. | instruction following | |
| 28 | Towards Fast LLM Fine-tuning through Zeroth-Order Optimization with Projected Gradient-Aligned Perturbations | Proposes P-GAP: zeroth-order optimization with projected gradient-aligned perturbations to accelerate LLM fine-tuning. | large language model | |
| 29 | 3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency | Proposes a 3D optimization framework that jointly calibrates accuracy, cost, and latency for constraint-aware AI inference scaling. | large language model | |
| 30 | ActivationReasoning: Logical Reasoning in Latent Activation Spaces | Proposes the ActivationReasoning framework, embedding logical reasoning in LLM latent activation spaces to improve controllability and reliability. | large language model | |
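Entry 28 (P-GAP) accelerates LLM fine-tuning via zeroth-order optimization, which replaces backpropagation with gradient estimates from forward passes only. Below is a generic two-point zeroth-order estimator — a sketch of the underlying idea, not the paper's projected gradient-aligned method; `zo_grad_estimate` and `mu` are names I chose:

```python
import numpy as np

def zo_grad_estimate(f, x, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate:
    g ≈ (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, with u ~ N(0, I).
    Only two function evaluations are needed, no backward pass."""
    rng = np.random.default_rng(rng)
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Example: f(x) = ||x||^2 has true gradient 2x; averaging many single-sample
# estimates recovers it, while each individual estimate is noisy.
x = np.array([1.0, -2.0])
g = zo_grad_estimate(lambda v: float(v @ v), x, rng=0)
```

Methods like P-GAP reduce the high variance of this estimator by shaping the perturbation directions `u` rather than drawing them isotropically.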

🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 31 | BO4Mob: Bayesian Optimization Benchmarks for High-Dimensional Urban Mobility Problem | BO4Mob: a Bayesian optimization benchmark suite for high-dimensional urban mobility problems. | CHOIS | 🔗 |
