| # | Title | Summary | Keywords |
|---|-------|---------|----------|
| 1 | BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping | BAPO stabilizes off-policy reinforcement learning for LLMs through balanced policy optimization with adaptive clipping. | reinforcement learning; PPO; large language model |
| 2 | Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options | Proposes the M-AUPO algorithm to improve the sample efficiency of preference-based reinforcement learning. | reinforcement learning; large language model |
| 3 | From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation | Proposes a customized GRPO that resolves the fidelity–editability trade-off in subject-driven image generation. | reinforcement learning; reward shaping |
| 4 | Towards Universal Solvers: Using PGD Attack in Active Learning to Increase Generalizability of Neural Operators as Knowledge Distillation from Numerical PDE Solvers | Proposes a PGD-attack-based active learning framework that improves the generalizability of neural operators for solving PDEs. | teacher-student distillation |
| 5 | Reinforcement Learning with Imperfect Transition Predictions: A Bellman-Jensen Approach | Proposes a Bellman-Jensen approach to reinforcement learning with imperfect transition predictions. | reinforcement learning; model-based RL |
| 6 | Learning to Navigate Under Imperfect Perception: Conformalised Segmentation for Safe Reinforcement Learning | Proposes COPPOL, which combines conformal prediction with reinforcement learning for safe navigation. | reinforcement learning; policy learning |
| 7 | ADPO: Anchored Direct Preference Optimization | ADPO improves policy alignment by decoupling response quality from prior popularity. | reinforcement learning; direct preference optimization |
| 8 | Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task | Shows that higher embedding dimensions give a Transformer a stronger world model for a sorting task. | reinforcement learning; world model |
| 9 | Towards Identifiability of Hierarchical Temporal Causal Representation Learning | Proposes the CHiLD framework, addressing the identifiability of hierarchical latent causal representations learned from time-series data. | latent dynamics; representation learning |
| 10 | What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning | Disentangles the effects of data ordering on LLM mathematical reasoning to identify effective curriculum-learning strategies. | curriculum learning; large language model |
| 11 | POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning | POLAR: a policy-gradient reinforcement learning method for stealthy backdoor attacks in federated learning. | reinforcement learning |
| 12 | Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting | Uses on-policy data to mitigate catastrophic forgetting during language-model fine-tuning. | reinforcement learning; instruction following |
| 13 | Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation | Proposes T-MTB, a transferable LLM backdoor attack whose triggers survive distillation, exposing security risks in distillation pipelines. | distillation |
| 14 | Simple and Efficient Heterogeneous Temporal Graph Neural Network | Proposes SE-HTGNN, which efficiently learns heterogeneous temporal graph representations via a dynamic attention mechanism and LLM prompting. | representation learning; large language model |
| 15 | Condition-Invariant fMRI Decoding of Speech Intelligibility with Deep State Space Model | Proposes a deep state-space-model method for condition-invariant fMRI decoding of speech intelligibility. | state space model |
| 16 | Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs | Provides a convergence analysis of policy gradient algorithms for undiscounted total-reward MDPs. | reinforcement learning; large language model |
| 17 | Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching | Proposes Time-Conditioned Contraction Matching (TCCM) for scalable, explainable, and robust anomaly detection on tabular data. | flow matching |
| 18 | RESCUE: Retrieval Augmented Secure Code Generation | RESCUE: a retrieval-augmented secure code generation framework that improves the security of LLM-generated code. | distillation; large language model |
| 19 | Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients | Proposes a noise-corrected GRPO framework that removes the policy-optimization bias caused by noisy rewards in RLHF. | reinforcement learning; RLHF |