cs.LG (2025-07-11)

📊 23 papers | 🔗 4 with code

🎯 Interest-Area Navigation

Pillar 2: RL Algorithms & Architecture (10, 🔗 2) · Pillar 9: Embodied Foundation Models (8, 🔗 2) · Pillar 1: Robot Control (4) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

# | Title | One-Line Summary | Tags | 🔗
1 | A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Proposes a two-stage training recipe for mathematical LLMs: SFT maximizes accuracy, GRPO improves efficiency | reinforcement learning, large language model |
2 | Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control | Action chunking plus exploratory data collection yields exponential improvements for behavior cloning on continuous-control tasks | imitation learning, behavior cloning |
3 | Enhancing RLHF with Human Gaze Modeling | Models human gaze to enhance RLHF, speeding up alignment of language models with human preferences | reinforcement learning, RLHF |
4 | Online Pre-Training for Offline-to-Online Reinforcement Learning | Proposes OPT, an online pre-training method that fixes inaccurate value estimates when offline-pretrained models are fine-tuned online | reinforcement learning, TD3 |
5 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Proposes LoGIC, a communication-game-based unsupervised image-captioning method that improves performance without additional data | reinforcement learning, large language model |
6 | One Token to Fool LLM-as-a-Judge | Exposes the fragility of LLM judges: a single token suffices to fool LLM reward models | reinforcement learning, large language model |
7 | Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | Proposes ORAC, which uses optimistic exploration to escape suboptimal policies in risk-averse constrained RL | reinforcement learning |
8 | Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | PARS: improves offline RL performance by penalizing infeasible actions and scaling rewards | reinforcement learning |
9 | Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation | Proposes knowledge fusion and distillation to combat local overfitting in deep models | distillation |
10 | SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation | Proposes SFedKD, which tackles catastrophic forgetting in sequential federated learning via discrepancy-aware multi-teacher knowledge distillation | distillation |
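Entry 1 mentions GRPO. As commonly formulated (e.g., in the DeepSeekMath work that introduced it), GRPO dispenses with a learned critic and instead samples a group of responses per prompt, normalizing each response's reward against the group's mean and standard deviation. A minimal sketch of that group-relative advantage computation, assuming scalar rewards (this is the generic formulation, not code from the listed paper):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    The resulting per-sample advantages are what GRPO uses in place
    of a critic's value estimates when weighting policy-gradient updates.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for 4 sampled answers to the same math prompt (1.0 = correct).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers in the group receive positive advantages and incorrect ones negative, with the advantages summing to roughly zero within each group.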

🔬 Pillar 9: Embodied Foundation Models (8 papers)

# | Title | One-Line Summary | Tags | 🔗
11 | Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography | Multimodal cardiovascular risk prediction via self-supervised learning on polysomnography | multimodal |
12 | A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering | Predicts LLM activation sparsity via activation-pattern clustering to improve compute efficiency | large language model |
13 | Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) | Quantum-UnIMP: quantum-accelerated LLM imputation of missing values in mixed data, markedly improving imputation accuracy | large language model |
14 | Self-Supervised Learning-Based Multimodal Prediction on Prosocial Behavior Intentions | Self-supervised multimodal prediction of prosocial behavior intentions in driving scenarios | multimodal |
15 | On Evaluating Performance of LLM Inference Serving Systems | Identifies anti-patterns in evaluating LLM inference serving systems and proposes a more robust evaluation framework | large language model |
16 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Proposes BlockFFN, a mixture-of-experts design whose chunk-level activation sparsity is friendly to on-device acceleration | large language model |
17 | AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | Proposes AbbIE, an autoregressive block-based iterative encoder that improves sequence-modeling efficiency and supports dynamic compute scaling | large language model |
18 | Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text | Generates BPMN models from text using machine learning and enhanced parallelism detection | large language model |

🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-Line Summary | Tags | 🔗
19 | SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | SPLASH: preference-based inverse RL that learns long-horizon adversarial tasks from suboptimal hierarchical demonstrations | sim-to-real, reinforcement learning, inverse reinforcement learning |
20 | Behavioral Exploration: Learning to Explore via In-Context Adaptation | Behavioral exploration: learns exploration policies via in-context adaptation, improving robots' autonomous exploration | locomotion, manipulation |
21 | Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security | Proposes a unified kill-chain model for quantum machine learning security, addressing complex attacks and enabling comprehensive defense | manipulation |
22 | Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer | Predicts human drivers' lane-change intentions using an LSTM, a CNN, and a Transformer | motion planning |

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-Line Summary | Tags | 🔗
23 | Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models | Theory-informed improvements to classifier-free guidance for discrete diffusion models, improving generation quality | classifier-free guidance |
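For reference, the standard classifier-free guidance rule (from Ho & Salimans) that entry 23 builds on mixes the conditional and unconditional denoiser predictions with a guidance weight $w$; the paper's discrete-diffusion variant is not reproduced here:

```latex
\tilde{\epsilon}_\theta(x_t, c) \;=\; \epsilon_\theta(x_t, \varnothing)
  \;+\; w\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
```

With $w = 1$ this reduces to ordinary conditional sampling; $w > 1$ amplifies the conditioning signal at some cost to sample diversity.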
