cs.LG (2025-07-11)

📊 23 papers | 🔗 4 with code

🎯 Interest-Area Navigation

Pillar 2: RL Algorithms & Architecture (10, 🔗 2) · Pillar 9: Embodied Foundation Models (8, 🔗 2) · Pillar 1: Robot Control (4) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

# | Title | One-Line Summary | Tags | 🔗
1 | A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Proposes a two-stage training recipe for mathematical LLMs: SFT maximizes accuracy, GRPO improves efficiency | reinforcement learning, large language model |
2 | Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control | Action chunking plus exploratory data collection yields exponential improvements for behavior cloning on continuous-control tasks | imitation learning, behavior cloning |
3 | Enhancing RLHF with Human Gaze Modeling | Models human gaze to enhance RLHF, speeding up alignment of language models with human preferences | reinforcement learning, RLHF |
4 | Online Pre-Training for Offline-to-Online Reinforcement Learning | Proposes OPT, an online pre-training method that fixes inaccurate value estimates when offline-pretrained models are fine-tuned online | reinforcement learning, TD3 |
5 | Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data | Proposes LoGIC, a communication-game-based unsupervised image-captioning method that improves performance without additional data | reinforcement learning, large language model |
6 | One Token to Fool LLM-as-a-Judge | Exposes the fragility of LLM judges: a single token suffices to fool LLM reward models | reinforcement learning, large language model |
7 | Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning | Proposes ORAC, which uses optimistic exploration to escape suboptimal policies in risk-averse constrained RL | reinforcement learning |
8 | Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data | PARS: improves offline RL performance by penalizing infeasible actions and scaling rewards | reinforcement learning |
9 | Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation | Proposes knowledge fusion and distillation to combat local overfitting in deep models | distillation |
10 | SFedKD: Sequential Federated Learning with Discrepancy-Aware Multi-Teacher Knowledge Distillation | Proposes SFedKD, which tackles catastrophic forgetting in sequential federated learning via discrepancy-aware multi-teacher knowledge distillation | distillation |
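Entry 1 mentions GRPO. As commonly formulated (e.g., in the DeepSeekMath work that introduced it), GRPO dispenses with a learned critic and instead samples a group of responses per prompt, normalizing each response's reward against the group's mean and standard deviation. A minimal sketch of that group-relative advantage computation, assuming scalar rewards (this is the generic formulation, not code from the listed paper):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    The resulting per-sample advantages are what GRPO uses in place
    of a critic's value estimates when weighting policy-gradient updates.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for 4 sampled answers to the same math prompt (1.0 = correct).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers in the group receive positive advantages and incorrect ones negative, with the advantages summing to roughly zero within each group.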

🔬 Pillar 9: Embodied Foundation Models (8 papers)

# | Title | One-Line Summary | Tags | 🔗
11 | Multimodal Cardiovascular Risk Profiling Using Self-Supervised Learning of Polysomnography | Multimodal cardiovascular risk prediction via self-supervised learning on polysomnography | multimodal |
12 | A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering | Predicts LLM activation sparsity via activation-pattern clustering to improve compute efficiency | large language model |
13 | Quantum-Accelerated Neural Imputation with Large Language Models (LLMs) | Quantum-UnIMP: quantum-accelerated LLM imputation of missing values in mixed data, markedly improving imputation accuracy | large language model |
14 | Self-Supervised Learning-Based Multimodal Prediction on Prosocial Behavior Intentions | Self-supervised multimodal prediction of prosocial behavior intentions in driving scenarios | multimodal |
15 | On Evaluating Performance of LLM Inference Serving Systems | Identifies anti-patterns in evaluating LLM inference serving systems and proposes a more robust evaluation framework | large language model |
16 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity | Proposes BlockFFN, a mixture-of-experts design whose chunk-level activation sparsity is friendly to on-device acceleration | large language model |
17 | AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | Proposes AbbIE, an autoregressive block-based iterative encoder that improves sequence-modeling efficiency and supports dynamic compute scaling | large language model |
18 | Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text | Generates BPMN models from text using machine learning and enhanced parallelism detection | large language model |

🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-Line Summary | Tags | 🔗
19 | SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations | SPLASH: preference-based inverse RL that learns long-horizon adversarial tasks from suboptimal hierarchical demonstrations | sim-to-real, reinforcement learning, inverse reinforcement learning |
20 | Behavioral Exploration: Learning to Explore via In-Context Adaptation | Behavioral exploration: learns exploration policies via in-context adaptation, improving robots' autonomous exploration | locomotion, manipulation |
21 | Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security | Proposes a unified kill-chain model for quantum machine learning security, addressing complex attacks and enabling comprehensive defense | manipulation |
22 | Prediction of Lane Change Intentions of Human Drivers using an LSTM, a CNN and a Transformer | Predicts human drivers' lane-change intentions using an LSTM, a CNN, and a Transformer | motion planning |

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-Line Summary | Tags | 🔗
23 | Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models | Theory-informed improvements to classifier-free guidance for discrete diffusion models, improving generation quality | classifier-free guidance |
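For reference, the standard classifier-free guidance rule (from Ho & Salimans) that entry 23 builds on mixes the conditional and unconditional denoiser predictions with a guidance weight $w$; the paper's discrete-diffusion variant is not reproduced here:

```latex
\tilde{\epsilon}_\theta(x_t, c) \;=\; \epsilon_\theta(x_t, \varnothing)
  \;+\; w\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
```

With $w = 1$ this reduces to ordinary conditional sampling; $w > 1$ amplifies the conditioning signal at some cost to sample diversity.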
