cs.LG (2026-04-15)

📊 26 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (18, 🔗 5) · Pillar 1: Robot Control (4) · Pillar 9: Embodied Foundation Models (4)

🔬 Pillar 2: RL Algorithms & Architecture (18 papers)

# | Title | One-line Summary | Tags
1 | Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning | Proposes the RoleJudge framework, which uses audio large language models to evaluate character consistency in spoken role-play | reinforcement learning, large language model, multimodal
2 | From Alignment to Prediction: A Study of Self-Supervised Learning and Predictive Representation Learning | Proposes the Predictive Representation Learning (PRL) paradigm, extending self-supervised learning to data-distribution prediction | JEPA, joint-embedding predictive architecture
3 | Drowsiness-Aware Adaptive Autonomous Braking System based on Deep Reinforcement Learning for Enhanced Road Safety | Proposes a deep-reinforcement-learning braking system that adapts to the driver's physiological state to improve road safety | reinforcement learning, deep reinforcement learning
4 | Beyond State Consistency: Behavior Consistency in Text-Based World Models | Proposes the behavior-consistency reward (BehR) training paradigm, improving the functional consistency of text-based world models with real environments | world model
5 | Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning | Proposes evidence-aware, self-correcting reinforcement learning to improve the clinical consistency of generated radiology reports | reinforcement learning, preference learning
6 | FAST: A Synergistic Framework of Attention and State-space Models for Spatiotemporal Traffic Prediction | Proposes the FAST framework, combining attention with state-space models for spatiotemporal traffic prediction | Mamba, MAE, spatiotemporal
7 | A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models | Proposes a KL-divergence-based, forward-only sensitivity analysis that speeds up mixed-precision quantization and deployment of SSM-Transformer models | SSM, state space model, large language model
8 | Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO | Proposes an objective-decoupling architecture that resolves surrogate hacking in multi-timescale PPO | reinforcement learning, PPO, representation learning
9 | $π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data | Proposes $π$-Play, multi-agent self-play via privileged self-distillation without external data, improving the training efficiency of search agents | distillation, privileged information
10 | From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space | Proposes PreRL and DSRL, improving LLM reasoning through reinforcement learning in the pre-training space | reinforcement learning
11 | ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation | Proposes MVCrec, improving sequential recommendation through ID- and graph-view contrastive learning with multi-view attention fusion | contrastive learning
12 | TIP: Token Importance in On-Policy Distillation | Proposes TIP, a token-importance-based on-policy distillation method that improves training efficiency and reduces memory usage | distillation
13 | A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing | Compares the performance and trade-offs of dynamic programming and reinforcement learning in finite-horizon dynamic pricing | reinforcement learning
14 | DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off | Proposes DiPO, disentangled perplexity policy optimization for a fine-grained exploration-exploitation trade-off, improving LLM reasoning | reinforcement learning, large language model
15 | Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces | Proposes Soft $Q(λ)$, a multi-step off-policy method for entropy-regularized reinforcement learning based on eligibility traces | reinforcement learning
16 | EMGFlow: Robust and Efficient Surface Electromyography Synthesis via Flow Matching | Proposes EMGFlow, a flow-matching-based method for surface-EMG synthesis that improves data-augmentation quality and efficiency | flow matching
17 | Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization | Proposes a joint representation-learning and clustering framework based on gradient-based manifold optimization, addressing clustering of high-dimensional data | representation learning
18 | Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus | Proposes a latent-consensus-based multi-agent Transformer (CMAT) that bridges MARL to SARL and improves multi-agent cooperation | reinforcement learning, PPO

🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-line Summary | Tags
19 | Jump-Start Reinforcement Learning with Vision-Language-Action Regularization | Proposes VLAJS, which uses a vision-language-action model to guide reinforcement learning, improving exploration efficiency in robot manipulation tasks | manipulation, sim-to-real, reinforcement learning
20 | Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning | Proposes the Chain of Uncertain Rewards (CoUR) framework, using LLMs to design reinforcement-learning reward functions efficiently | manipulation, dexterous manipulation, reinforcement learning
21 | Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges | Analyzes reward-hacking mechanisms in large models and proposes the proxy-compression hypothesis to explain emergent misalignment | manipulation, reinforcement learning, RLHF
22 | LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design | Proposes LEGO-MOF, enabling editable, generative, and optimizable design of metal-organic frameworks (MOFs) | manipulation

🔬 Pillar 9: Embodied Foundation Models (4 papers)

# | Title | One-line Summary | Tags
23 | MAny: Merge Anything for Multimodal Continual Instruction Tuning | Proposes the MAny framework, mitigating catastrophic forgetting in multimodal continual instruction tuning via cross-modal projection and low-rank parameter merging | large language model, multimodal
24 | LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning | Proposes LongCoT, a scalable benchmark for evaluating long-horizon chain-of-thought reasoning | chain-of-thought
25 | Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning | Proposes EPI, a dynamic parameter-isolation framework that addresses how parameter importance evolves during supervised fine-tuning | large language model
26 | Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate | Proposes DASH-Q, robust ultra-low-bit post-training quantization via a stable diagonal curvature estimate | large language model
