cs.LG (2024-07-05)

📊 18 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (9 🔗2) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (1)

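🔬 Pillar 2: RL Algorithms and Architecture (9 papers)

#	Title	Key point	Tags	🔗	⭐
1	The Impact of Quantization and Pruning on Deep Reinforcement Learning Models	Studies how quantization and pruning affect the performance of deep reinforcement learning models, aiming at efficient deployment in resource-constrained environments.	reinforcement learning, deep reinforcement learning, DRL
2	Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling	Proposes RDT, which uses sequence modeling to address data corruption in offline reinforcement learning.	reinforcement learning, offline RL, offline reinforcement learning	✅
3	Hindsight Preference Learning for Offline Preference-based Reinforcement Learning	Proposes HPL, which uses hindsight preference learning to address credit assignment in offline preference-based reinforcement learning.	reinforcement learning, preference learning	✅
4	Graph Reinforcement Learning for Power Grids: A Comprehensive Survey	Surveys graph reinforcement learning methods for controlling power grids.	reinforcement learning, representation learning
5	Improving Knowledge Distillation in Transfer Learning with Layer-wise Learning Rates	Proposes a knowledge distillation method for transfer learning that uses layer-wise learning rates, improving performance on complex tasks.	distillation
6	Explorative Imitation Learning: A Path Signature Approach for Continuous Environments	Proposes CILO, an imitation learning method based on path signatures and exploration, for continuous control environments.	imitation learning
7	Understanding the Gains from Repeated Self-Distillation	Studies the gains from repeated self-distillation and shows its potential to reduce risk in linear regression.	distillation
8	Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks	Proposes a Petri-net-based constraint framework for reinforcement learning that improves AI trustworthiness, and applies it to traffic signal control.	reinforcement learning
9	Simplifying Deep Temporal Difference Learning	Proposes PQN, a simplified deep online Q-learning algorithm that needs no target network or experience replay and still performs strongly.	reinforcement learning, PPO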

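🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	Key point	Tags	🔗	⭐
10	Multimodal Classification via Modal-Aware Interactive Enhancement	Proposes a modal-aware interactive enhancement method to address modality imbalance in multimodal learning.	multimodal
11	Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions	Explores applications and future directions of large language models in integrated satellite-aerial-terrestrial networks.	large language model
12	SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking	Proposes SpikeLLM, which scales spiking neural networks up to large language models via saliency-based spiking for efficient inference.	large language model
13	On scalable oversight with weak LLMs judging strong LLMs	Studies scalable oversight in which weak LLMs act as judges that evaluate strong LLMs.	large language model, multimodal
14	Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models	Lazarus provides elastic, fault-tolerant training for MoE models, improving training efficiency.	large language model
15	LoCo: Low-Bit Communication Adaptor for Large-scale Model Training	Proposes LoCo, a low-bit communication adaptor that addresses the performance degradation caused by low-precision gradient communication in large-scale model training.	large language model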

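🔬 Pillar 1: Robot Control (2 papers)

#	Title	Key point	Tags	🔗	⭐
16	UpStory: the Uppsala Storytelling dataset	Releases the UpStory dataset for machine learning research on rapport prediction in interactions between children.	manipulation, dyadic interaction
17	Augmented Bayesian Policy Search	Proposes Augmented Bayesian Policy Search (ABS), which combines Bayesian optimization with policy gradient methods to solve high-dimensional locomotion control problems.	locomotion, reinforcement learning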

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key point	Tags	🔗	⭐
18	Spatiotemporal Forecasting of Traffic Flow using Wavelet-based Temporal Attention	Proposes a graph neural network with wavelet-based temporal attention for spatiotemporal traffic flow forecasting.	spatiotemporal
