| # | Title | Summary | Keywords | Status |
|---|-------|---------|----------|--------|
| 1 | Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning | Proposes preferred-action-optimized diffusion policies to improve offline reinforcement learning performance. | reinforcement learning, offline RL, offline reinforcement learning | |
| 2 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Proposes Value-Incentivized Preference Optimization (VPO), unifying online and offline RLHF to improve LLM alignment (see the DPO sketch below the table). | reinforcement learning, offline RL, RLHF | |
| 3 | CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning | Proposes s-CLIPLoss and NormSim to improve data selection for multimodal contrastive learning (see the data-selection sketch below). | contrastive learning, multimodal | |
| 4 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Proposes Self-Exploring Language Models (SELM), which align LLMs online through active preference elicitation. | reinforcement learning, RLHF, DPO | ✅ |
| 5 | Robust Preference Optimization through Reward Model Distillation | Proposes a robust preference optimization method based on reward model distillation, improving language models' robustness to distribution shift in preference data. | reinforcement learning, DPO, direct preference optimization | |
| 6 | Preference Learning Algorithms Do Not Learn Preference Rankings | Shows a limitation of preference learning algorithms: the response rankings induced by trained models diverge substantially from human preference rankings. | preference learning, RLHF, DPO | |
| 7 | Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation | Proposes SEER to improve feedback efficiency in preference-based reinforcement learning. | reinforcement learning, policy learning | |
| 8 | Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees | Proposes spectral-risk-constrained policy optimization (SRCPO), addressing convergence in risk-constrained reinforcement learning (see the spectral-risk sketch below). | reinforcement learning | |
| 9 | Forward-Backward Knowledge Distillation for Continual Clustering | Proposes FBCC, a forward-backward knowledge distillation method for unsupervised continual clustering that addresses catastrophic forgetting (see the distillation sketch below). | distillation | |
| 10 | Learning Human-Aligned Representations with Contrastive Learning and Generative Similarity | Proposes a contrastive learning method based on generative similarity to learn representations aligned with human cognition (see the contrastive-loss sketch below). | contrastive learning | |
| 11 | Stress-Testing Capability Elicitation With Password-Locked Models | Proposes password-locked models to evaluate how effective fine-tuning is at eliciting capabilities from large language models. | reinforcement learning, large language model | |
| 12 | Deep Bayesian Filter for Bayes-faithful Data Assimilation | Proposes Deep Bayesian Filter for data assimilation with non-Gaussian posteriors in nonlinear state-space models (see the filtering sketch below). | SSM, state space model | |
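
Entries 2, 4, 5, and 6 all build on direct preference optimization (DPO). As a shared reference point, here is a minimal sketch of the standard DPO objective (Rafailov et al., 2023); it is the generic baseline these papers modify, not any one paper's method, and the function name, tensor names, and `beta=0.1` are illustrative assumptions.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a 1-D tensor of summed token log-probabilities for a batch
    of (chosen, rejected) response pairs under the trainable policy and the
    frozen reference model, respectively.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```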
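
Entry 3 is about scoring image-text pairs for pretraining-data selection. The sketch below is a plain CLIP-score top-k filter, included only as the baseline the paper improves on; s-CLIPLoss and NormSim refine this scoring and are not reproduced here. The function name, `keep_frac`, and the normalization assumption are illustrative.

```python
import numpy as np

def clip_score_filter(image_embs, text_embs, keep_frac=0.3):
    """Baseline CLIP-score data selection: keep the image-text pairs whose
    embeddings are most aligned. Both inputs are assumed L2-normalized with
    shape (n, d); returns indices of the kept subset.
    """
    scores = np.sum(image_embs * text_embs, axis=1)  # cosine similarity per pair
    k = int(len(scores) * keep_frac)
    return np.argsort(scores)[-k:]                   # top-k most aligned pairs
```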
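
Entry 8 optimizes under a spectral risk measure: a weighted average of the sorted loss distribution, with non-decreasing weight on worse outcomes. A minimal empirical version is sketched below (names are illustrative, not from the paper); CVaR is the special case of a step-function spectrum, shown in the usage line.

```python
import numpy as np

def spectral_risk(losses, spectrum):
    """Empirical spectral risk: a spectrum-weighted average of sorted losses.

    `spectrum` maps quantile levels in (0, 1) to non-negative, non-decreasing
    weights; heavier weight on high quantiles encodes stronger risk aversion.
    """
    losses = np.sort(np.asarray(losses))   # ascending order statistics
    n = len(losses)
    p = (np.arange(n) + 0.5) / n           # midpoint quantile level of each loss
    w = spectrum(p)
    return np.sum(w * losses) / np.sum(w)  # normalized weighted average

# CVaR at level 0.9 = spectral risk with a step spectrum over the worst 10%.
cvar_90 = spectral_risk(np.random.randn(1000), lambda p: (p >= 0.9).astype(float))
```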
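
Entry 9's FBCC builds on knowledge distillation. For context, the generic soft-label distillation objective (Hinton et al., 2015) is sketched below; FBCC's forward-backward teacher arrangement for continual clustering is the paper's contribution and is not reproduced here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Generic soft-label knowledge distillation: KL divergence between
    temperature-softened teacher and student distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```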
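
Entry 10 changes how positive pairs are defined, using generative similarity inside a contrastive objective. The standard InfoNCE loss it builds on is sketched below; the generative-similarity pairing itself is the paper's contribution and is not shown.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE: z1[i] and z2[i] are embeddings of two views of the
    same item (shape (n, d)); every other row is treated as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (n, n) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)
```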
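
Entry 12 targets filtering with non-Gaussian posteriors in nonlinear state-space models. The classical baseline for that setting is the bootstrap particle filter; the sketch below (scalar state; `transition` and `loglik` are hypothetical user-supplied callables) shows the predict-weight-resample loop that a learned filter such as entry 12's replaces.

```python
import numpy as np

def bootstrap_particle_filter(observations, transition, loglik,
                              n_particles=500, rng=None):
    """Bootstrap particle filter for a scalar state-space model.

    transition(x, rng): samples x_t for each particle given x_{t-1}.
    loglik(y_t, x): observation log-likelihood of y_t for each particle.
    Returns the posterior-mean state estimate at each time step.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal(n_particles)  # particles drawn from an x_0 prior
    means = []
    for y_t in observations:
        x = transition(x, rng)            # predict: propagate particles
        logw = loglik(y_t, x)             # weight: score against observation
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))       # posterior mean under the weights
        x = x[rng.choice(n_particles, n_particles, p=w)]  # resample
    return np.array(means)
```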