cs.LG (2024-05-29)

📊 16 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (RL & Architecture) (12, 🔗 1) · Pillar 9: Embodied Foundation Models (4)

🔬 Pillar 2: RL Algorithms & Architecture (RL & Architecture) (12 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning | Proposes preferred-action-optimized diffusion policies to improve offline reinforcement learning performance | reinforcement learning, offline RL, offline reinforcement learning | |
| 2 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Proposes Value-incentivized Preference Optimization (VPO), unifying online and offline RLHF to improve LLM alignment | reinforcement learning, offline RL, RLHF | |
| 3 | CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning | Proposes s-CLIPLoss and NormSim to improve data selection for multimodal contrastive learning | contrastive learning, multimodal | |
| 4 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Proposes Self-Exploring Language Models (SELM), achieving online LLM alignment via active preference elicitation | reinforcement learning, RLHF, DPO | |
| 5 | Robust Preference Optimization through Reward Model Distillation | Proposes robust preference optimization via reward model distillation, making language models more robust to distribution shift in preference data | reinforcement learning, DPO, direct preference optimization | |
| 6 | Preference Learning Algorithms Do Not Learn Preference Rankings | Exposes a limitation of preference learning algorithms: models' ranking ability diverges significantly from human preferences | preference learning, RLHF, DPO | |
| 7 | Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation | Proposes SEER to improve feedback efficiency in preference-based reinforcement learning | reinforcement learning, policy learning | |
| 8 | Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees | Proposes spectral-risk-constrained policy optimization (SRCPO), addressing convergence in risk-constrained reinforcement learning | reinforcement learning | |
| 9 | Forward-Backward Knowledge Distillation for Continual Clustering | Proposes FBCC, a forward-backward knowledge distillation method for unsupervised continual clustering that mitigates catastrophic forgetting | distillation | |
| 10 | Learning Human-Aligned Representations with Contrastive Learning and Generative Similarity | Proposes a contrastive learning method based on generative similarity that learns representations aligned with human cognition | contrastive learning | |
| 11 | Stress-Testing Capability Elicitation With Password-Locked Models | Proposes password-locked models to evaluate how effectively fine-tuning elicits capabilities from large language models | reinforcement learning, large language model | |
| 12 | Deep Bayesian Filter for Bayes-faithful Data Assimilation | Proposes Deep Bayesian Filter for data assimilation with non-Gaussian posteriors in nonlinear state-space models | SSM, state space model | |
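Several entries above (e.g. #4, #5, #6) build on Direct Preference Optimization (DPO). As background only, and not tied to any specific paper listed here, a minimal sketch of the standard DPO loss for a single (chosen, rejected) preference pair; the function name and toy log-probabilities are illustrative:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen w, rejected l) response pair.

    logp_* are the policy's total log-probabilities of each response;
    ref_logp_* are the frozen reference model's. beta scales the
    implicit KL penalty against the reference model.
    """
    # Implicit rewards are log-probability ratios against the reference.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the reward margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy already prefers the chosen response, the loss is small.
print(round(dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.5), 4))  # → 0.2014
```

Minimizing this loss pushes the policy's implicit reward margin between chosen and rejected responses upward, which is the shared starting point that papers #5 and #6 above probe for robustness and ranking-fidelity issues.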

🔬 Pillar 9: Embodied Foundation Models (4 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | Pretrained Mobility Transformer: A Foundation Model for Human Mobility | Proposes the Pretrained Mobility Transformer (PMT) for understanding urban space and human mobility patterns | foundation model | |
| 14 | DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints | DiveR-CT: diversity-enhanced red teaming of large language model assistants via relaxed constraints | large language model | |
| 15 | Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | Zipper: a multi-tower decoder architecture for fusing multimodal information | foundation model, multimodal | |
| 16 | To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability | Studies FP8's effect on LLM training stability, proposing an evaluation method and analyzing the precision-stability relationship | large language model | |
