cs.LG (2024-07-19)

📊 17 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (12 🔗1) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (5)

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (12 papers)

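| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C | Compares the performance of DQN, PPO, and A2C on the BreakOut game as a reference for game AI. | reinforcement learning, deep reinforcement learning, PPO | |
| 2 | BOND: Aligning LLMs with Best-of-N Distillation | Proposes BOND, an algorithm that improves large language model performance by imitating Best-of-N sampling while reducing inference compute cost. | reinforcement learning, RLHF, distillation | |
| 3 | Longhorn: State Space Models are Amortized Online Learners | Longhorn: treats state space models as amortized versions of online learners to improve sequence modeling performance. | Mamba, SSM, state space model | |
| 4 | Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification | Shows that KL-divergence regularization in RLHF fails under heavy-tailed reward functions, a phenomenon termed catastrophic Goodhart. | reinforcement learning, RLHF | |
| 5 | On Policy Evaluation Algorithms in Distributional Reinforcement Learning | Proposes a new policy evaluation algorithm for distributional reinforcement learning, applicable to MDPs with arbitrary probabilistic reward mechanisms. | reinforcement learning, DRL | |
| 6 | Investigating the Indirect Object Identification circuit in Mamba | Studies the indirect object identification circuit in the Mamba model to reveal its internal mechanisms. | Mamba, SSM | |
| 7 | Decomposed Direct Preference Optimization for Structure-Based Drug Design | Proposes DecompDPO, which uses multi-granularity preferences to optimize diffusion models for structure-based drug design. | DPO, direct preference optimization | |
| 8 | A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning | Uses the reticulate package to let R and Python work together efficiently in data science, machine learning, and reinforcement learning. | reinforcement learning | |
| 9 | OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning | OASIS: a conditional distribution shaping method for offline safe reinforcement learning. | reinforcement learning | |
| 10 | Data-Centric Human Preference with Rationales for Direct Preference Alignment | Proposes a data-centric, rationale-based approach to human preference alignment that improves the efficiency of direct preference optimization. | reinforcement learning, preference learning, direct preference optimization | |
| 11 | L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering | Proposes L2CL, a simple layer-to-layer contrastive learning method for graph collaborative filtering that improves recommendation performance. | contrastive learning | |
| 12 | Towards the Causal Complete Cause of Multi-Modal Representation Learning | Proposes C³ regularization, which improves multimodal representation learning through causal completeness. | representation learning | |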

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (5 papers)

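| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference | Proposes a general performance modeling method to analyze performance bottlenecks and technology trends in distributed LLM training and inference. | large language model | |
| 14 | NeuroBind: Towards Unified Multimodal Representations for Neural Signals | NeuroBind: a unified multimodal representation learning framework for neural signals. | multimodal | |
| 15 | Enhancing Graph Neural Networks with Limited Labeled Data by Actively Distilling Knowledge from Large Language Models | Proposes a knowledge-enhancement method based on graph neural networks and active distillation to address few-shot node classification. | large language model | |
| 16 | AuditNet: A Conversational AI-based Security Assistant [DEMO] | AuditNet: a conversational-AI-based security assistant that supports IoT network security experts. | large language model | |
| 17 | Generative Language Model for Catalyst Discovery | Proposes CatGPT, a generative language model for generating novel catalyst structures. | foundation model | |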
