| # | Title | Summary | Keywords |  |
| --- | --- | --- | --- | --- |
| 1 | A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C | Compares the performance of DQN, PPO, and A2C on the Breakout game as a reference point for game AI. | reinforcement learning, deep reinforcement learning, PPO |  |
| 2 | BOND: Aligning LLMs with Best-of-N Distillation | Proposes BOND, which improves LLM performance by imitating Best-of-N sampling (sketched below the table) while reducing inference-time compute. | reinforcement learning, RLHF, distillation |  |
| 3 | Longhorn: State Space Models are Amortized Online Learners | Longhorn: treats state space models as amortized online learners, improving sequence modeling performance. | Mamba, SSM, state space model |  |
| 4 | Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification | Shows that KL-divergence regularization in RLHF breaks down under heavy-tailed reward misspecification: the catastrophic Goodhart phenomenon. | reinforcement learning, RLHF |  |
| 5 | On Policy Evaluation Algorithms in Distributional Reinforcement Learning | Proposes a new policy evaluation algorithm for distributional reinforcement learning that applies to MDPs with arbitrary probabilistic reward mechanisms. | reinforcement learning, DRL |  |
| 6 | Investigating the Indirect Object Identification circuit in Mamba | Studies the indirect object identification circuit in Mamba, shedding light on its internal mechanisms. | Mamba, SSM |  |
| 7 | Decomposed Direct Preference Optimization for Structure-Based Drug Design | Proposes DecompDPO, which applies multi-granularity preference optimization to diffusion models for structure-based drug design. | DPO, direct preference optimization |  |
| 8 | A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning | Uses the reticulate package to combine R and Python efficiently for data science, machine learning, and reinforcement learning. | reinforcement learning |  |
| 9 | OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning | OASIS: a conditional distribution shaping approach to offline safe reinforcement learning. | reinforcement learning |  |
| 10 | Data-Centric Human Preference with Rationales for Direct Preference Alignment | Proposes a data-centric, rationale-based approach to human preference alignment that improves the efficiency of direct preference optimization. | reinforcement learning, preference learning, direct preference optimization |  |
| 11 | L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering | Proposes L^2CL, an embarrassingly simple layer-to-layer contrastive learning method for graph collaborative filtering that improves recommendation performance. | contrastive learning | ✅ |
| 12 | Towards the Causal Complete Cause of Multi-Modal Representation Learning | Proposes the C³ regularization method, which improves multi-modal representation learning via causal completeness. | representation learning |  |
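For entry 2, a minimal sketch of the Best-of-N sampling procedure that BOND distills: draw N candidate responses and keep the one a reward model scores highest. The `generate` and `reward` callables here are hypothetical stand-ins, not the paper's implementation.

```python
import random

def best_of_n(prompt, generate, reward, n=8):
    """Draw n candidate responses and return the highest-reward one.

    BOND trains the policy to imitate this selection directly, so the
    n-fold sampling cost is not paid at inference time.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: "generation" samples a number, "reward" prefers larger values.
generate = lambda prompt: random.gauss(0.0, 1.0)
reward = lambda response: response
print(best_of_n("example prompt", generate, reward, n=16))
```

Larger n raises the expected reward of the selected sample but costs n generations per prompt; that per-query overhead is the inference compute the paper's distillation aims to remove.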