cs.LG (2024-08-27)

📊 24 papers | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (16 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (16 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function | UNA: unifies RLHF/PPO, DPO, and KTO alignment via a generalized implicit reward function. | PPO, RLHF, DPO | |
| 2 | Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning | Proposes the iAC framework, using optimization solution functions as deterministic policies for offline RL to improve robustness. | reinforcement learning, offline RL, offline reinforcement learning | |
| 3 | Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis | Proposes sLAVA, a parameter-efficient quantized mixture-of-experts vision-language model for semiconductor electron micrograph analysis. | teacher-student, multimodal, instruction following | |
| 4 | Generative Verifiers: Reward Modeling as Next-Token Prediction | Proposes Generative Verifiers (GenRM), using a next-token prediction objective to improve LLM reasoning. | DPO, large language model, chain-of-thought | |
| 5 | Simultaneous Training of First- and Second-Order Optimizers in Population-Based Reinforcement Learning | Proposes simultaneously training first- and second-order optimizers in population-based RL. | reinforcement learning, TD3 | |
| 6 | The Mamba in the Llama: Distilling and Accelerating Hybrid Models | Distills Transformers into hybrid linear-RNN models and accelerates inference with hardware-aware speculative decoding. | Mamba, distillation | |
| 7 | Unsupervised-to-Online Reinforcement Learning | Proposes unsupervised-to-online RL (U2O RL) to address the limitations of offline-to-online RL. | reinforcement learning, offline RL | |
| 8 | Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning | Instruct-SkillMix: an automated pipeline for LLM instruction tuning that generates high-quality SFT data at low cost. | PPO, DPO, instruction following | |
| 9 | Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation | Proposes DP-SAD, learning differentially private diffusion models via stochastic adversarial distillation to improve generation quality. | distillation | |
| 10 | What makes math problems hard for reinforcement learning: a case study | Studies, via a combinatorial group theory conjecture, the challenges RL faces in finding rare high-reward instances. | reinforcement learning | |
| 11 | On latent dynamics learning in nonlinear reduced order modeling | Proposes latent dynamics models (LDMs) for nonlinear reduced-order modeling, improving solution accuracy for parameterized PDEs. | latent dynamics | |
| 12 | Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning | Exploits approximate symmetry for efficient multi-agent reinforcement learning. | reinforcement learning | |
| 13 | Dynamic operator management in meta-heuristics using reinforcement learning: an application to permutation flowshop scheduling problems | Proposes an RL-based dynamic operator management framework for permutation flowshop scheduling problems. | reinforcement learning | |
| 14 | Learning Granularity Representation for Temporal Knowledge Graph Completion | Proposes LGRe, enhancing temporal knowledge graph completion with multi-granularity time representations. | representation learning, TAMP | |
| 15 | DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing | Proposes a DRL-based federated self-supervised learning algorithm for task offloading and resource allocation, optimizing ISAC-enabled vehicle edge computing. | DRL | |
| 16 | Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction | Proposes an explainable hierarchical urban representation learning model for commuting flow prediction. | representation learning | |

🔬 Pillar 9: Embodied Foundation Models (7 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals | NeuroLM: the first universal multi-task foundation model for EEG signals, bridging the gap between language and EEG. | large language model, foundation model | |
| 18 | Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | Proposes MMF, a multimodal fusion framework combining LLMs and GNNs to improve chemistry property prediction accuracy. | large language model | |
| 19 | Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models | Proposes ITF-TAD, a training-free time-series anomaly detection method leveraging image foundation models, avoiding the unstable training and difficult hyperparameter tuning of deep models. | foundation model | |
| 20 | PAT: Pruning-Aware Tuning for Large Language Models | Proposes PAT, a pruning-aware tuning method for LLMs that improves efficiency while preserving performance. | large language model | |
| 21 | The Benefits of Balance: From Information Projections to Variance Reduction | Reveals the benefits of data balancing in multimodal learning: theory and practice of variance reduction. | foundation model, multimodal | |
| 22 | The Uniqueness of LLaMA3-70B Series with Per-Channel Quantization | Addresses the quantization fragility of LLaMA3-70B with mixed-granularity and dual-smoothing strategies, improving W8A8 quantization accuracy. | large language model | |
| 23 | GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | GIFT-SW: Gaussian-noise-injected fine-tuning of salient LLM weights for parameter-efficient fine-tuning. | large language model | |

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | Poly2Vec: a polymorphic Fourier-based encoding of geospatial objects for GeoAI applications. | spatial relationship | |
