| 1 |
UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function |
Proposes UNA, which unifies the RLHF/PPO, DPO, and KTO alignment methods through a generalized implicit reward function.
PPO RLHF DPO |
|
|
| 2 |
Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning |
Proposes the iAC framework, which uses optimization solution functions as deterministic policies for offline reinforcement learning to improve robustness.
reinforcement learning offline RL
|
|
| 3 |
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis |
Proposes sLAVA, a parameter-efficient quantized mixture-of-experts vision-language model for semiconductor electron micrograph analysis.
teacher-student multimodal instruction following |
|
|
| 4 |
Generative Verifiers: Reward Modeling as Next-Token Prediction |
Proposes Generative Verifiers (GenRM), which use a next-token prediction objective to improve LLM reasoning performance.
DPO large language model chain-of-thought |
|
|
| 5 |
Simultaneous Training of First- and Second-Order Optimizers in Population-Based Reinforcement Learning |
Proposes a method for simultaneously training first- and second-order optimizers in population-based reinforcement learning.
reinforcement learning TD3 |
|
|
| 6 |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models |
Proposes distilling Transformers into hybrid linear-RNN models and accelerating inference with hardware-aware speculative decoding.
Mamba distillation |
✅ |
|
| 7 |
Unsupervised-to-Online Reinforcement Learning |
Proposes unsupervised-to-online reinforcement learning (U2O RL) to address the limitations of offline-to-online reinforcement learning.
reinforcement learning offline RL |
|
|
| 8 |
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning |
Instruct-SkillMix: a powerful automated pipeline for LLM instruction tuning that generates high-quality SFT data at low cost.
PPO DPO instruction following |
|
|
| 9 |
Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation |
Proposes DP-SAD, which learns differentially private diffusion models via stochastic adversarial distillation to improve generation quality.
distillation |
|
|
| 10 |
What makes math problems hard for reinforcement learning: a case study |
Studies the challenges reinforcement learning faces in finding rare high-reward instances, using conjectures from combinatorial group theory as a case study.
reinforcement learning |
|
|
| 11 |
On latent dynamics learning in nonlinear reduced order modeling |
Proposes a latent dynamics model (LDM) for nonlinear reduced-order modeling, improving solution accuracy for parameterized PDEs.
latent dynamics |
|
|
| 12 |
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning |
Proposes exploiting approximate symmetry to solve multi-agent reinforcement learning problems efficiently.
reinforcement learning |
|
|
| 13 |
Dynamic operator management in meta-heuristics using reinforcement learning: an application to permutation flowshop scheduling problems |
Proposes a reinforcement-learning-based dynamic operator management framework, applied to permutation flowshop scheduling problems.
reinforcement learning |
|
|
| 14 |
Learning Granularity Representation for Temporal Knowledge Graph Completion |
Proposes the LGRe model, which enhances temporal knowledge graph completion with multi-granularity temporal representations.
representation learning TAMP |
✅ |
|
| 15 |
DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing |
Proposes a DRL-based federated self-supervised learning algorithm for task offloading and resource allocation, optimizing ISAC-enabled vehicle edge computing.
DRL |
|
|
| 16 |
Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction |
Proposes an explainable hierarchical urban representation learning model for commuting flow prediction.
representation learning |
|
|