cs.LG (2024-10-10)

📊 11 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (6) · Pillar 9: Embodied Foundation Models (5 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

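| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Evolutionary Contrastive Distillation for Language Model Alignment | Proposes Evolutionary Contrastive Distillation (ECD) to improve LLM performance on complex instruction-following tasks. | DPO, contrastive learning, distillation | |
| 2 | Large Vision Model-Enhanced Digital Twin with Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks | Proposes a deep reinforcement learning method built on a digital twin enhanced by a large vision model, addressing user association and load balancing in dynamic wireless networks. | reinforcement learning, deep reinforcement learning, DRL | |
| 3 | VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | VerifierQ: a verifier model that uses Q-learning to enhance LLM test-time compute. | reinforcement learning, CQL, IQL | |
| 4 | COS-DPO: Conditioned One-Shot Multi-Objective Fine-Tuning Framework | Proposes COS-DPO, a conditioned one-shot multi-objective fine-tuning framework for multi-objective optimization problems. | DPO, direct preference optimization | |
| 5 | Offline Hierarchical Reinforcement Learning via Inverse Optimization | Proposes the OHIO framework, which uses inverse optimization to address high-level action inference in offline hierarchical reinforcement learning. | reinforcement learning, offline reinforcement learning | |
| 6 | Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Proposes Process Advantage Verifiers (PAVs), which reward progress to improve LLM reasoning. | reinforcement learning, large language model | |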

🔬 Pillar 9: Embodied Foundation Models (5 papers)

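| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 7 | CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | Proposes CSA, which efficiently learns a mapping from unimodal to multimodal features using a small amount of multimodal data. | multimodal | |
| 8 | Privately Learning from Graphs with Applications in Fine-tuning Large Language Models | Proposes a differentially private learning framework for graph data, applied to securely fine-tuning large language models. | large language model | |
| 9 | Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Proposes adaptive prompting to improve LLM performance on complex reasoning tasks. | large language model, chain-of-thought | |
| 10 | Chain-of-Sketch: Enabling Global Visual Reasoning | Proposes Chain-of-Sketch (CoS) to improve vision models on global reasoning tasks. | chain-of-thought | |
| 11 | Mars: Situated Inductive Reasoning in an Open-World Environment | Proposes the Mars environment for evaluating agents' situated inductive reasoning in an open world. | large language model | |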
