cs.LG (2024-06-26)
📊 21 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms and Architecture (10 🔗1)
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 1: Robot Control (2)
🔬 Pillar 2: RL Algorithms and Architecture (10 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
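| 1 | Preference Elicitation for Offline Reinforcement Learning | Proposes the Sim-OPRL algorithm to address the difficulty of acquiring preference feedback in offline preference-based reinforcement learning | reinforcement learning offline RL offline reinforcement learning | ||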
| 2 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Step-DPO: a step-wise preference optimization method for long-chain reasoning in LLMs | DPO direct preference optimization large language model | ✅ | |
| 3 | Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents | Proposes S-DQN and S-PPO to improve the utility and robustness of smoothed DRL agents | reinforcement learning deep reinforcement learning DRL | ||
| 4 | CREMA: A Contrastive Regularized Masked Autoencoder for Robust ECG Diagnostics across Clinical Domains | CREMA: a contrastive regularized masked autoencoder for robust ECG diagnostics across clinical domains | masked autoencoder MAE foundation model | ||
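| 5 | Mental Modeling of Reinforcement Learning Agents by Language Models | Uses language models to build mental models of reinforcement learning agents, exploring how well they can understand agent behavior | reinforcement learning large language model | ||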
| 6 | Mixture of Experts in a Mixture of RL settings | Uses mixture-of-experts models in multi-task reinforcement learning to improve adaptation to non-stationary environments | reinforcement learning deep reinforcement learning DRL | ||
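| 7 | Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control | Proposes a reinforcement learning method based on an intrinsically motivated feedback graph to improve sample efficiency in lost-sales inventory control | reinforcement learning | ||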
| 8 | PDFA Distillation via String Probability Queries | Proposes a PDFA distillation algorithm based on string probability queries for extracting interpretable models from neural networks | distillation | ||
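| 9 | Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies | Proposes BrHPO, a bidirectional-reachable hierarchical reinforcement learning algorithm, to address the one-way dependency of conventional HRL | reinforcement learning | ||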
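| 10 | Combining Automated Optimisation of Hyperparameters and Reward Shape | Proposes jointly optimizing hyperparameters and the reward function to improve the performance and stability of reinforcement learning on complex tasks | reinforcement learning deep reinforcement learning | ||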
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Adversarial Search Engine Optimization for Large Language Models | Proposes an adversarial search engine optimization attack on large language models that manipulates which options the LLM prefers | manipulation large language model | ||
| 21 | Jailbreaking LLMs with Arabic Transliteration and Arabizi | Jailbreaks large language models using Arabic transliteration and Arabizi | manipulation large language model | ||