cs.LG (2024-10-26)
📊 12 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7, 🔗 2)
Pillar 1: Robot Control (3)
Pillar 2: RL & Architecture (2)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Centaur: a foundation model of human cognition | Centaur: a foundation model that predicts human cognition, simulating human behavior across diverse experimental settings. | foundation model | | |
| 2 | Generative AI in Health Economics and Outcomes Research: A Taxonomy of Key Definitions and Emerging Applications, an ISPOR Working Group Report | Introduces generative AI to improve the efficiency and accuracy of health economics and outcomes research. | foundation model, chain-of-thought | | |
| 3 | Transferable Adversarial Attacks on SAM and Its Downstream Models | Proposes UMI-GRAT, a transferable adversarial attack on SAM and its downstream models. | foundation model | ✅ | |
| 4 | Prompt Diffusion Robustifies Any-Modality Prompt Learning | Proposes Prompt Diffusion to improve the robustness of any-modality prompt learning. | foundation model | | |
| 5 | Library Learning Doesn't: The Curious Case of the Single-Use "Library" | Reveals that libraries learned by mathematical-reasoning LLMs are typically used only once, questioning their reusability. | large language model | ✅ | |
| 6 | Model Equality Testing: Which Model Is This API Serving? | Proposes model equality testing to detect whether the model behind a black-box API has been altered. | large language model | | |
| 7 | Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading | Proposes Deep Optimizer States to relieve the memory bottleneck in Transformer model training. | large language model | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL | Learns exploration strategies in simulation to improve real-world RL efficiency and overcome the sim-to-real gap. | sim-to-real, sim2real, reinforcement learning | | |
| 9 | Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning | Proposes CoDeTr, which models non-Markovian rewards to address composite delayed rewards in reinforcement learning. | locomotion, reinforcement learning | | |
| 10 | Classification under strategic adversary manipulation using pessimistic bilevel optimisation | Proposes a pessimistic bilevel optimisation method for classification under strategic adversarial manipulation, improving robust detection of malicious data. | manipulation | | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Uncertainty-Penalized Direct Preference Optimization | Proposes uncertainty-penalized direct preference optimization to improve the robustness of aligning LLMs with human preferences. | reinforcement learning, offline reinforcement learning, RLHF | | |
| 12 | GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks | Fine-tunes LLMs with GFlowNets to generate diverse correct solutions for mathematical reasoning tasks. | reinforcement learning, large language model | | |