cs.LG (2025-04-06)
📊 9 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Hessian of Perplexity for Large Language Models by PyTorch autograd (Open Source) | Computes the Hessian of perplexity for large language models using PyTorch autograd | large language model | | |
| 2 | Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression | Thanos: a block-wise pruning algorithm for efficient compression of large language models | large language model | | |
| 3 | ZeroED: Hybrid Zero-shot Error Detection through Large Language Model Reasoning | ZeroED: a hybrid zero-shot error detection framework that uses LLM reasoning to improve tabular data quality | large language model | | |
| 4 | AROMA: Autonomous Rank-one Matrix Adaptation | Proposes AROMA to address the static allocation problem of low-rank adaptation methods | large language model | ✅ | |
| 5 | AutoPDL: Automatic Prompt Optimization for LLM Agents | Proposes AutoPDL to automatically optimize prompt configurations for large language models | large language model | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning | Proposes the TRPA algorithm, which combines rule-based and preference-based optimization to improve LLM performance and stability on reasoning tasks | reinforcement learning PPO DPO | ✅ | |
| 7 | Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers | Uses Transformers and scalable offline reinforcement learning to build a human-level Pokémon battle AI | reinforcement learning offline RL offline reinforcement learning | | |
| 8 | Gating is Weighting: Understanding Gated Linear Attention through In-context Learning | Understands gated linear attention through in-context learning: gating is weighting | Mamba linear attention | | |
| 9 | A Novel Algorithm for Personalized Federated Learning: Knowledge Distillation with Weighted Combination Loss | Proposes the pFedKD-WCL algorithm, which uses knowledge distillation and a weighted combination loss to address non-IID data in personalized federated learning | distillation | | |