cs.LG (2026-01-05)

📊 16 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (9, 🔗1) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (1, 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

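| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents | CORE: a code-based inverse self-training framework with graph expansion that improves the behavioral diversity of virtual agents | reinforcement learning, behavior cloning, reward design | |
| 2 | MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics | MDAgent2: a large language model for molecular dynamics code generation and knowledge Q&A | reinforcement learning, large language model | |
| 3 | Distorted Distributional Policy Evaluation for Offline Reinforcement Learning | Proposes distorted distributional policy evaluation to address over-conservatism in offline reinforcement learning | reinforcement learning, DRL, offline reinforcement learning | |
| 4 | SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines | Proposes SRAS, a lightweight reinforcement learning document selector for edge-native RAG pipelines | reinforcement learning, PPO | |
| 5 | ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense | ACDZero: an automated cyber defense method based on graph-embedding tree search | reinforcement learning, deep reinforcement learning, distillation | |
| 6 | Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning | Evaluates the impact of feature-dependent noise in preference-based reinforcement learning and exposes the limitations of existing noise-robust methods | reinforcement learning | |
| 7 | Moments Matter: Stabilizing Policy Optimization using Return Distributions | Uses moments of the return distribution to stabilize policy optimization, improving robustness on continuous-control tasks | reinforcement learning, deep reinforcement learning, PPO | |
| 8 | UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk | UnPII: proposes a PII unlearning method with quantifiable risk, addressing the removal of private data from LLMs | direct preference optimization, large language model | |
| 9 | Latent Space Element Method | Proposes the Latent Space Element Method (LSEM) for building scalable surrogate solvers for partial differential equations | latent dynamics, foundation model | |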

🔬 Pillar 9: Embodied Foundation Models (6 papers)

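| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | ELLA: Efficient Lifelong Learning for Adapters in Large Language Models | Proposes the ELLA framework to address forgetting in continual learning of large language models | large language model | |
| 11 | Heterogeneous Low-Bandwidth Pre-Training of LLMs | Proposes a heterogeneous low-bandwidth pre-training framework that combines SparseLoCo with compressed pipeline parallelism to improve LLM training efficiency | large language model | |
| 12 | DatBench: Discriminative, Faithful, and Efficient VLM Evaluations | DatBench: a VLM evaluation benchmark designed to be discriminative, faithful, and efficient | foundation model | |
| 13 | Output Embedding Centering for Stable LLM Pretraining | Proposes Output Embedding Centering (OEC) to address output logit divergence during LLM pretraining | large language model | |
| 14 | Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance | Restores the safety of fine-tuned LLMs using a single example, achieving safety alignment without utility loss and at very low cost | large language model | |
| 15 | HyperCLOVA X 8B Omni | HyperCLOVA X 8B Omni: the first 8-billion-parameter omni-modal model supporting any-to-any modality input and output | multimodal | |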

🔬 Pillar 1: Robot Control (1 paper)

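| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data | RealPDEBench: the first scientific machine learning benchmark for complex physical systems that integrates real-world data | sim-to-real, foundation model | |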
