cs.LG(2026-01-09)

📊 14 papers in total

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8) · Pillar 9: Embodied Foundation Models (6)

🔬 Pillar 2: RL & Architecture (8 papers)

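| # | Title | One-Sentence Summary | Tags |
|---|-------|----------------------|------|
| 1 | Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks | Proposes a dual-phase LLM reasoning framework that improves mathematical problem solving through self-evolved mathematical frameworks. | reinforcement learning, large language model, chain-of-thought |
| 2 | MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization | MaxCode: an automated code optimization framework based on max-reward reinforcement learning. | reinforcement learning, large language model |
| 3 | DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis | Proposes DeMa, a dual-path delay-aware Mamba model for efficient multivariate time series analysis. | Mamba, linear attention |
| 4 | AWaRe-SAC: Proactive Slice Admission Control under Weather-Induced Capacity Uncertainty | Proposes the AWaRe-SAC framework for slice admission control under weather uncertainty in millimeter-wave x-haul networks. | SAC |
| 5 | IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck | Proposes IIB-LPO, which addresses exploration collapse in LLM reasoning via an iterative information bottleneck. | reinforcement learning, large language model |
| 6 | Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning | Proposes a policy-gradient reinforcement learning method for sequential Bayesian optimal experimental design in infinite dimensions. | reinforcement learning |
| 7 | Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR | Proposes Dynamic Hybrid Policy Optimization (DHPO), improving the performance and stability of RLVR on mathematical reasoning tasks. | reinforcement learning, large language model |
| 8 | Autonomous Discovery of the Ising Model's Critical Parameters with Reinforcement Learning | Proposes a physics-inspired adaptive reinforcement learning framework that autonomously discovers the critical parameters of the Ising model. | reinforcement learning |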

🔬 Pillar 9: Embodied Foundation Models (6 papers)

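| # | Title | One-Sentence Summary | Tags |
|---|-------|----------------------|------|
| 9 | CyberGFM: Graph Foundation Models for Lateral Movement Detection in Enterprise Networks | CyberGFM: uses graph foundation models to detect lateral movement in enterprise networks. | foundation model |
| 10 | Over-Searching in Search-Augmented Large Language Models | Finds that search-augmented large language models suffer from over-searching, and proposes evaluation metrics and mitigation methods. | large language model |
| 11 | Do Sparse Autoencoders Identify Reasoning Features in Language Models? | Proposes a sparse autoencoder framework for identifying reasoning features in language models. | large language model |
| 12 | Transformer Is Inherently a Causal Learner | Reveals the causal learning ability of autoregressively trained Transformers, enabling causal graph discovery in time series. | foundation model |
| 13 | DNATokenizer: A GPU-First Byte-to-Identifier Tokenizer for High-Throughput DNA Language Models | Proposes DNATokenizer, a GPU-first byte-to-identifier tokenizer for high-throughput DNA language models. | foundation model |
| 14 | Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection | Proposes Hi-ZFO to address optimization efficiency in fine-tuning large language models. | large language model |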
