cs.LG(2025-05-19)

📊 19 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10, 🔗 2) · Pillar 2: RL Algorithms & Architecture (9)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Fractured Chain-of-Thought Reasoning | Proposes Fractured Sampling, improving LLM inference efficiency by truncating CoT reasoning chains. | large language model, chain-of-thought | |
| 2 | Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning | Proposes CPO to address reasoning drift in multimodal LLMs under non-stationary RFT. | large language model, chain-of-thought | |
| 3 | Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Proposes LatentSeek, improving LLM reasoning via test-time policy gradient in latent space. | large language model, chain-of-thought | |
| 4 | Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | Proposes Quantized Zeroth-order Optimization (QZO) to fine-tune quantized neural networks under extremely low memory. | large language model | |
| 5 | Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression | UltraDelta: the first data-free, ultra-efficient delta-compression pipeline, breaking the compression ceiling. | large language model | |
| 6 | TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks | TinyAlign: boosts lightweight vision-language models by mitigating modal-alignment bottlenecks. | multimodal | |
| 7 | Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens | Studies the effectiveness of intermediate reasoning tokens: semantics is not a necessary condition. | chain-of-thought | |
| 8 | Panda: A pretrained forecast model for chaotic dynamics | Panda: a pretrained model for forecasting chaotic dynamics, achieving zero-shot generalization. | foundation model | |
| 9 | Incentivizing Truthful Language Models via Peer Elicitation Games | Proposes game-theoretic Peer Elicitation Games that improve LLM factual accuracy without fine-tuning. | large language model | |
| 10 | FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference | FreeKV: boosts LLM inference efficiency via improved KV-cache retrieval. | large language model | |
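Item 4 relies on zeroth-order optimization, i.e. estimating gradients from function evaluations alone, which is useful when backpropagation through quantized weights is impractical. Below is a minimal generic sketch of a two-point, random-perturbation (SPSA-style) gradient estimator, not the paper's QZO method; the names `zo_gradient` and `zo_sgd` and all hyperparameters are illustrative choices:

```python
import numpy as np

def zo_gradient(f, x, eps=1e-3, n_samples=20, rng=None):
    """Estimate grad f(x) using only function evaluations:
    average of (f(x + eps*u) - f(x - eps*u)) / (2*eps) * u over random u."""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + eps * u) - f(x - eps * u)) / (2 * eps) * u
    return g / n_samples

def zo_sgd(f, x, lr=0.1, steps=200):
    """Gradient descent where every 'gradient' is a zeroth-order estimate."""
    rng = np.random.default_rng(0)
    for _ in range(steps):
        x = x - lr * zo_gradient(f, x, rng=rng)
    return x

# Toy objective: a quadratic with minimum at the all-ones vector.
# No derivative of f is ever computed, only evaluations.
f = lambda x: float(np.sum((x - 1.0) ** 2))
x_opt = zo_sgd(f, np.zeros(4))
```

Because E[u uᵀ] = I for standard-normal perturbations, the estimator is unbiased for smooth objectives in expectation; the price is variance that grows with dimension, so such methods trade memory savings for extra forward passes.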

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | Proposes policy-driven world-model adaptation to improve the robustness of offline MBRL in noisy environments. | reinforcement learning, policy learning, offline reinforcement learning | |
| 12 | Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL | Proposes modular diffusion-policy training that decouples guidance from the diffusion model, improving offline RL performance. | reinforcement learning, offline RL, diffusion policy | |
| 13 | Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | Proposes TempDATA, temporal-distance-aware transition augmentation, to tackle sparse-reward, long-horizon tasks in offline MBRL. | reinforcement learning, offline reinforcement learning, model-based RL | |
| 14 | HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity | Builds HR-VILAGE-3K3M for AI-driven systems-immunity research on longitudinal gene expression under respiratory viral immunization. | predictive model, foundation model, multimodal | |
| 15 | 4Hammer: a board-game reinforcement learning environment for the hour long time frame | Proposes the 4Hammer environment for evaluating RL and LLMs on long-horizon, complex board games. | reinforcement learning, large language model | |
| 16 | Mean Flows for One-step Generative Modeling | Proposes MeanFlow, which models average velocity for efficient one-step generative modeling, markedly improving image-generation quality. | flow matching, curriculum learning, distillation | |
| 17 | RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | Analyzes the structural assumptions of RL post-training for LLMs, showing it essentially degenerates into supervised learning. | reinforcement learning, large language model | |
| 18 | Optimizing Anytime Reasoning via Budget Relative Policy Optimization | Proposes AnytimeReasoner, using Budget Relative Policy Optimization to improve LLM reasoning across compute budgets. | reinforcement learning, large language model | |
| 19 | One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling | Proposes KDM, a Koopman-theoretic one-step offline distillation method for diffusion models, accelerating generation. | distillation | |
