cs.LG (2025-10-13)

📊 38 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (24 🔗4) · Pillar 9: Embodied Foundation Models (11) · Pillar 7: Motion Retargeting (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (24 papers)

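| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Offline Reinforcement Learning with Generative Trajectory Policies | Proposes Generative Trajectory Policies (GTP) to improve the performance and efficiency of generative models in offline reinforcement learning. | reinforcement learning, offline RL, offline reinforcement learning | |
| 2 | ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding | ReLook: uses a multimodal LLM for vision-grounded reinforcement learning in agentic web coding. | reinforcement learning, large language model, multimodal | |
| 3 | Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models | Proposes Boundary-Guided Policy Optimization (BGPO) to address the memory bottleneck in RL training of diffusion large language models. | reinforcement learning, large language model | |
| 4 | How Reinforcement Learning After Next-Token Prediction Facilitates Learning | Proposes a framework that applies reinforcement learning after next-token prediction to improve LLM generalization on reasoning tasks. | reinforcement learning, large language model, chain-of-thought | |
| 5 | Vision-LLMs for Spatiotemporal Traffic Forecasting | Proposes ST-Vision-LLM, which recasts spatiotemporal traffic forecasting as a vision-language fusion problem to improve forecasting accuracy. | reinforcement learning, spatiotemporal, large language model | |
| 6 | PhysioME: A Robust Multimodal Self-Supervised Framework for Physiological Signals with Missing Modalities | PhysioME: a robust multimodal self-supervised learning framework for physiological signals with missing modalities. | contrastive learning, multimodal | |
| 7 | QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs | QeRL: a quantization-enhanced reinforcement learning framework for LLMs that improves efficiency and strengthens exploration. | reinforcement learning, large language model | |
| 8 | ADARL: Adaptive Low-Rank Structures for Robust Policy Learning under Uncertainty | Proposes adaptive low-rank structures (AdaRL) for robust policy learning under uncertainty. | reinforcement learning, policy learning, SAC | |
| 9 | Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling | Proposes CART to make Decision Transformer more robust in adversarial stochastic games. | reinforcement learning, decision transformer, transformer policy | |
| 10 | Query-Specific GNN: A Comprehensive Graph Representation Learning Method for Retrieval Augmented Generation | Proposes a Query-Specific Graph Neural Network (QSGNN) to improve knowledge retrieval for multi-hop questions in retrieval-augmented generation. | representation learning, large language model | |
| 11 | AMiD: Knowledge Distillation for LLMs with $α$-mixture Assistant Distribution | Proposes AMiD, which uses an α-mixture assistant distribution for LLM knowledge distillation, improving performance and training stability. | distillation, large language model | |
| 12 | Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization | Proposes RITE to address generalization in cross-domain tool-augmented reinforcement learning. | reinforcement learning, large language model | |
| 13 | Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning | Proposes Cog-Rethinker to address sample utilization efficiency in LLM reasoning. | reinforcement learning, large language model | |
| 14 | Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM | Proposes the RFTHGS framework, which fine-tunes a small LLM with reinforcement learning to generate high-performance crossover operators for the HGS solver on CVRP. | reinforcement learning, large language model | |
| 15 | Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs | Proposes the AT-GRPO algorithm to address policy optimization challenges in multi-agent LLM collaboration. | reinforcement learning, large language model | |
| 16 | Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning | Proposes adaptive entropy regularization (AER) to address policy entropy collapse in LLM reinforcement learning. | reinforcement learning, large language model | |
| 17 | GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving | Proposes GAR, a generative adversarial reinforcement learning framework for formal theorem proving that improves training efficiency and performance. | reinforcement learning, curriculum learning | |
| 18 | Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning | Proposes efficient restart strategies for non-stationary reinforcement learning to improve dynamic regret. | reinforcement learning | |
| 19 | MEET-Sepsis: Multi-Endogenous-View Enhanced Time-Series Representation Learning for Early Sepsis Prediction | MEET-Sepsis: multi-endogenous-view enhanced time-series representation learning for early sepsis prediction. | representation learning | |
| 20 | Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony | ROLL Flash: accelerates reinforcement learning training for RLVR and agentic tasks through asynchrony. | reinforcement learning, large language model | |
| 21 | Emergence of hybrid computational dynamics through reinforcement learning | Reinforcement learning drives the emergence of hybrid computational dynamics in recurrent neural networks, improving performance on decision-making tasks. | reinforcement learning | |
| 22 | Robust Photoplethysmography Signal Denoising via Mamba Networks | Proposes DPNet, based on Mamba networks, for robust photoplethysmography signal denoising, improving heart-rate estimation accuracy on wearable devices. | Mamba | |
| 23 | Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation | Proposes PerSyn: personalized data synthesis via router-guided multi-teacher distillation, improving student model performance. | distillation | |
| 24 | Don't Walk the Line: Boundary Guidance for Filtered Generation | Proposes a boundary guidance method that improves the safety and utility of generative models by keeping generations away from the classifier's decision boundary. | reinforcement learning, reward design | |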

🔬 Pillar 9: Embodied Foundation Models (11 papers)

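| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection | Evaluates the performance of open-source vision-language models on multimodal sarcasm detection. | multimodal | |
| 26 | Medical Interpretability and Knowledge Maps of Large Language Models | Studies how large language models represent and process knowledge in the medical domain, revealing the models' internal knowledge maps. | large language model | |
| 27 | Protein as a Second Language for LLMs | Proposes the protein-as-a-second-language framework, which uses LLMs for zero-shot understanding of protein function and outperforms domain-specific models. | large language model, foundation model | |
| 28 | Indoor Localization using Compact, Telemetry-Agnostic, Transfer-Learning Enabled Decoder-Only Transformer | Proposes Locaris, an indoor localization method based on a decoder-only Transformer that requires no calibration and transfers well. | large language model | |
| 29 | Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models | Proposes an algorithmic-primitives framework for tracing and steering LLM reasoning, revealing its compositional geometry. | large language model | |
| 30 | Z0-Inf: Zeroth Order Approximation for Data Influence | Proposes Z0-Inf, an efficient zeroth-order approximation for data influence estimation that is suitable for large models. | large language model | |
| 31 | Instruction Tuning Chronologically Consistent Language Models | Builds chronologically consistent instruction-tuned language models, eliminating lookahead bias. | large language model | |
| 32 | Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews | Uses large language models for semi-automatic corpus filtration in systematic literature reviews. | large language model | |
| 33 | ENIGMA: The Geometry of Reasoning and Alignment in Large-Language Models | ENIGMA: optimizes LLM reasoning, alignment, and robustness through information geometry. | chain-of-thought | |
| 34 | Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs | Proposes an efficient in-memory acceleration framework for sparse block-diagonal LLMs, improving performance on resource-constrained systems. | large language model | |
| 35 | Bolster Hallucination Detection via Prompt-Guided Data Augmentation | Proposes the PALE framework, which improves LLM hallucination detection through prompt-guided data augmentation. | large language model | |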

🔬 Pillar 7: Motion Retargeting (2 papers)

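| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 36 | FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization | FUSE: a fast semi-supervised node embedding learning method based on structural and label-aware optimization. | structure preservation | |
| 37 | Learning the Structure of Connection Graphs | Proposes the SCGL algorithm, which learns connection graph structure from observed signals, improving topology recovery and geometric fidelity. | geometric consistency | |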

🔬 Pillar 8: Physics-based Animation (1 paper)

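| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 38 | Cross-Scale Reservoir Computing for large spatio-temporal forecasting and modeling | Proposes a cross-scale reservoir computing method for long-term forecasting and modeling of high-resolution spatiotemporal data. | spatiotemporal | |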
