cs.LG(2025-08-07)

📊 27 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (13 🔗1) · Pillar 9: Embodied Foundation Models (10 🔗1) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (2)

🔬 Pillar 2: RL Algorithms & Architecture (13 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle | Shuffle-R1: improves RL efficiency for multimodal LLMs via data-centric dynamic shuffle | reinforcement learning, large language model, multimodal | |
| 2 | Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies | Proposes MLES, which uses multimodal LLM-assisted evolutionary search to generate interpretable programmatic control policies | reinforcement learning, deep reinforcement learning, PPO | |
| 3 | Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning | Analyzes how multimodal perception affects sample complexity and optimization landscapes in imitation learning | imitation learning, multimodal | |
| 4 | SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models | SPaRFT: an efficient learning framework for LLMs based on self-paced reinforcement fine-tuning | reinforcement learning, curriculum learning, large language model | |
| 5 | RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders | Proposes an RL-based LLM fine-tuning method that uses implicit user feedback to optimize conversational recommender systems | reinforcement learning, PPO, RLHF | |
| 6 | Reasoning through Exploration: A Reinforcement Learning Framework for Robust Function Calling | Proposes EGPO, an RL framework based on exploratory reasoning that improves LLM function-calling ability | reinforcement learning, large language model, chain-of-thought | |
| 7 | On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification | Proposes Dynamic Fine-Tuning (DFT), which improves SFT generalization by rectifying the reward structure | reinforcement learning, offline RL, large language model | |
| 8 | Advanced Hybrid Transformer LSTM Technique with Attention and TS Mixer for Drilling Rate of Penetration Prediction | Proposes a hybrid Transformer-LSTM model combining attention and TS-Mixer to improve drilling ROP prediction accuracy | representation learning, penetration | |
| 9 | FlowState: Sampling Rate Invariant Time Series Forecasting | FlowState: a sampling-rate-invariant time-series forecasting framework that improves generalization and efficiency | SSM, state space model, foundation model | |
| 10 | Domain-driven Metrics for Reinforcement Learning: A Case Study on Epidemic Control using Agent-based Simulation | Proposes domain-driven RL evaluation metrics for agent-based epidemic-control simulation | reinforcement learning | |
| 11 | R-Zero: Self-Evolving Reasoning LLM from Zero Data | R-Zero: a framework for self-evolving reasoning LLMs starting from zero data | reinforcement learning, large language model | |
| 12 | Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas | Proposes an anti-jamming wireless sensing method based on distributed reconfigurable intelligent metasurface antennas | reinforcement learning, deep reinforcement learning, DRL | |
| 13 | R-Zero: Self-Evolving Reasoning LLM from Zero Data | Proposes R-Zero to address the data dependence of self-evolving reasoning models | reinforcement learning, large language model | |

🔬 Pillar 9: Embodied Foundation Models (10 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling | MoMA: a mixture-of-multimodal-agents architecture for enhancing clinical prediction modelling | large language model, multimodal | |
| 15 | Iterative Learning of Computable Phenotypes for Treatment Resistant Hypertension using Large Language Models | Uses LLMs to iteratively learn computable phenotypes for treatment-resistant hypertension | large language model | |
| 16 | Group Causal Policy Optimization for Post-Training Large Language Models | Proposes Group Causal Policy Optimization (GCPO) to improve post-trained LLM performance on reasoning tasks | large language model | |
| 17 | Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis | Proposes MMCI, which removes bias in multimodal sentiment analysis by disentangling intra- and inter-modal causal attention | multimodal | |
| 18 | A Metric for MLLM Alignment in Large-scale Recommendation | Proposes the Leakage Impact Score (LIS) for evaluating multimodal LLM alignment in large-scale recommender systems | large language model, multimodal | |
| 19 | TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution | TrajEvo: automatically designs trajectory-prediction heuristics via an LLM-driven evolutionary algorithm | large language model | |
| 20 | An Effective Approach for Node Classification in Textual Graphs | Proposes a framework fusing TAPE and Graphormer that effectively improves node-classification accuracy on textual graphs | large language model | |
| 21 | Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms | Echo: decouples RL-alignment inference from training on heterogeneous swarms to improve LLM performance | large language model | |
| 22 | MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs | MoBE: a mixture-of-basis-experts method for compressing MoE-based LLMs with significantly reduced accuracy loss | large language model | |
| 23 | Cross-LoRA: A Data-Free LoRA Transfer Framework across Heterogeneous LLMs | Proposes Cross-LoRA, enabling data-free transfer of LoRA modules across heterogeneous LLMs | large language model | |

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | ASkDAgger: Active Skill-level Data Aggregation for Interactive Imitation Learning | Proposes ASkDAgger, improving interactive imitation learning efficiency via active skill-level data aggregation | manipulation, imitation learning, language conditioned | |
| 25 | Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes | Proposes an information-theoretic LLM evaluation method that improves robustness under adversarial attacks | manipulation | |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | On the Design of Expressive and Trainable Pulse-based Quantum Machine Learning Models | Studies the design of pulse-based quantum machine learning models, balancing expressivity and trainability | PULSE | |
| 27 | Will You Be Aware? Eye Tracking-Based Modeling of Situational Awareness in Augmented Reality | Proposes FixGraphPool, an eye-tracking-based model for situational-awareness modeling in augmented reality | spatiotemporal | |
