cs.LG (2025-10-31)

📊 32 papers in total | 🔗 6 with code

🎯 Navigation by Area of Interest

Pillar 2: RL Algorithms & Architecture (16 papers, 🔗 4 with code)
Pillar 9: Embodied Foundation Models (15 papers, 🔗 2 with code)
Pillar 7: Motion Retargeting (1 paper)

🔬 Pillar 2: RL Algorithms & Architecture (16 papers)

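# | Title | One-line summary | Tags | 🔗
1 | DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models | DeepThinkVLA improves the reasoning ability of VLA models through a hybrid attention mechanism and two-stage training. | reinforcement learning, vision-language-action, VLA |
2 | Iterative Foundation Model Fine-Tuning on Multiple Rewards | Proposes an iterative method for fine-tuning foundation models on multiple rewards, improving performance on generation tasks. | reinforcement learning, foundation model |
3 | A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control | Proposes HeraldLight, a dual-LLM architecture for parallel fine-grained traffic signal control that significantly reduces average travel time and queue length. | reinforcement learning, large language model |
4 | MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss | MVeLMA: a multimodal vegetation loss modeling architecture for predicting vegetation loss after wildfires. | predictive model, multimodal |
5 | MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data | MedM2T: a time-aware multimodal modeling framework for electronic health record and electrocardiogram data. | MAE, multimodal |
6 | When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making | Pits a reinforcement-learning-based market maker against a medium-frequency trader and exposes the resulting adverse-selection mechanism. | reinforcement learning, PPO, imitation learning |
7 | Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning | Proposes a supply chain finance decision-making model based on deep reinforcement learning that improves the accuracy of enterprise economic performance prediction. | reinforcement learning, deep reinforcement learning |
8 | Higher-order Linear Attention | Proposes a higher-order linear attention mechanism to overcome the quadratic cost of long-context processing in autoregressive language models. | SSM, state space model, linear attention |
9 | Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning | Proposes a domain-knowledge-driven reinforcement learning approach that makes control of chaotic convective flows more robust. | reinforcement learning, reward design |
10 | LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers | LC-Opt: a benchmark for data-center liquid cooling optimization that uses reinforcement learning and agentic AI for end-to-end control. | reinforcement learning, distillation |
11 | Soft Task-Aware Routing of Experts for Equivariant Representation Learning | Proposes Soft Task-Aware Routing of experts (STAR) to make equivariant representation learning more efficient. | representation learning |
12 | Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems | Studies the credit assignment problem in open multi-agent systems and shows how openness affects performance. | reinforcement learning |
13 | Simplex-to-Euclidean Bijections for Categorical Flow Matching | Proposes a categorical flow matching method built on bijections between the simplex and Euclidean space, for learning probability distributions on the simplex. | flow matching |
14 | Reasoning Models Sometimes Output Illegible Chains of Thought | Chains of thought from reasoning models trained with reinforcement learning become less legible, which hinders understanding the model's intent and detecting malicious behavior. | reinforcement learning, chain-of-thought |
15 | Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation | Proposes IWD, an influence-function-based dataset distillation method that improves model performance. | distillation |
16 | Towards Understanding Self-play for LLM Reasoning | Analyzes how self-play training improves LLM reasoning and identifies its differences from, and limitations relative to, RLVR and SFT. | reinforcement learning, large language model |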

🔬 Pillar 9: Embodied Foundation Models (15 papers)

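# | Title | One-line summary | Tags | 🔗
17 | TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control | TetraJet-v2: accurate NVFP4 training of large language models through oscillation suppression and outlier control. | large language model |
18 | Leveraging Generic Time Series Foundation Models for EEG Classification | Applies general-purpose time series foundation models to electroencephalogram (EEG) classification. | foundation model |
19 | Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity | Proposes a measure of CoT monitorability based on faithfulness and verbosity to assess how transparent a model's reasoning process is. | chain-of-thought |
20 | Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler | Proposes the Bayesian Data Scheduler (BDS) to adaptively defend large language models against harmful fine-tuning. | large language model |
21 | Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring | Uses rationales generated by large language models to improve automated essay scoring. | large language model |
22 | QiNN-QJ: A Quantum-inspired Neural Network with Quantum Jump for Multimodal Sentiment Analysis | Proposes QiNN-QJ, a quantum-inspired neural network based on quantum jumps, for multimodal sentiment analysis. | multimodal |
23 | A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data | Proposes a hybrid generation framework that makes average treatment effect estimates from LLM-generated synthetic data more accurate in causal inference. | large language model |
24 | Calibration Across Layers: Understanding Calibration Evolution in LLMs | Examines how calibration works in LLMs: how it evolves across layers and how it relates to a low-dimensional calibration direction. | large language model |
25 | Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification | Proposes a Vibe Coding framework assisted by formal verification to make LLM-driven software development more reliable. | large language model |
26 | PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes | PDE-SHARP: significantly reduces the computational cost of LLM-driven PDE solvers through iterative analysis and refinement passes. | chain-of-thought |
27 | On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection | Proposes few-shot example selection methods for LLM-based code vulnerability detection, based on error patterns and similarity. | large language model |
28 | ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling | ORGEval: a graph-theoretic framework for evaluating the optimization modeling capabilities of LLMs. | large language model |
29 | Thought Branches: Interpreting LLM Reasoning Requires Resampling | Proposes Thought Branches, a resampling-based method for interpreting LLM reasoning more reliably. | chain-of-thought |
30 | A Comparative Analysis of LLM Adaptation: SFT, LoRA, and ICL in Data-Scarce Scenarios | A comparative analysis of LLM adaptation in data-scarce scenarios: SFT, LoRA, and ICL. | large language model |
31 | ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models | Proposes ECVL-ROUTER, which dynamically routes requests among vision-language models according to the needs of each scenario. | multimodal |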

🔬 Pillar 7: Motion Retargeting (1 paper)

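# | Title | One-line summary | Tags | 🔗
32 | MDAS-GNN: Multi-Dimensional Spatiotemporal GNN with Spatial Diffusion for Urban Traffic Risk Forecasting | MDAS-GNN: a multi-dimensional spatiotemporal graph neural network with spatial diffusion for urban traffic risk forecasting. | spatial relationship, spatiotemporal |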
