cs.LG (2025-10-31)

📊 32 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (16 🔗4) | Pillar 9: Embodied Foundation Models (15 🔗2) | Pillar 7: Motion Retargeting (1)

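🔬 Pillar 2: RL & Architecture (16 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models	DeepThinkVLA improves the reasoning ability of VLA models through a hybrid attention mechanism and two-stage training	reinforcement learning vision-language-action VLA
2	Iterative Foundation Model Fine-Tuning on Multiple Rewards	Proposes an iterative foundation-model fine-tuning method based on multiple rewards to improve performance on generation tasks	reinforcement learning foundation model
3	A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control	Proposes HeraldLight, a dual-LLM architecture for parallel fine-grained traffic signal control that significantly reduces average travel time and queue length	reinforcement learning large language model	✅
4	MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss	MVeLMA: a multimodal vegetation loss modeling architecture for predicting post-fire vegetation loss	predictive model multimodal
5	MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data	MedM2T: a time-aware multimodal modeling framework for electronic health record and electrocardiogram data	MAE multimodal	✅
6	When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making	Pits a reinforcement-learning market maker against medium-frequency traders to reveal the adverse selection mechanism	reinforcement learning PPO imitation learning
7	Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning	Proposes a deep-reinforcement-learning-based supply chain finance decision model to improve the accuracy of enterprise economic performance prediction	reinforcement learning deep reinforcement learning
8	Higher-order Linear Attention	Proposes a higher-order linear attention mechanism to address the quadratic complexity of long-context processing in autoregressive language models	SSM state space model linear attention	✅
9	Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning	Proposes a domain-informed reinforcement learning method to improve the robustness of chaotic convection control	reinforcement learning reward design
10	LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers	LC-Opt: a benchmark for data center liquid cooling optimization that uses reinforcement learning and agentic AI for end-to-end control	reinforcement learning distillation
11	Soft Task-Aware Routing of Experts for Equivariant Representation Learning	Proposes Soft Task-Aware Routing (STAR) of experts to improve the efficiency of equivariant representation learning	representation learning	✅
12	Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems	Studies the credit assignment problem in open multi-agent systems and reveals how openness affects performance	reinforcement learning
13	Simplex-to-Euclidean Bijections for Categorical Flow Matching	Proposes a categorical flow matching method based on simplex-to-Euclidean bijections for learning probability distributions on the simplex	flow matching
14	Reasoning Models Sometimes Output Illegible Chains of Thought	Reasoning models trained with reinforcement learning produce less legible chains of thought, which hampers understanding of intent and detection of malicious behavior	reinforcement learning chain-of-thought
15	Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation	Proposes IWD, an influence-function-based dataset distillation method, to improve model performance	distillation
16	Towards Understanding Self-play for LLM Reasoning	Analyzes self-play training for improving LLM reasoning and reveals its differences from RLVR and SFT as well as its limitations	reinforcement learning large language model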

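🔬 Pillar 9: Embodied Foundation Models (15 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	TetraJet-v2: Accurate NVFP4 Training for Large Language Models with Oscillation Suppression and Outlier Control	TetraJet-v2: accurate NVFP4 training for large language models via oscillation suppression and outlier control	large language model
18	Leveraging Generic Time Series Foundation Models for EEG Classification	Uses generic time series foundation models for electroencephalogram (EEG) classification	foundation model
19	Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity	Proposes a chain-of-thought monitorability metric based on faithfulness and verbosity to assess the transparency of model reasoning	chain-of-thought
20	Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler	Proposes the Bayesian Data Scheduler (BDS) for adaptive defense against harmful fine-tuning of large language models	large language model	✅
21	Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring	Uses rationales generated by large language models to improve automated essay scoring performance	large language model
22	QiNN-QJ: A Quantum-inspired Neural Network with Quantum Jump for Multimodal Sentiment Analysis	Proposes QiNN-QJ, a quantum-inspired neural network with quantum jump, for multimodal sentiment analysis	multimodal
23	A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data	Proposes a hybrid generation framework that improves the accuracy of average treatment effect estimation in causal inference with LLM synthetic data	large language model	✅
24	Calibration Across Layers: Understanding Calibration Evolution in LLMs	Reveals LLM calibration mechanisms: how calibration evolves across layers and a low-dimensional calibration direction	large language model
25	Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification	Proposes a formal-verification-assisted vibe coding framework to improve the reliability of LLM-driven software development	large language model
26	PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes	PDE-SHARP: significantly reduces the computational cost of LLM-driven PDE solvers through analysis and refinement passes	chain-of-thought
27	On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection	For LLM-based code vulnerability detection, proposes few-shot example selection methods based on error patterns and similarity	large language model
28	ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling	ORGEval: a graph-theoretic framework for evaluating LLM optimization modeling capability	large language model
29	Thought Branches: Interpreting LLM Reasoning Requires Resampling	Proposes Thought Branches, a resampling-based method for interpreting LLM reasoning more reliably	chain-of-thought
30	A Comparative Analysis of LLM Adaptation: SFT, LoRA, and ICL in Data-Scarce Scenarios	A comparative analysis of LLM adaptation in data-scarce scenarios: SFT, LoRA, and ICL	large language model
31	ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models	Proposes ECVL-ROUTER, which dynamically routes vision-language models according to the needs of different scenarios	multimodal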

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
32	MDAS-GNN: Multi-Dimensional Spatiotemporal GNN with Spatial Diffusion for Urban Traffic Risk Forecasting	MDAS-GNN: a multi-dimensional spatiotemporal graph neural network with spatial diffusion for urban traffic risk forecasting	spatial relationship spatiotemporal
