cs.LG (2025-10-20)

📊 42 papers in total | 🔗 5 with code

🎯 Navigation by interest area

Pillar 2: RL & Architecture (18 🔗2) · Pillar 9: Embodied Foundation Models (14 🔗1) · Pillar 8: Physics-based Animation (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 5: Interaction & Reaction (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 3: Perception & Semantics (1)

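🔬 Pillar 2: RL & Architecture (18 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	An Enhanced Dual Transformer Contrastive Network for Multimodal Sentiment Analysis	Proposes DTCN, a dual-Transformer contrastive network, to improve multimodal sentiment analysis performance.	representation learning contrastive learning multimodal
2	Inference-Time Compute Scaling For Flow Matching	Proposes inference-time compute scaling methods for Flow Matching that preserve linear interpolation, improving generation quality.	flow matching large language model
3	UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts	UniRL-Zero: proposes a unified reinforcement learning framework that combines language model and diffusion model experts.	reinforcement learning multimodal	✅
4	Plasma Shape Control via Zero-shot Generative Reinforcement Learning	Proposes a plasma shape control method based on zero-shot generative reinforcement learning.	reinforcement learning imitation learning representation learning
5	Diffusion Models as Dataset Distillation Priors	Proposes DAP, which uses diffusion model priors to improve the representativeness of dataset distillation without additional training.	distillation foundation model
6	Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning	Proposes COMPASS to address the difficulty of LLM reward estimation in unsupervised test-time reinforcement learning.	reinforcement learning large language model
7	Fine-tuning Flow Matching Generative Models with Intermediate Feedback	Proposes the AC-Flow framework, which fine-tunes Flow Matching generative models with intermediate feedback to improve text-image alignment.	flow matching reward shaping
8	TrajMamba: An Efficient and Semantic-rich Vehicle Trajectory Pre-training Model	TrajMamba: an efficient, semantically rich vehicle trajectory pre-training model that addresses the difficulty of exploiting trajectory data.	Mamba distillation
9	Provably Optimal Reinforcement Learning under Safety Filtering	Proposes a provably optimal reinforcement learning method under safety filtering, addressing performance degradation under safety constraints.	reinforcement learning
10	An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning	Studies the performance and stability of Lagrangian methods in safe reinforcement learning, revealing the challenges of automatic multiplier updates and directions for improvement.	reinforcement learning	✅
11	Demystifying Transition Matching: When and Why It Can Beat Flow Matching	Shows where Transition Matching has the advantage: it beats Flow Matching on target distributions with well-separated modes and non-zero variance.	flow matching
12	Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods	Builds a large-scale open experimental dataset from batch distillation for developing machine learning anomaly detection methods.	distillation
13	R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning	Proposes R2L, a reliable reinforcement learning method that guarantees return and optimizes policies under uncertainty.	reinforcement learning
14	Efficient Algorithms for Mitigating Uncertainty and Risk in Reinforcement Learning	Proposes efficient algorithms for reinforcement learning under uncertainty that optimize risk measures and improve policy performance.	reinforcement learning
15	Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs	Proposes a certified self-consistency framework that provides statistical guarantees and a test-time training method for LLM reasoning.	reinforcement learning large language model
16	TabR1: Taming GRPO for tabular reasoning LLMs	TabR1: proposes a GRPO-based LLM for tabular reasoning, improving zero-shot and few-shot learning.	reinforcement learning large language model
17	Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks	Uses physics-informed neural networks as surrogate models to aid reinforcement learning in optimizing smart grid energy management.	reinforcement learning
18	EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning	EvoSyn: a generalizable evolutionary data synthesis framework for verifiable learning.	reinforcement learning distillation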

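🔬 Pillar 9: Embodied Foundation Models (14 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
19	MILES: Modality-Informed Learning Rate Scheduler for Balancing Multimodal Learning	Proposes MILES, a modality-informed learning rate scheduler for balancing multimodal learning.	multimodal
20	Foundation Models for Discovery and Exploration in Chemical Space	MIST molecular foundation models: supporting chemical space exploration and materials discovery.	foundation model
21	Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning	Proposes a GMM-guided adaptive loss that quantifies multimodal imbalance and improves audio-visual learning performance.	multimodal
22	Deeper with Riemannian Geometry: Overcoming Oversmoothing and Oversquashing for Graph Foundation Models	Proposes GBN, a network based on Riemannian geometry, to address oversmoothing and oversquashing in graph neural networks.	foundation model
23	Efficient Long-context Language Model Training by Core Attention Disaggregation	Proposes core attention disaggregation (CAD) for efficient training of long-context language models.	large language model
24	Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth	Any-Depth Alignment (ADA) unlocks LLM safety alignment at any depth without modifying model parameters.	large language model
25	Benchmarking Probabilistic Time Series Forecasting Models on Neural Activity	Evaluates probabilistic time series forecasting models on neural activity prediction, laying groundwork for closed-loop control.	foundation model
26	Unbiased Gradient Low-Rank Projection	Proposes GUM, an unbiased gradient low-rank projection optimization method based on layer sampling and the Muon algorithm.	large language model
27	MARS-M: When Variance Reduction Meets Matrices	Proposes the MARS-M optimizer to improve training efficiency for large-scale neural networks.	large language model	✅
28	Enabling Fine-Grained Operating Points for Black-Box LLMs	For black-box LLMs, proposes an effective method that provides finer-grained operating points without loss of performance.	large language model
29	LILO: Bayesian Optimization with Interactive Natural Language Feedback	Proposes the LILO framework, which performs Bayesian optimization with natural language feedback to improve the efficiency of human-machine interaction.	large language model
30	I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models	I-RAVEN-X: a benchmark for evaluating the generalization and robustness of analogical and mathematical reasoning in LLMs and LRMs.	large language model
31	Localist LLMs with Recruitment Learning	Proposes a localist LLM framework based on recruitment learning, achieving a dynamic balance between interpretability and high performance.	large language model
32	Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling	Proposes the Auto-Rubric framework, which learns generalizable criteria to improve the data efficiency and interpretability of reward models.	large language model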

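🔬 Pillar 8: Physics-based Animation (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
33	MEG-GPT: A transformer-based foundation model for magnetoencephalography data	MEG-GPT: a Transformer-based foundation model for magnetoencephalography data that improves neural decoding performance.	spatiotemporal foundation model
34	CEPerFed: Communication-Efficient Personalized Federated Learning for Multi-Pulse MRI Classification	Proposes CEPerFed, a communication-efficient personalized federated learning method for multi-pulse MRI classification.	PULSE	✅
35	SAFE-D: A Spatiotemporal Detection Framework for Abnormal Driving Among Parkinson's Disease-like Drivers	SAFE-D: a spatiotemporal framework for detecting abnormal driving behavior among Parkinson's disease-like drivers.	spatiotemporal
36	Cross-Domain Long-Term Forecasting: Radiation Dose from Sparse Neutron Sensor via Spatio-Temporal Operator Network	Proposes STONe, a spatio-temporal operator network for cross-domain long-term radiation dose forecasting from sparse neutron sensors.	spatiotemporal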

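🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
37	D2C-HRHR: Discrete Actions with Double Distributional Critics for High-Risk-High-Return Tasks	Proposes the D2C-HRHR framework for reinforcement learning with multimodal action distributions in high-risk-high-return tasks.	locomotion manipulation reinforcement learning
38	Closing the Sim2Real Performance Gap in RL	Proposes a bi-level reinforcement learning framework that directly optimizes simulation parameters to close the Sim2Real performance gap.	sim2real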

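🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
39	Quantum Federated Learning: Architectural Elements and Future Directions	A survey of quantum federated learning: architectural elements, taxonomy, and future directions.	OMOMO
40	RINS-T: Robust Implicit Neural Solvers for Time Series Linear Inverse Problems	RINS-T: a robust implicit neural solver for time series linear inverse problems that requires no pre-training.	IMoS	✅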

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
41	Attention-Guided Deep Adversarial Temporal Subspace Clustering (A-DATSC) Model for multivariate spatiotemporal data	Proposes the A-DATSC model for deep subspace clustering of multivariate spatiotemporal data.	spatial relationship spatiotemporal

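🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
42	On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration	Proposes the FLAME framework, which achieves fast domain adaptation for open-vocabulary object detection through active marginal-sample exploration.	open-vocabulary open vocabulary foundation model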
