cs.LG (2025-12-16)

📊 13 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (9 🔗3) | Pillar 9: Embodied Foundation Models (4 🔗1)

🔬 Pillar 2: RL Algorithms and Architecture (9 papers)

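#	Title	Key Point	Tags	🔗	⭐
1	Understanding the Gain from Data Filtering in Multimodal Contrastive Learning	Proposes teacher-model filtering to improve multimodal contrastive learning	representation learning, contrastive learning, multimodal
2	Joint Multimodal Contrastive Learning for Robust Spoken Term Detection and Keyword Spotting	Proposes a joint multimodal contrastive learning framework that improves the robustness and efficiency of speech retrieval tasks	contrastive learning, multimodal
3	A First-Order Logic-Based Alternative to Reward Models in RLHF	Proposes S-GRPO, based on logical similarity, as a replacement for the reward model in RLHF to improve alignment	reinforcement learning, PPO, preference learning	✅
4	EXAONE Path 2.5: Pathology Foundation Model with Multi-Omics Alignment	EXAONE Path 2.5: a pathology foundation model with multi-omics alignment for a more comprehensive understanding of tumor biology	contrastive learning, foundation model, multimodal
5	Understanding and Improving Hyperbolic Deep Reinforcement Learning	Proposes Hyper++ to address gradient instability and training difficulty in hyperbolic deep reinforcement learning	reinforcement learning, deep reinforcement learning, PPO	✅
6	Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes	Proposes the QR-MAX algorithm for model learning and policy optimization in discrete-action non-Markovian reward decision processes	reinforcement learning, model-based RL
7	ParaFormer: A Generalized PageRank Graph Transformer for Graph Representation Learning	Proposes ParaFormer, a PageRank-enhanced graph Transformer that mitigates over-smoothing in graph representation learning	representation learning	✅
8	Kinetic-Mamba: Mamba-Assisted Predictions of Stiff Chemical Kinetics	Kinetic-Mamba: uses the Mamba architecture to predict stiff chemical kinetics, improving the accuracy of combustion simulation	Mamba
9	Explainable Preference Learning: a Decision Tree-based Surrogate Model for Preferential Bayesian Optimization	Proposes a decision-tree-based interpretable preference learning model for preferential Bayesian optimization	preference learning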

🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	Key Point	Tags	🔗	⭐
10	Cornserve: Efficiently Serving Any-to-Any Multimodal Models	Cornserve: an online serving system for efficiently serving any-to-any multimodal models	large language model, multimodal
11	Estimating problem difficulty without ground truth using Large Language Model comparisons	Proposes LLM compare for estimating problem difficulty without ground truth	large language model
12	RePo: Language Models with Context Re-Positioning	Proposes RePo, which uses context re-positioning to improve how language models handle noise, structured data, and long text	large language model	✅
13	FLAME: Flow Enhanced Legendre Memory Models for General Time Series Forecasting	FLAME: flow-enhanced Legendre memory models for general time series forecasting	foundation model
