cs.LG (2025-09-29)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11 🔗2) · Pillar 2: RL Algorithms & Architecture (10 🔗1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

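#	Title	One-line summary	Tags	🔗	⭐
1	ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models	ChessArena: a chess testbed for evaluating the strategic reasoning capabilities of large language models	large language model
2	MAESTRO : Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series	MAESTRO: adaptive sparse attention and robust learning for multimodal dynamic time series analysis	multimodal
3	Conda: Column-Normalized Adam for Training Large Language Models Faster	Conda: column-normalized Adam that speeds up training of large language models	large language model	✅
4	FM-FoG: A Real-Time Foundation Model-based Wearable System for Freezing-of-Gait Mitigation	Proposes FM-FoG, a foundation-model-based system that mitigates freezing of gait in real time without patient-specific training	foundation model
5	A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity	Proposes TRIANGLE, which improves multimodal alignment through a triangle-area similarity measure	multimodal
6	Negative Pre-activations Differentiate Syntax	Identifies a mechanism by which negative pre-activations in language models differentiate syntax, highlighting the key role of Wasserstein neurons	large language model
7	Model Correlation Detection via Random Selection Probing	Proposes a random selection probing method for detecting model correlation	large language model
8	Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs	Proposes a bounded parameter-efficient unlearning method that addresses unstable forgetting in LLMs	large language model
9	Discrete Variational Autoencoding via Policy Search	Proposes a policy-search-based discrete variational autoencoder for efficient reconstruction of high-dimensional data	multimodal
10	Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs	Proposes ALoRA, an asymmetric multi-LoRA fine-tuning method that improves LLM performance in multi-task and federated learning settings	large language model	✅
11	OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment	OrthAlign: orthogonal subspace decomposition that resolves conflicts in multi-objective alignment of large models	large language model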

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

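#	Title	One-line summary	Tags	🔗	⭐
12	World Model for AI Autonomous Navigation in Mechanical Thrombectomy	Proposes a world-model approach based on TD-MPC2 to improve AI autonomous navigation in mechanical thrombectomy	reinforcement learning SAC world model
13	MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis	MDD-Thinker: a reasoning-enhanced large language model for diagnosing major depressive disorder	reinforcement learning large language model multimodal
14	When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training	Studies the greedy exploitation bias that emerges in meta-bandit LLM training	reinforcement learning reward design large language model
15	Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends	Reveals the off-policy nature of GRPO, providing theoretical grounding and algorithmic guidance for off-policy reinforcement learning of LLMs	reinforcement learning large language model	✅
16	Safe In-Context Reinforcement Learning	Proposes a safe in-context reinforcement learning method that handles safety constraints during adaptation without parameter updates	reinforcement learning
17	SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression	SIRI: iterative reinforcement learning with interleaved compression, improving the efficiency and accuracy of large reasoning models	reinforcement learning
18	ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation	Proposes ORPO-Distill, which achieves cross-architecture LLM distillation through mixed-policy preference optimization	distillation
19	Safe Reinforcement Learning-Based Vibration Control: Overcoming Training Risks with LQR Guidance	Proposes an LQR-guided safe reinforcement learning method for vibration control that addresses safety risks during training	reinforcement learning
20	Machine Learning Algorithms for Improving Black Box Optimization Solvers	Survey: machine learning algorithms for improving the performance of black-box optimization solvers	reinforcement learning Mamba
21	LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection	LEAF: a robust expert-based framework for few-shot continual event detection	contrastive learning distillation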
