cs.LG (2025-09-29)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Areas

Pillar 9: Embodied Foundation Models (11 🔗2) · Pillar 2: RL Algorithms and Architecture (10 🔗1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

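| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models | ChessArena: a chess testbed for evaluating the strategic reasoning capabilities of large language models | large language model | |
| 2 | MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series | MAESTRO: adaptive sparse attention and robust learning for multimodal dynamic time-series analysis | multimodal | |
| 3 | Conda: Column-Normalized Adam for Training Large Language Models Faster | Conda: column-normalized Adam that speeds up training of large language models | large language model | |
| 4 | FM-FoG: A Real-Time Foundation Model-based Wearable System for Freezing-of-Gait Mitigation | Proposes FM-FoG, a foundation-model-based system that mitigates freezing of gait in real time without patient-specific training | foundation model | |
| 5 | A TRIANGLE Enables Multimodal Alignment Beyond Cosine Similarity | Proposes TRIANGLE, which improves multimodal alignment through a triangle-area similarity | multimodal | |
| 6 | Negative Pre-activations Differentiate Syntax | Identifies a mechanism by which negative pre-activations in language models differentiate syntax, revealing the key role of Wasserstein neurons | large language model | |
| 7 | Model Correlation Detection via Random Selection Probing | Proposes a random-selection probing method for detecting correlation between models | large language model | |
| 8 | Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs | Proposes a bounded, parameter-efficient unlearning method that addresses unstable forgetting in LLMs | large language model | |
| 9 | Discrete Variational Autoencoding via Policy Search | Proposes a policy-search-based discrete variational autoencoder for efficient reconstruction of high-dimensional data | multimodal | |
| 10 | Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs | Proposes ALoRA, an asymmetric multi-LoRA fine-tuning method that improves LLM performance in multi-task and federated learning settings | large language model | |
| 11 | OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment | OrthAlign: orthogonal subspace decomposition that resolves conflicts in multi-objective alignment of large models | large language model | |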

🔬 Pillar 2: RL Algorithms and Architecture (10 papers)

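| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | World Model for AI Autonomous Navigation in Mechanical Thrombectomy | Proposes a world-model-based TD-MPC2 approach that improves autonomous AI navigation performance in mechanical thrombectomy | reinforcement learning, SAC, world model | |
| 13 | MDD-Thinker: Towards Large Reasoning Models for Major Depressive Disorder Diagnosis | MDD-Thinker: a reasoning-enhanced large language model for diagnosing major depressive disorder | reinforcement learning, large language model, multimodal | |
| 14 | When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training | A study of the greedy exploitation bias that emerges in meta-bandit LLM training | reinforcement learning, reward design, large language model | |
| 15 | Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends | Reveals the off-policy nature of GRPO, providing a theoretical foundation and algorithmic guidance for off-policy reinforcement learning of LLMs | reinforcement learning, large language model | |
| 16 | Safe In-Context Reinforcement Learning | Proposes a safe in-context reinforcement learning method that handles safety constraints during adaptation without parameter updates | reinforcement learning | |
| 17 | SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression | SIRI: iterative reinforcement learning with interleaved compression that improves the efficiency and accuracy of large reasoning models | reinforcement learning | |
| 18 | ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation | Proposes ORPO-Distill, which achieves cross-architecture LLM distillation through mixed-policy preference optimization | distillation | |
| 19 | Safe Reinforcement Learning-Based Vibration Control: Overcoming Training Risks with LQR Guidance | Proposes an LQR-guided safe reinforcement learning method for vibration control that addresses safety risks during training | reinforcement learning | |
| 20 | Machine Learning Algorithms for Improving Black Box Optimization Solvers | Survey: machine learning algorithms for improving the performance of black-box optimization solvers | reinforcement learning, Mamba | |
| 21 | LEAF: A Robust Expert-Based Framework for Few-Shot Continual Event Detection | LEAF: a robust expert-based framework for few-shot continual event detection | contrastive learning, distillation | |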
