cs.LG (2026-01-08)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (11 🔗2) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 1: Robot Control (2) · Pillar 3: Perception & Semantics (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL & Architecture (11 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration	MPM-LLM4DSE: uses multimodal learning and LLM-driven exploration to reach the Pareto frontier of the HLS design space.	predictive model large language model multimodal	✅
2	Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following	Argues that high-precision rewards beat diversity, improving the robustness and generalization of instruction following.	reinforcement learning instruction following
3	Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead	Proposes Nightmare Dreamer, which performs safe reinforcement learning by predicting unsafe states.	reinforcement learning world model dreamer
4	On the Hidden Objective Biases of Group-based Reinforcement Learning	Reveals hidden objective biases in group-based reinforcement learning and offers guidance for future designs.	reinforcement learning large language model
5	FedKDX: Federated Learning with Negative Knowledge Distillation for Enhanced Healthcare AI Systems	FedKDX: a federated learning framework based on negative knowledge distillation that improves the performance of healthcare AI systems.	contrastive learning distillation	✅
6	TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation	Proposes TSSR, a two-stage, swap-reward-driven reinforcement learning method for character-level SMILES generation.	reinforcement learning PPO
7	Safe Continual Reinforcement Learning Methods for Nonstationary Environments. Towards a Survey of the State of the Art	Surveys research progress on safe continual reinforcement learning methods for nonstationary environments.	reinforcement learning
8	DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights	DeepWeightFlow: a re-basined flow matching method for generating neural network weights.	flow matching
9	AgentOCR: Reimagining Agent History via Optical Self-Compression	AgentOCR: reimagines agent history via optical self-compression to improve token efficiency.	reinforcement learning large language model
10	Improving Semi-Supervised Contrastive Learning via Entropy-Weighted Confidence Integration of Anchor-Positive Pairs	Proposes a semi-supervised contrastive learning method based on entropy-weighted confidence integration, improving classification accuracy when labels are scarce.	contrastive learning
11	Not All Steps are Informative: On the Linearity of LLMs' RLVR Training	Reveals the linearity of LLMs' RLVR training and proposes weight/logit extrapolation to accelerate training.	reinforcement learning large language model

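🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models	Proposes a GPU-accelerated INT8 quantization method for compressing the KV cache in large language models.	large language model
13	IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation	IGenBench: a benchmark for evaluating the reliability of text-to-infographic generation.	large language model multimodal
14	Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward	Proposes the SGVR framework, which improves the geometric reasoning of MLLMs through sub-goal verifiable rewards.	large language model multimodal
15	A Vision for Multisensory Intelligence: Sensing, Synergy, and Science	Proposes multisensory intelligence as a research direction, aiming to improve AI's ability to perceive, understand, and interact with the world.	multimodal	✅
16	Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers	Proposes learnable multipliers that free the weight scale of language model matrix layers, improving model performance.	large language model
17	Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks?	Proposes a lightweight projection module that incorporates user and item embeddings into LLMs to improve recommendation performance.	large language model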

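🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
18	Robust Reasoning as a Symmetry-Protected Topological Phase	Proposes the Holonomic Network, which achieves reasoning that is robust to semantic noise via a symmetry-protected topological phase.	manipulation large language model
19	On the Definition and Detection of Cherry-Picking in Counterfactual Explanations	Defines and studies cherry-picking in counterfactual explanations, revealing the limits of detecting such manipulation.	manipulation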

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	Intraday spatiotemporal PV power prediction at national scale using satellite-based solar forecast models	Proposes satellite-based solar forecast models for national-scale spatiotemporal PV power prediction.	optical flow spatiotemporal

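🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
21	Density Matrix RNN (DM-RNN): A Quantum Information Theoretic Framework for Modeling Musical Context and Polyphony	Proposes a music modeling framework based on a density-matrix RNN that captures uncertainty and polyphonic relationships in music.	CHOIS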
