| # | Title | Summary | Keywords | |
|---|---|---|---|---|
| 1 | When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic | Proposes an OUI-based analysis of early structural signals in PPO to accelerate hyperparameter search. | reinforcement learning, deep reinforcement learning, PPO | |
| 2 | Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning | Proposes In-Context RLVR, which improves the reasoning quality of large language models via in-context reinforcement learning. | reinforcement learning, large language model | |
| 3 | Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning | Proposes Reward-Zero, an implicit reward mechanism for reinforcement learning driven by language embeddings. | reinforcement learning, PPO, reward shaping | |
| 4 | Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards | Proposes the DCPO framework, which decouples reasoning from confidence to improve calibration in reinforcement learning from verifiable rewards. | reinforcement learning, large language model | |
| 5 | A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System | Proposes a multi-prototype-guided federated knowledge distillation method for AI-RAN-enabled multi-access edge computing systems. | MAE, distillation | |
| 6 | ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning | Proposes ActiveUltraFeedback, which uses active learning to generate preference data efficiently and improve LLM alignment efficiency. | reinforcement learning, RLHF, large language model | ✅ |
| 7 | From Representation to Clusters: A Contrastive Learning Approach for Attributed Hypergraph Clustering | Proposes CAHC, an end-to-end contrastive-learning approach to attributed hypergraph clustering. | representation learning, contrastive learning | |
| 8 | SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space | SPAARS: safer RL policy alignment through abstract exploration and refined exploitation of the action space. | reinforcement learning, IQL, curriculum learning | |
| 9 | Task Aware Modulation Using Representation Learning for Upsaling of Terrestrial Carbon Fluxes | Proposes the TAM-RL framework, which uses representation learning to improve the accuracy and generalization of terrestrial carbon flux estimation. | representation learning | |
| 10 | Towards a Neural Debugger for Python | Proposes a neural debugger that simulates debugging operations to enable interactive control over Python code execution. | world model, large language model | |
| 11 | Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation | Proposes the RQRE-OVI algorithm, which improves policy robustness in multi-agent reinforcement learning via risk-sensitive quantal response equilibria. | reinforcement learning | |
| 12 | Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL | Learns netlist representations from imperfect LLM-generated RTL, breaking the data bottleneck in circuit representation learning. | representation learning, large language model | |
| 13 | PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing | Proposes a PPO-based hybrid optimization algorithm to achieve low latency in RIS-assisted semantic vehicular edge computing. | PPO | |
| 14 | Learning Adaptive LLM Decoding | Proposes an adaptive LLM decoding method that uses reinforcement learning to dynamically adjust the sampling strategy for better performance. | reinforcement learning, large language model | |
| 15 | Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference | Uses XLA compiler optimizations to achieve efficient, portable Mamba-2 inference across multiple platforms. | Mamba, SSM | ✅ |