cs.LG (2026-03-10)

📊 25 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (15, 🔗 2 with code)
Pillar 9: Embodied Foundation Models (6)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction (1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (15 papers)

# | Title | Key Point | Tags | 🔗
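1 | When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic | Proposes an OUI-based method for analyzing early structural signals in PPO, speeding up hyperparameter search. | reinforcement learning, deep reinforcement learning, PPO
2 | Good Reasoning Makes Good Demonstrations: Implicit Reasoning Quality Supervision via In-Context Reinforcement Learning | Proposes In-Context RLVR, which uses in-context reinforcement learning to improve the reasoning quality of large language models. | reinforcement learning, large language model
3 | Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning | Proposes Reward-Zero, which uses language embeddings to drive implicit reward mechanisms in reinforcement learning. | reinforcement learning, PPO, reward shaping
4 | Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards | Proposes the DCPO framework, which decouples reasoning from confidence to improve calibration in reinforcement learning from verifiable rewards. | reinforcement learning, large language model
5 | A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System | Proposes a multi-prototype-guided federated knowledge distillation method for AI-RAN-enabled multi-access edge computing systems. | MAE, distillation
6 | ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning | Proposes ActiveUltraFeedback, which uses active learning to generate preference data efficiently and make LLM alignment more efficient. | reinforcement learning, RLHF, large language model
7 | From Representation to Clusters: A Contrastive Learning Approach for Attributed Hypergraph Clustering | Proposes CAHC, an end-to-end contrastive-learning method for attributed hypergraph clustering. | representation learning, contrastive learning
8 | SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space | SPAARS: safer RL policy alignment through abstract exploration and refined exploitation of the action space. | reinforcement learning, IQL, curriculum learning
9 | Task Aware Modulation Using Representation Learning for Upsaling of Terrestrial Carbon Fluxes | Proposes the TAM-RL framework, which uses representation learning to improve the accuracy and generalization of terrestrial carbon flux estimates. | representation learning
10 | Towards a Neural Debugger for Python | Proposes a neural debugger that simulates debugging operations to enable interactive control over the execution of Python code. | world model, large language model
11 | Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation | Proposes the RQRE-OVI algorithm, which uses risk-sensitive quantal response equilibria to improve strategic robustness in multi-agent reinforcement learning. | reinforcement learning
12 | Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL | Learns netlist representations from imperfect LLM-generated RTL, easing the data bottleneck in circuit representation learning. | representation learning, large language model
13 | PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing | Proposes a PPO-based hybrid optimization algorithm to meet low-latency requirements in RIS-assisted semantic vehicular edge computing. | PPO
14 | Learning Adaptive LLM Decoding | Proposes an adaptive LLM decoding method that uses reinforcement learning to adjust the sampling strategy dynamically and improve performance. | reinforcement learning, large language model
15 | Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference | Uses XLA compiler optimizations to achieve efficient, portable Mamba-2 inference across multiple platforms. | Mamba, SSM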

🔬 Pillar 9: Embodied Foundation Models (6 papers)

# | Title | Key Point | Tags | 🔗
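16 | SignalMC-MED: A Multimodal Benchmark for Evaluating Biosignal Foundation Models on Single-Lead ECG and PPG | SignalMC-MED: a multimodal electrocardiogram (ECG) and photoplethysmography (PPG) benchmark for evaluating biosignal foundation models. | foundation model, multimodal
17 | GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection | GAST: a gradient-aligned sparse tuning method for efficient fine-tuning of large language models. | large language model
18 | MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning | Proposes MSSR, a memory-aware adaptive replay method that addresses catastrophic forgetting in continual LLM fine-tuning. | large language model
19 | Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers | Proposes VMoER, a scalable Bayesian framework for calibrating uncertainty in mixture-of-experts Transformers. | foundation model
20 | Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting | Uses fine-tuned LLMs to extract topic- and event-conditional sentiment for aluminum price forecasting, improving forecast accuracy during high-volatility periods. | large language model
21 | FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation | FlexServe: a fast, secure LLM serving system for mobile devices with flexible resource isolation. | large language model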

🔬 Pillar 1: Robot Control (1 paper)

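# | Title | Key Point | Tags | 🔗
22 | Impact of Markov Decision Process Design on Sim-to-Real Reinforcement Learning | Analyzes how MDP design affects sim-to-real reinforcement learning, improving control accuracy in industrial processes. | sim-to-real, reinforcement learning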

🔬 Pillar 6: Video Extraction (1 paper)

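# | Title | Key Point | Tags | 🔗
23 | Flow Field Reconstruction via Voronoi-Enhanced Physics-Informed Neural Networks with End-to-End Sensor Placement Optimization | Proposes VSOPINN, which uses a Voronoi-enhanced physics-informed neural network for flow field reconstruction and sensor placement optimization. | sparse sensors, spatiotemporal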

🔬 Pillar 4: Generative Motion (1 paper)

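# | Title | Key Point | Tags | 🔗
24 | The Radio-Frequency Transformer for Signal Separation | Proposes a Transformer-based radio-frequency signal separator that substantially reduces the bit error rate. | VQ-VAE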

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | Key Point | Tags | 🔗
25 | DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data | Proposes DendroNN, a dendrocentric neural network for efficient classification of event-based data. | spatiotemporal
