| 31 | Structured Agent Distillation for Large Language Model | Proposes a structured agent distillation method that compresses LLM agents while preserving consistency between reasoning and actions | imitation learning, distillation, large language model |
| 32 |
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining |
提出MBPO,通过对抗负样本挖掘和模态平衡优化解决大模型中的模态不平衡问题 |
preference learning large language model multimodal |
|
|
| 33 | InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models | InfiFPO: implicit model fusion for large language models via preference optimization | DPO, direct preference optimization, large language model |
| 34 | FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning | FlowQ: an offline reinforcement learning algorithm built on energy-guided flow policies | reinforcement learning, offline reinforcement learning, flow matching |
| 35 | Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions | CHARM: a foundation embedding model for time series that incorporates channel descriptions to achieve strong representation learning | representation learning, foundation model |
| 36 | Energy-Efficient Deep Reinforcement Learning with Spiking Transformers | Proposes a reinforcement learning algorithm based on spiking Transformers, enabling energy-efficient complex decision-making | reinforcement learning, deep reinforcement learning |
| 37 | AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Momentum | Proposes the AAPO algorithm, which uses advantage momentum to strengthen LLMs' mathematical reasoning | reinforcement learning, PPO, large language model |
| 38 | Imitation Learning via Focused Satisficing | Proposes an imitation learning method based on focused satisficing that improves on the quality of demonstration trajectories | reinforcement learning, deep reinforcement learning, imitation learning |
| 39 | The Evolution of Alpha in Finance Harnessing Human Insight and LLM Agents | Proposes a framework for evolving alpha strategies with LLM-based financial agents, making investment decisions more intelligent | representation learning, large language model, multimodal |
| 40 | Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks | Proposes an interpretable reinforcement learning approach to load balancing based on Kolmogorov-Arnold Networks | reinforcement learning, PPO |
| 41 | Preference Learning with Lie Detectors can Induce Honesty or Evasion | Preference learning with lie detectors can induce either honesty or evasive behavior | preference learning, DPO |
| 42 | Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation | Proposes an efficient continuous-time reinforcement learning algorithm that addresses both sample and computational efficiency | reinforcement learning |
| 43 | Text embedding models can be great data engineers | ADEPT: uses text embeddings to automatically build data-engineering pipelines, improving predictive model performance | predictive model, TAMP |
| 44 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | TinyV: improves reinforcement learning for LLM reasoning by reducing false negatives in verification | reinforcement learning, large language model | ✅ |
| 45 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Proposes a deep Q-network scheme for optimizing energy harvesting in cognitive radio networks, raising secondary users' data rates | reinforcement learning |
| 46 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Proposes KIPPO, which uses Koopman theory to improve PPO's performance and stability on complex control tasks | reinforcement learning, policy learning, PPO |
| 47 | Bellman operator convergence enhancements in reinforcement learning algorithms | Improves the convergence and performance of reinforcement learning algorithms by refining the Bellman operator | reinforcement learning |
| 48 | Personalised Insulin Adjustment with Reinforcement Learning: An In-Silico Validation for People with Diabetes on Intensive Insulin Treatment | Proposes ABBA, a reinforcement-learning-based personalized insulin adjustment scheme that optimizes blood glucose control for people with diabetes | reinforcement learning |
| 49 | FlowTSE: Target Speaker Extraction with Flow Matching | FlowTSE: a flow-matching approach to target speaker extraction that simplifies the pipeline and improves performance | flow matching |
| 50 | Self Distillation via Iterative Constructive Perturbations | Proposes a self-distillation framework based on iterative constructive perturbations, improving the generalization of deep neural networks | distillation |
| 51 | From Reasoning to Code: GRPO Optimization for Underrepresented Languages | Proposes a GRPO-based optimization method that improves LLM code generation for low-resource languages | reinforcement learning, large language model |
| 52 | Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry | Proposes DiffeoCFM, a generative model that performs Riemannian flow matching for brain connectivity matrices via pullback geometry | flow matching | ✅ |
| 53 | When to retrain a machine learning model | Proposes an uncertainty-based model retraining approach to counter performance degradation under data drift | reinforcement learning, offline reinforcement learning |