| 1 |
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach |
CoMET: a modular multimodal classification method that needs no fine-tuning, built by composing pretrained models |
representation learning foundation model multimodal |
|
|
| 2 |
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation |
CAdam: context-adaptive moment estimation for fast optimization of 3D Gaussians in generative distillation |
distillation 3D gaussian splatting 3DGS |
|
|
| 3 |
PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment |
PREFINE: preference-based implicit reward and cost fine-tuning for safety alignment |
reinforcement learning offline RL imitation learning |
|
|
| 4 |
Behavior-Consistent Deep Reinforcement Learning |
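Proposes the QED algorithm, which improves the reliability of reinforcement learning by controlling the consistency of policy distributions |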
reinforcement learning deep reinforcement learning |
|
|
| 5 |
Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards |
Proposes the NFPO algorithm, which uses multi-step likelihood-ratio correction to improve language model reasoning in RLVR |
reinforcement learning PPO large language model |
|
|
| 6 |
Distributed Direct Preference Optimization |
Proposes a distributed DPO algorithm for policy alignment under heterogeneous user preference data |
reinforcement learning offline RL DPO |
|
|
| 7 |
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards |
Proposes DelTA to address the unclear relationship between response-level rewards and token-level probability changes |
reinforcement learning large language model |
|
|
| 8 |
Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards |
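Proposes a domain-adaptive reinforcement learning framework that uses dense rewards to improve code generation quality, particularly in robotics |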
reinforcement learning large language model |
|
|
| 9 |
Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning |
FISolver: discovers first integrals of dynamical systems using backward-generated data and guided reinforcement learning |
reinforcement learning large language model |
|
|
| 10 |
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression |
Proposes Distribution-Aware Reward to improve the quality of predictive distributions in LLM regression tasks |
reinforcement learning large language model |
|
|
| 11 |
TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health |
TimeSRL: generalizable time-series behavioral modeling via LLMs fine-tuned with semantic reinforcement learning, applied to mental health |
reinforcement learning MAE large language model |
|
|
| 12 |
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent |
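Proposes Stochastic MeanFlow Policies (SMFP), which handle multimodal action distributions in reinforcement learning through one-step generative control |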
reinforcement learning SAC multimodal |
|
|
| 13 |
PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG |
PACD-Net: a pseudo-augmented contrastive distillation method for estimating glycemic control indicators |
contrastive learning distillation |
|
|
| 14 |
AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals |
AVSD: adaptive-view self-distillation that balances consensus and teacher-specific privileged signals |
distillation privileged information |
|
|
| 15 |
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories |
Proposes RELEX to efficiently extrapolate RLVR training results |
reinforcement learning large language model |
✅ |
|
| 16 |
Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search |
FedKDNAS: an optimized federated learning framework combining distributed NAS with knowledge distillation |
distillation |
|
|
| 17 |
How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR |
G2D: improves offline preference optimization with a moderate online RL warm-up, reducing compute cost |
reinforcement learning DPO direct preference optimization |
|
|
| 18 |
Efficient Learning of Deep State Space Models via Importance Smoothing |
Proposes Parallel Variational Monte Carlo (PVMC) for efficient training of deep state space models (DSSMs) |
state space model |
|
|
| 19 |
A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation |
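Unifies causal and traditional representation learning in one framework, so that each benefits from the other's strengths and performance improves |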
representation learning |
|
|
| 20 |
PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR |
PlexRL: cluster-level orchestration of LLM services for RLVR, improving resource utilization |
reinforcement learning large language model |
|
|
| 21 |
REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak |
REFLECTOR: defends large language models against indirect jailbreak attacks by internalizing step-wise reflection |
reinforcement learning large language model |
|
|
| 22 |
Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines |
Proposes the FPRO framework to address manufacturability in aeroengine pipe routing |
reinforcement learning |
|
|
| 23 |
Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics |
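Proposes a PDE-constrained optimization method based on weak-form latent dynamics that accelerates optimization of high-dimensional time-dependent PDEs |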
latent dynamics |
|
|
| 24 |
ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning |
ReversedQ: accelerates online reinforcement learning by optimizing the Q-learning update strategy |
reinforcement learning |
|
|
| 25 |
Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting |
Proposes PG-DPO, which uses Pontryagin's maximum principle to solve reinforcement learning problems with non-exponential discounting |
reinforcement learning DPO |
|
|
| 26 |
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback |
Proposes AGPO to address training instability in PPO/GRPO |
reinforcement learning PPO |
✅ |
|