| 1 |
A Survey of Circuit Foundation Model: Foundation AI Models for VLSI Circuit Design and EDA |
Surveys circuit foundation models, i.e., foundation AI models for VLSI circuit design and EDA |
representation learning, large language model, foundation model |
|
|
| 2 |
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback |
Tackles data bottlenecks in RLHF with a hybrid reward system and a prompt-selection method, improving model performance and diversity |
reinforcement learning, PPO, RLHF |
|
|
| 3 |
Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning |
Arch-LLM tames LLMs to generate neural architectures via unsupervised discrete representation learning |
representation learning, VQ-VAE, large language model |
|
|
| 4 |
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models |
Quamba2 is a robust and scalable post-training quantization framework for selective state space models |
Mamba, SSM, state space model |
✅ |
|
| 5 |
RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack |
Proposes RLDBF, which uses reinforcement learning with database feedback to improve LLM performance in chemistry and molecular science |
reinforcement learning, large language model |
|
|
| 6 |
Efficient Verified Machine Unlearning For Distillation |
Proposes the PURGE framework to accelerate efficient, verifiable machine unlearning in knowledge-distillation settings |
teacher-student distillation |
|
|
| 7 |
Probabilistic Uncertain Reward Model |
Proposes the Probabilistic Uncertain Reward Model (PURM) to address reward-model overconfidence in RLHF |
reinforcement learning, RLHF, large language model |
|
|
| 8 |
Generative Latent Neural PDE Solver using Flow Matching |
Proposes a generative latent-space neural PDE solver based on flow matching, improving accuracy and long-term stability |
flow matching |
|
|
| 9 |
Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments |
Proposes a multi-armed bandit reinforcement learning approach for automating machine learning model deployment and management |
reinforcement learning |
|
|
| 10 |
Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games |
Proposes a policy-optimization-based multi-agent reinforcement learning algorithm for mean-variance team stochastic games |
reinforcement learning |
|
|
| 11 |
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning |
Proposes EGSW, which uses entropy-guided sequence weighting to improve exploration efficiency in RL fine-tuning of LLMs |
reinforcement learning, large language model |
|
|
| 12 |
Invariant Control Strategies for Active Flow Control using Graph Neural Networks |
Proposes a graph-neural-network-based active flow control strategy that improves generalization and lowers computational cost |
reinforcement learning, spatial relationship |
|
|
| 13 |
Fuzzy Cluster-Aware Contrastive Clustering for Time Series |
Proposes FCACC, a fuzzy cluster-aware contrastive clustering framework that improves unsupervised clustering of time series |
representation learning, contrastive learning |
|
|