| 1 |
$φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models |
Proposes $φ$-DPO to address fairness in continual learning for large multimodal models.
DPO direct preference optimization multimodal |
|
|
| 2 |
MetaOthello: A Controlled Study of Multiple World Models in Transformers |
MetaOthello: a controlled study of multiple world models in Transformers
world model foundation model |
|
|
| 3 |
PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA |
Proposes PSQE, which improves unsupervised multimodal entity alignment by enhancing pseudo-seed quality.
contrastive learning large language model multimodal |
|
|
| 4 |
Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning |
Proposes CEEH, a difficulty-aware entropy regularization method that improves LLM reasoning efficiency while preserving accuracy.
reinforcement learning large language model chain-of-thought |
|
|
| 5 |
Regularized Online RLHF with Generalized Bilinear Preferences |
Proposes a generalized bilinear preference model to address Nash equilibrium computation in online RLHF
preference learning RLHF |
|
|
| 6 |
Multilingual Safety Alignment Via Sparse Weight Editing |
Proposes a sparse weight editing method for multilingual safety alignment
reinforcement learning RLHF large language model |
|
|
| 7 |
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization |
Proposes EMPO$^2$, which improves LLM agent exploration via hybrid on-/off-policy optimization and memory augmentation
reinforcement learning large language model |
|
|
| 8 |
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks |
Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks.
reinforcement learning large language model |
✅ |
|
| 9 |
Multi-agent imitation learning with function approximation: Linear Markov games and beyond |
Proposes a multi-agent imitation learning method with function approximation for linear Markov games
imitation learning |
|
|
| 10 |
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning |
Proposes GeoDPO, which enhances geometric perception in VLMs via translator-guided reinforcement learning
reinforcement learning |
✅ |
|
| 11 |
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning |
EvolveGen: uses reinforcement learning to generate algorithm-level hardware model checking benchmarks, improving verification efficiency.
reinforcement learning |
|
|
| 12 |
Transformers converge to invariant algorithmic cores |
Reveals invariant algorithmic cores in Transformers: low-dimensional structure shared across training runs and scales
predictive model large language model |
|
|
| 13 |
Autoregressive Visual Decoding from EEG Signals |
Proposes AVDE, a lightweight and efficient autoregressive model for decoding visual information from EEG signals.
contrastive learning VQ-VAE |
|
|
| 14 |
Prediction of Diffusion Coefficients in Mixtures with Tensor Completion |
Proposes a mixture tensor completion method combining a Bayesian framework with active learning to improve prediction of diffusion coefficients in mixtures.
predictive model PULSE |
|
|
| 15 |
Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability |
Proposes Residual Koopman Spectral Profiling (RKSP) to predict and prevent Transformer training instability
Mamba SSM |
|
|
| 16 |
Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks |
Interprets and steers state-space models via activation subspace bottlenecks
Mamba SSM |
|
|
| 17 |
Component Centric Placement Using Deep Reinforcement Learning |
Proposes a component-centric PCB placement method based on deep reinforcement learning to optimize component layout.
reinforcement learning deep reinforcement learning |
|
|
| 19 |
Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning |
Proposes an information-bottleneck theory of human supervision that explains and mitigates error floors in human-guided alignment
reinforcement learning large language model |
|
|