1. Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
   A multimodal flow foundation model for 3D molecules and materials, unifying generative and predictive tasks.
   Keywords: flow matching, foundation model, multimodal
|
|
2. Understanding protein function with a multimodal retrieval-augmented foundation model
   Introduces PoET-2, a multimodal retrieval-augmented protein foundation model for improving the understanding of protein function.
   Keywords: representation learning, foundation model, multimodal
|
|
| 3 |
$ϕ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models |
提出$ϕ$-DPO框架以解决大规模多模态模型中的公平性问题 |
DPO direct preference optimization multimodal |
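As background for the DPO-based entries above, here is a minimal sketch of the standard direct preference optimization loss for a single preference pair. This is the generic textbook form, not the paper's $ϕ$-DPO variant; the log-probability inputs and `beta` value are illustrative assumptions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective for one preference pair: penalize the policy
    when its preference log-ratio does not exceed the reference model's."""
    pi_logratio = logp_chosen - logp_rejected          # policy's preference
    ref_logratio = ref_logp_chosen - ref_logp_rejected  # reference's preference
    margin = beta * (pi_logratio - ref_logratio)
    # -log(sigmoid(margin)) = log(1 + exp(-margin)); lower is better
    return math.log1p(math.exp(-margin))

# Toy values (assumed): the policy prefers the chosen response more strongly
# than the reference does, so the loss drops below log(2), the zero-margin value.
loss = dpo_loss(-5.0, -9.0, -6.0, -7.0)
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; training pushes the margin positive.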
|
|
4. Reinforcement-aware Knowledge Distillation for LLM Reasoning
   Proposes RLAD, a reinforcement-learning-aware knowledge distillation method for improving LLM reasoning.
   Keywords: reinforcement learning, PPO, teacher-student
|
|
5. Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
   Enhances selective SSMs in offline reinforcement learning through heterogeneous sequence mixing.
   Keywords: offline RL, Mamba, SSM
|
|
6. Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
   Uses spectral analysis via random matrix theory to improve the reliability and efficiency of large language models.
   Keywords: distillation, large language model
|
|
7. Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection
   Proposes an adversarial inverse reinforcement learning method for machinery fault detection that requires no fault labels.
   Keywords: reinforcement learning, inverse reinforcement learning
|
|
8. Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning
   Proposes a difficulty-aware entropy regularization method that improves LLM reasoning efficiency while preserving accuracy.
   Keywords: reinforcement learning, large language model, chain-of-thought
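The entry above combines two standard ingredients: an entropy bonus that encourages exploration, and a difficulty-dependent coefficient. The sketch below illustrates the general shape only; the linear scaling in `difficulty` and the names are assumptions for illustration, not the paper's actual formulation.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def difficulty_aware_loss(nll, probs, difficulty, base_coef=0.01):
    """Hypothetical difficulty-aware objective: hard prompts (difficulty near 1)
    receive a larger entropy bonus, encouraging exploration of long reasoning
    chains; easy prompts (difficulty near 0) get little bonus, letting the model
    compress to short, near-deterministic answers."""
    coef = base_coef * difficulty  # assumption: linear scaling in difficulty
    return nll - coef * token_entropy(probs)
```

With `difficulty = 0` the objective reduces to the plain negative log-likelihood; raising the difficulty strictly lowers the loss of high-entropy (exploratory) predictions.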
|
|
9. MetaOthello: A Controlled Study of Multiple World Models in Transformers
   MetaOthello: a controlled experimental testbed for studying multiple world models in Transformers.
   Keywords: world model, foundation model
|
|
10. PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
    Proposes PSQE, which improves unsupervised multimodal entity alignment by enhancing pseudo-seed quality.
    Keywords: contrastive learning, large language model, multimodal
|
|
11. Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
    Develops a sharp analysis of offline policy learning for $f$-divergence-regularized contextual bandits.
    Keywords: reinforcement learning, policy learning, offline reinforcement learning
|
|
12. UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
    Proposes the UniQL framework to address the resource constraints of large language models on edge devices.
    Keywords: Mamba, SSM, state space model
|
|
13. Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
    Proposes G-OPD, a generalized on-policy distillation framework that uses reward extrapolation to improve student models, even beyond the teacher.
    Keywords: reinforcement learning, teacher-student, distillation
|
|
14. Multilingual Safety Alignment Via Sparse Weight Editing
    Proposes a multilingual safety-alignment method based on sparse weight editing, addressing insufficient safety protection for low-resource languages.
    Keywords: reinforcement learning, RLHF, large language model
|
|
15. Regularized Online RLHF with Generalized Bilinear Preferences
    Proposes a generalized bilinear preference model to address Nash-equilibrium problems in online RLHF.
    Keywords: preference learning, RLHF
|
|
16. Soft Sequence Policy Optimization
    Proposes Soft Sequence Policy Optimization (SSPO), improving the training stability and performance of LLMs on mathematical reasoning tasks.
    Keywords: reinforcement learning, PPO, large language model
|
|
17. Entropy-Controlled Flow Matching
    Proposes an entropy-controlled flow matching method to address problems in information geometry.
    Keywords: flow matching
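Several entries in this list build on flow matching (1, 17, 28, 30). As shared background, here is a minimal sketch of vanilla conditional flow matching with the linear interpolation path, where the regression target is the constant velocity $x_1 - x_0$. This is the generic formulation, not any of the listed papers' variants; the helper names are illustrative.

```python
import random

def cfm_sample(x0, x1):
    """Linear conditional path: x_t = (1 - t) * x0 + t * x1 at a random t,
    with constant target velocity u = x1 - x0 along the whole path."""
    t = random.random()
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return t, xt, u

def cfm_loss(v_pred, u_target):
    """Flow matching regression loss: mean squared error between the
    model's predicted velocity at (t, x_t) and the target velocity."""
    return sum((v - u) ** 2 for v, u in zip(v_pred, u_target)) / len(u_target)
```

A velocity network trained to minimize this loss over noise/data pairs can then generate samples by integrating the learned ODE from the noise distribution.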
|
|
18. Code World Models for Parameter Control in Evolutionary Algorithms
    Uses LLMs to build code world models for adaptive parameter control in evolutionary algorithms.
    Keywords: world model
|
|
19. When Should a Model Change Its Mind? An Energy-Based Theory and Regularizer for Concept Drift in Electrocardiogram (ECG) Signals
    Proposes an energy-based theory of concept drift in physiological signals, with a regularizer that improves the stability of ECG models.
    Keywords: representation learning, multimodal
|
|
20. UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
    UpSkill: mutual-information skill learning for increasing the structured response diversity of LLMs.
    Keywords: reinforcement learning, large language model
|
|
21. Transformers converge to invariant algorithmic cores
    Reveals invariant algorithmic cores in Transformers: low-dimensional structure shared across training runs and scales.
    Keywords: predictive model, large language model
|
|
22. Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
    Proposes GeoDPO, which improves the geometric perception of vision-language models via translator-guided reinforcement learning.
    Keywords: reinforcement learning
|
|
23. Multi-agent imitation learning with function approximation: Linear Markov games and beyond
    Proposes multi-agent imitation learning methods with function approximation for linear Markov games and beyond.
    Keywords: imitation learning
|
|
24. Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
    Proposes Hierarchy-of-Groups Policy Optimization (HGPO) to address context inconsistency in long-horizon agentic tasks.
    Keywords: reinforcement learning, large language model
|
|
25. Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
    Proposes EMPO$^2$, which combines memory with hybrid on- and off-policy optimization to improve the exploration ability and generalization of LLM agents.
    Keywords: reinforcement learning, large language model
|
|
26. EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
    EvolveGen: a reinforcement-learning-based framework for generating algorithmic-level hardware model-checking benchmarks.
    Keywords: reinforcement learning
|
|
27. RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
    RL-Obfuscation: uses reinforcement learning to make language models evade latent-space monitors.
    Keywords: reinforcement learning, large language model
|
|
28. Fast and Flexible Probabilistic Forecasting of Dynamical Systems using Flow Matching and Physical Perturbation
    Proposes a fast probabilistic forecasting method for dynamical systems based on flow matching and physical perturbation.
    Keywords: flow matching
|
|
29. Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
    Reveals the statistical advantage of softmax attention through a single-location regression task.
    Keywords: linear attention, large language model
|
|
30. Simplex-to-Euclidean Bijections for Categorical Flow Matching
    Proposes a categorical flow matching method based on simplex-to-Euclidean bijections for learning probability distributions on the simplex.
    Keywords: flow matching
|
|
31. On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference
    Reveals the equivalence of random network distillation, deep ensembles, and Bayesian inference for efficient uncertainty quantification.
    Keywords: distillation
|
|
32. AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
    AngelSlim: a large-model compression toolkit from Tencent's Hunyuan team, improving efficiency and accessibility.
    Keywords: distillation, multimodal
|
|
33. On the Interpolation Error of Nonlinear Attention versus Linear Regression
    Shows that nonlinear attention typically incurs larger interpolation error than linear regression in high dimensions, but that structured signals can narrow the gap.
    Keywords: linear attention
|
|
34. One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow
    Proposes one-step diffusion samplers based on self-distillation and deterministic flow, accelerating sampling and stabilizing evidence estimation.
    Keywords: distillation
|
|
35. Autoregressive Visual Decoding from EEG Signals
    Proposes AVDE, a lightweight and efficient autoregressive visual decoding framework for EEG signals, aimed at brain-computer interface applications.
    Keywords: contrastive learning, VQ-VAE
|
|
36. Prediction of Diffusion Coefficients in Mixtures with Tensor Completion
    Proposes a tensor-completion method for mixtures, combining a Bayesian framework with active learning to improve the prediction accuracy of diffusion coefficients.
    Keywords: predictive model, PULSE
|
|
37. Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
    Proposes SDPO, which aligns few-step diffusion models with downstream objectives via dense reward difference learning.
    Keywords: reinforcement learning, diffusion policy
|
|
38. WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
    WebGym: builds large-scale, realistic web environments to improve the task performance of visual web agents.
    Keywords: reinforcement learning, policy learning
|
|
39. Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
    Improves the interpretability and steerability of state-space models via activation subspace bottlenecks.
    Keywords: Mamba, SSM
|
|
40. Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
    Proposes residual Koopman spectral profiling for predicting and preventing instability in Transformer training.
    Keywords: Mamba, SSM
|
|