| 1 |
IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination |
IRIS: leveraging implicit rewards to guide internal sifting and mitigate hallucination in multimodal large language models (a generic DPO sketch follows this entry)
DPO, direct preference optimization, large language model
|
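Paper 1 is tagged with DPO. As generic background only (not the IRIS method itself), here is a minimal sketch of the standard DPO loss from Rafailov et al. (2023); the function and tensor names are illustrative.

```python
# Standard DPO loss: prefer the chosen response over the rejected one by the
# margin of their implicit rewards, beta * (log pi_theta - log pi_ref).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument: summed log-probs of one response under one model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the implicit-reward margin between chosen and rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up summed log-probabilities.
print(dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
               torch.tensor([-13.0]), torch.tensor([-14.2])))
```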
|
| 2 |
Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation |
Proposes an efficient knowledge-distillation-based uncertainty estimation method for LLMs, reducing hallucination and improving safety (see the generic distillation-loss sketch after this entry).
distillation, large language model
|
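Paper 2 is tagged with distillation. For background, a sketch of the classic soft-label distillation loss of Hinton et al. (2015), not this paper's uncertainty estimator; names are illustrative:

```python
# Classic knowledge distillation: match the student's softened distribution
# to the teacher's with KL divergence at temperature T.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T**2 factor keeps gradient scale comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T**2

student, teacher = torch.randn(4, 10), torch.randn(4, 10)  # toy logits
print(distillation_loss(student, teacher))
```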
|
| 3 |
T-LLM: Teaching Large Language Models to Forecast Time Series via Temporal Distillation |
T-LLM: teaching large language models to forecast time series via temporal distillation
distillation, large language model
|
|
| 4 |
A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning |
Proposes a relative-budget theory to improve reinforcement learning with verifiable rewards for large language models
reinforcement learning, large language model
|
|
| 5 |
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment |
Proposes semantic-aware Wasserstein policy regularization to improve large language model alignment (a toy Wasserstein-vs-KL comparison follows this entry)
reinforcement learning, RLHF, large language model
✅ |
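
For paper 5, the reason one might prefer Wasserstein over KL regularization (background intuition, not the paper's construction): W1 accounts for how far probability mass moves, while KL only compares probabilities pointwise. A toy 1-D example using SciPy:

```python
import numpy as np
from scipy.stats import wasserstein_distance

support = np.array([0.0, 1.0, 2.0])    # hypothetical "semantic" axis
p = np.array([1.0, 0.0, 0.0])          # reference policy puts all mass at 0
q_near = np.array([0.0, 1.0, 0.0])     # mass shifted to a nearby point
q_far = np.array([0.0, 0.0, 1.0])      # mass shifted to a distant point

# W1 grows with how far the mass moved ...
print(wasserstein_distance(support, support, p, q_near))  # 1.0
print(wasserstein_distance(support, support, p, q_far))   # 2.0
# ... whereas KL(p || q) is infinite in both cases (disjoint supports),
# so it cannot distinguish a semantically small shift from a large one.
```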
|
| 6 |
From Perception to Action: Spatial AI Agents and World Models |
Building spatially intelligent agents: proposes a unified framework that connects agent capabilities to spatial tasks, addressing perception and action in the physical world.
world model, large language model, symbolic grounding
|
|
| 7 |
FORLER: Federated Offline Reinforcement Learning with Q-Ensemble and Actor Rectification |
Proposes FORLER to address policy contamination in federated offline reinforcement learning over low-quality, heterogeneous data
reinforcement learning, offline RL, offline reinforcement learning
|
|
| 8 |
SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization |
SLIME: stabilized likelihood implicit margin enforcement for preference optimization, addressing "forgetting" and "format collapse" in LLM alignment.
reinforcement learning, preference learning, RLHF
|
|
| 9 |
DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations |
DCoPilot: generative-AI-empowered policy adaptation for dynamic data center operations
reinforcement learning, deep reinforcement learning, DRL
|
|
| 10 |
David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning |
Proposes the Slingshot framework, using reinforcement learning to mount zero-shot jailbreak attacks against agents.
reinforcement learning, large language model
|
|
| 11 |
State Rank Dynamics in Linear Attention LLMs |
Reveals the state-rank dynamics of linear-attention LLMs and proposes joint rank-norm pruning to optimize the KV cache.
linear attention, large language model
|
|
| 12 |
ECHO-2: A Large Scale Distributed Rollout Framework for Cost-efficient Reinforcement Learning |
ECHO-2: a large-scale distributed rollout framework for cost-efficient reinforcement learning
reinforcement learning, large language model
|
|
| 13 |
VLM-Guided Experience Replay |
Uses a VLM to guide experience replay, improving sample efficiency and performance in reinforcement learning
reinforcement learning, large language model, multimodal
|
|
| 14 |
Beyond Mode Elicitation: Diversity-Preserving Reinforcement Learning via Latent Diffusion Reasoner |
Proposes LaDi-RL, which augments reinforcement learning with a latent diffusion reasoner to counter diversity collapse in LLM reasoning
reinforcement learning, chain-of-thought
|
|
| 15 |
ASGMamba: Adaptive Spectral Gating Mamba for Multivariate Time Series Forecasting |
Proposes ASGMamba, achieving efficient multivariate time series forecasting via adaptive spectral gating in Mamba (a minimal SSM recurrence sketch follows this entry)
Mamba, SSM, state space model
✅ |
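
For paper 15's Mamba/SSM tags, the underlying state-space recurrence (generic background, not ASGMamba's spectral gating) is simple enough to state in a few lines; all names here are illustrative:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Linear SSM: h_t = A h_{t-1} + B x_t, y_t = C h_t (one input channel)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # O(sequence length) time, constant memory
        h = A @ h + B * x_t       # state update
        ys.append(C @ h)          # readout
    return np.array(ys)

d = 4
A = 0.9 * np.eye(d)               # toy stable transition matrix
B, C = np.ones(d), np.ones(d) / d
print(ssm_scan(A, B, C, np.sin(np.linspace(0, 3, 8))))
```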
|
| 16 |
Expanding the Capabilities of Reinforcement Learning via Text Feedback |
Expands the capabilities of reinforcement learning via text feedback to address information scarcity
reinforcement learning, distillation
|
|
| 17 |
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models |
CurioSFT: entropy-preserving supervised fine-tuning via adaptive self-distillation, improving exploration in large reasoning models
reinforcement learning, distillation
|
|
| 18 |
DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics |
DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics (a generic contrastive-loss sketch follows this entry)
representation learning, contrastive learning
|
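Paper 18's tags name contrastive learning; the CLIP-style symmetric objective it alludes to is standard. The sketch below is generic background, not DIA-CLIP's actual spectrum-peptide pairing:

```python
import torch
import torch.nn.functional as F

def clip_loss(emb_a, emb_b, temperature=0.07):
    """emb_a, emb_b: (N, d) embeddings of N paired items from two views."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature       # (N, N) similarity matrix
    targets = torch.arange(len(a))         # true pairs sit on the diagonal
    # Symmetric cross-entropy: retrieve b from a and a from b.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(clip_loss(torch.randn(8, 32), torch.randn(8, 32)))
```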
|
| 19 |
A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention |
Proves an expressiveness hierarchy among hybrid linear-full attention mechanisms
Mamba, linear attention, large language model
|
|
| 20 |
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards |
Proposes the VIP algorithm, improving sample efficiency of online reinforcement learning with verifiable rewards via adaptive rollout allocation
reinforcement learning, VIP
✅ |
|
| 21 |
Segment to Focus: Guiding Latent Action Models in the Presence of Distractors |
MaskLAM: guides latent action models with visual segmentation to handle background distractors
reinforcement learning, foundation model
|
|
| 22 |
An Empirical Study of World Model Quantization |
A systematic study of how post-training quantization affects the DINO-WM world model on visual planning tasks.
world model |
✅ |
|
| 23 |
Generative Visual Code Mobile World Models |
Proposes gWorld, a mobile GUI world model built on renderable code generation, improving mobile GUI agent performance.
world model |
|
|
| 24 |
Active Causal Experimentalist (ACE): Learning Intervention Strategies via Direct Preference Optimization |
Proposes ACE, which learns causal intervention strategies via direct preference optimization to make experiment design more efficient.
direct preference optimization |
|
|
| 25 |
Masked Autoencoders as Universal Speech Enhancer |
Proposes a masked-autoencoder-based universal speech enhancer with self-supervised learning and adaptation across scenarios (a generic MAE sketch follows this entry).
masked autoencoder |
|
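Paper 25 builds on masked autoencoders; the core MAE recipe (He et al., 2022) is to hide most of the input and reconstruct only what was hidden. A generic sketch with a stand-in model, not the paper's speech architecture:

```python
import torch

def mae_reconstruction_loss(patches, model, mask_ratio=0.75):
    """patches: (N, L, D) embeddings; model: any (N, L, D) -> (N, L, D) module."""
    N, L, D = patches.shape
    num_masked = int(L * mask_ratio)
    order = torch.rand(N, L).argsort(dim=1)       # random per-sample ordering
    masked = order < num_masked                   # (N, L) boolean mask
    visible = patches * (~masked).unsqueeze(-1)   # zero out masked patches
    recon = model(visible)                        # predict all positions
    # Self-supervised target: reconstruct only the masked positions.
    return ((recon - patches) ** 2)[masked].mean()

model = torch.nn.Linear(16, 16)                   # toy stand-in "autoencoder"
print(mae_reconstruction_loss(torch.randn(2, 8, 16), model))
```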
|
| 26 |
Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning |
Proposes DAIL, which uses a small number of expert solutions to improve LLM reasoning ability and efficiency.
imitation learning, large language model
|
|
| 27 |
Self-Supervised Learning from Structural Invariance |
Proposes AdaSSL, self-supervised learning from structural invariance that addresses the one-to-many mapping problem.
world model, representation learning, distillation
|
|
| 28 |
STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs |
Proposes the STILL framework for efficiently linearizing large language models
linear attention, large language model
|
|
| 29 |
ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning |
Proposes the ECHO algorithm, addressing rollout collapse and pseudo-label bias in test-time reinforcement learning
reinforcement learning |
|
|
| 30 |
Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning |
Proposes probabilistic performance guarantees for multi-task reinforcement learning, providing high-confidence assurances for safety-critical applications.
reinforcement learning |
|
|
| 31 |
Dissecting Outlier Dynamics in LLM NVFP4 Pretraining |
Targets outlier dynamics in LLM NVFP4 pretraining, proposing hot-channel compensation (HCP) and the CHON training recipe.
linear attention, large language model
|
|
| 32 |
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning |
Proposes a Transformer-reinforcement-learning approach to designing time series experiments in A/B testing, improving policy evaluation.
reinforcement learning |
|
|
| 33 |
Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It |
Proposes a dynamic learning-rate scheduling method that resolves the training-inference mismatch in LLM reinforcement learning
reinforcement learning, large language model
|
|
| 34 |
Softmax Linear Attention: Reclaiming Global Competition |
Proposes softmax linear attention to reclaim the global competition among tokens that plain linear attention loses (a sketch contrasting the two follows this entry)
linear attention |
|
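Paper 34's title contrasts softmax and linear attention, so the two standard formulations are worth stating side by side (a generic sketch, not the paper's proposed operator). Softmax normalizes scores across all keys, so keys compete globally; kernelized linear attention reorders the computation to avoid the O(n^2) score matrix and loses that competition:

```python
import torch

def softmax_attention(q, k, v):
    scores = q @ k.t() / q.shape[-1] ** 0.5       # (n, n): all keys compete
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, phi=torch.nn.functional.elu):
    fq, fk = phi(q) + 1, phi(k) + 1               # positive feature maps
    kv = fk.t() @ v                               # (d, d) state, built in O(n)
    z = fq @ fk.sum(dim=0, keepdim=True).t()      # per-query normalizer
    return (fq @ kv) / z                          # no n x n matrix is formed

n, d = 6, 4
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```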
|
| 35 |
Choice-Model-Assisted Q-learning for Delayed-Feedback Revenue Management |
Proposes choice-model-assisted Q-learning for revenue management with delayed feedback (a tabular Q-learning sketch follows this entry)
reinforcement learning, world model
|
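Paper 35 extends Q-learning; for reference, the tabular update it builds on (generic background; the choice model and delayed-feedback machinery are not modeled here):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Bellman backup: Q(s,a) += alpha * TD-error."""
    td_target = r + gamma * Q[s_next].max()   # bootstrap from best next action
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))                          # 3 toy states, 2 actions
print(q_update(Q, s=0, a=1, r=1.0, s_next=2))
```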
|
| 36 |
Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting |
Proposes agentic time series forecasting, shifting the model-centric paradigm toward agent-driven workflows
reinforcement learning, predictive model
|
|