| 1 |
Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models |
Proposes UE-DPO, which uses uncertainty-guided exploration to improve the visual alignment of multimodal large language models |
DPO direct preference optimization large language model |
|
|
| 2 |
Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis |
Proposes a gated multimodal learning model for interpretable building energy performance prediction and retrofit scenario analysis. |
MAE multimodal |
|
|
| 3 |
To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition |
Proposes DCR, a dual-path conflict resolution framework, to address modality conflicts in multimodal emotion recognition. |
distillation multimodal |
|
|
| 4 |
Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation |
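Proposes Power self-distillation, which bridges sampling, self-reward reinforcement learning, and self-distillation to improve LLM reasoning. |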
reinforcement learning distillation large language model |
|
|
| 5 |
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback |
Proposes a data-dependent exploration method to improve online reinforcement learning from human feedback |
reinforcement learning RLHF large language model |
|
|
| 6 |
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization |
Proposes preference-based self-distillation (PBSD), improving training stability and performance in mathematical reasoning and tool use. |
reinforcement learning preference learning distillation |
|
|
| 7 |
CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies |
CRAFT: counterfactual-to-interactive reinforcement fine-tuning for autonomous driving policies |
imitation learning distillation vision-language-action |
✅ |
|
| 8 |
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium |
Proposes Diffusion-NPO, which takes a game-theoretic perspective to improve the alignment of diffusion models with human preferences |
reinforcement learning RLHF DPO |
|
|
| 9 |
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning |
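Proposes an adaptive policy selection and fine-tuning method to address interaction budget constraints in offline-to-online reinforcement learning. |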
reinforcement learning offline RL |
|
|
| 10 |
Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations |
Proposes an imitation-learning-based control method for the Vlasov-Poisson equations, targeting plasma instabilities in nuclear fusion |
imitation learning behavior cloning |
|
|
| 11 |
Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning |
Proposes Graph-SND to address behavioral diversity in multi-agent reinforcement learning |
reinforcement learning PPO |
|
|
| 12 |
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning |
Proposes a unified framework for studying distributional regret in multi-armed bandits and reinforcement learning |
reinforcement learning |
|
|
| 13 |
The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence |
Reveals the predictive-causal gap in predictive learning: a theoretical proof and large-scale neural evidence |
world model world models representation learning |
|
|
| 14 |
Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization |
Proposes a reinforcement learning method based on outcome-level optimization to improve compositional generalization |
reinforcement learning |
|
|
| 15 |
A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs |
Proposes a harmonic-mean-based average-reward reinforcement learning algorithm to address non-stationarity in SMDPs |
reinforcement learning |
|
|
| 16 |
Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs |
Proposes a polarity-aware representation learning framework over clause-literal hypergraphs to improve unsat core prediction. |
representation learning |
|
|
| 17 |
Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models |
Proposes Counter-Dyna to address data efficiency in HVAC control |
reinforcement learning PPO predictive model |
|
|
| 18 |
Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning |
Proposes the Spline-Pullback Metric (SPM) for universal diffeomorphic SPD representation learning, moving beyond the limits of rigid geometries. |
representation learning |
|
|
| 19 |
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior |
Proposes manifold steering to reveal the shared geometric structure of neural network representations and behavior |
world model world models |
|
|
| 20 |
A geometric relation of the error introduced by sampling a language model's output distribution to its internal state |
Proposes a geometric relation to address the error introduced by sampling a language model's output distribution |
world model world models |
|
|
| 21 |
Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models |
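Proposes non-monotone triangular structural causal models, enabling precise and stable counterfactual inference in embodied interaction. |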
world model world models |
|
|
| 22 |
Extending Differential Temporal Difference Methods for Episodic Problems |
Extends differential temporal difference methods to episodic problems, improving sample efficiency |
reinforcement learning deep reinforcement learning |
|
|