| 24 |
PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis |
PanoWorld: a generative spatial world model for synthesizing consistent whole-house panoramas |
world model world models 3D gaussian splatting |
✅ |
|
| 25 |
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation |
Vision-OPD: improves fine-grained visual understanding in multimodal LLMs through on-policy self-distillation |
distillation large language model multimodal |
|
|
| 26 |
CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook |
CodeBind: achieves multimodal alignment through decoupled representation learning and a unified compositional codebook |
representation learning large language model multimodal |
|
|
| 27 |
The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting |
Proposes the MixCount dataset to address mixed-object counting |
MAE open-vocabulary open vocabulary |
|
|
| 28 |
Vision Foundation Models as Generalist Tokenizers for Image Generation |
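Proposes VFMTok, a general-purpose image tokenizer built on vision foundation models that significantly improves image generation quality and efficiency. |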
contrastive learning classifier-free guidance foundation model |
|
|
| 29 |
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models |
Incantation: proposes natural language as the action interface for multi-entity video world models |
world model world models distillation |
✅ |
|
| 30 |
Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving |
Xiaomi proposes JWM, a world model that integrates reconstruction and generation for autonomous driving. |
world model world models distillation |
|
|
| 31 |
LatentUMM: Dual Latent Alignment for Unified Multimodal Models |
Proposes LatentUMM, which improves cross-modal consistency in unified multimodal models through dual latent-space alignment. |
latent dynamics multimodal |
✅ |
|
| 32 |
Semi-LAR: Semi-supervised Contrastive Learning with Linear Attention for Removal of Nighttime Flares |
Proposes Semi-LAR, a semi-supervised contrastive learning framework that effectively removes lens flares from nighttime images |
linear attention contrastive learning |
|
|
| 33 |
LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift |
LESSViT: a robust hyperspectral representation learning method that addresses spectral configuration shift |
representation learning masked autoencoder HSI |
|
|
| 34 |
Leveraging Latent Visual Reasoning in Silence |
Proposes an attention-reward-based latent visual reasoning method that improves performance on multimodal tasks. |
reinforcement learning multimodal visual grounding |
✅ |
|
| 35 |
HexagonalWarriorMamba: Superior Threshold-Dependent Multi-label Classification of 12-Lead ECG Cardiac Abnormalities |
Proposes the HexagonalWarriorMamba model to improve multi-label classification of cardiac abnormalities from 12-lead ECGs |
Mamba spatial relationship |
|
|
| 36 |
Patch-MoE Mamba: A Patch-Ordered Mixture-of-Experts State Space Architecture for Medical Image Segmentation |
Proposes Patch-MoE Mamba to improve medical image segmentation performance. |
Mamba state space model |
|
|
| 37 |
WavFlow: Audio Generation in Waveform Space |
WavFlow: proposes a framework that generates audio directly in waveform space, without intermediate representations. |
flow matching multimodal |
|
|
| 38 |
TIGER-FG: Text-Guided Implicit Fine-Grained Grounding for E-commerce Retrieval |
Proposes the TIGER-FG framework, which uses text-guided implicit fine-grained grounding to address modality and granularity gaps in e-commerce retrieval. |
distillation multimodal |
|
|
| 39 |
WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens |
WinTok: a win-win hybrid visual tokenizer that decouples visual understanding and generation |
distillation foundation model |
✅ |
|
| 40 |
SAS: Semantic-aware Sampling for Generative Dataset Distillation |
SAS: uses semantic-aware sampling for generative dataset distillation, enriching the semantic information of the distilled dataset. |
distillation |
|
|
| 41 |
MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation |
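Proposes MoASE++, which addresses continual test-time adaptation through a mixture of activation sparsity experts and domain-adaptive on-policy distillation. |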
distillation |
|
|
| 42 |
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training |
SafeDiffusion-R1: proposes an online reward-steered post-training method for safe diffusion models that requires no supervised data. |
reinforcement learning offline reinforcement learning |
✅ |
|