| 1 |
RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models |
RaTA-Tool: a retrieval-based tool-selection framework for multimodal large language models
DPO direct preference optimization large language model |
|
|
| 2 |
HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet |
HAMSA: scanning-free vision state space models via SpectralPulseNet
Mamba SSM state space model |
|
|
| 3 |
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts |
Proposes DETR-ViP, which improves open-vocabulary object detection by making visual prompts more discriminative.
contrastive learning VIP distillation |
|
|
| 4 |
Beyond Independent Frames: Latent Attention Masked Autoencoders for Multi-View Echocardiography |
Proposes LAMAE, a masked autoencoder with latent attention for multi-view echocardiography, improving cardiac representation learning.
masked autoencoder MAE spatiotemporal |
|
|
| 5 |
Integrating Object Detection, LiDAR-Enhanced Depth Estimation, and Segmentation Models for Railway Environments |
Proposes an obstacle-detection framework for railway environments that integrates object detection, LiDAR-enhanced depth estimation, and segmentation models.
MAE depth estimation monocular depth |
|
|
| 6 |
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework |
RAD-2: a reinforcement learning approach within a generator-discriminator framework that improves the stability and safety of motion planning for autonomous driving.
reinforcement learning imitation learning multimodal |
|
|
| 7 |
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models |
Proposes the Visual-Switch knowledge distillation framework to address multimodal knowledge alignment in vision-language models.
distillation multimodal |
|
|
| 8 |
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories |
LeapAlign: post-training alignment of flow matching models at any generation step by building two-step trajectories.
flow matching |
|
|
| 9 |
TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation |
TurboTalk: a progressive distillation framework for one-step audio-driven talking-avatar generation.
distillation |
|
|