| 1 |
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis |
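Proposes the SeeUnsafe framework, which uses multimodal large language models for video-based traffic safety analysis and enables interactive accident analysis. |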
large language model multimodal visual grounding |
✅ |
|
| 2 |
FaceXBench: Evaluating Multimodal LLMs on Face Understanding |
Proposes FaceXBench to evaluate the face understanding capabilities of multimodal large language models. |
large language model multimodal chain-of-thought |
✅ |
|
| 3 |
Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks |
Proposes a few-shot, structure-aware segmentation method for machinery parts based on foundation models and graph neural networks. |
foundation model |
|
|
| 4 |
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization |
FiLo++: zero-/few-shot anomaly detection that fuses fine-grained descriptions with deformable localization. |
large language model foundation model multimodal |
✅ |
|
| 5 |
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis |
Proposes FLORA, which uses a formal language model for robust, training-free, zero-shot object referring expression comprehension. |
large language model visual grounding |
|
|
| 6 |
HiMix: Reducing Computational Complexity in Large Vision-Language Models |
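HiMix: reduces the computational complexity of large vision-language models through a mixture attention mechanism with hierarchical vision injection. |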
large language model |
✅ |
|
| 7 |
Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions |
Proposes MIAVLM, which uses multiview images and negative instructions to mitigate LVLM hallucinations on object attributes. |
large language model |
|
|