| 1 |
Reference-Guided Diffusion Inpainting For Multimodal Counterfactual Generation |
Proposes MObI and AnydoorMed for reference-image-guided inpainting and generation with multimodal diffusion models. |
foundation model multimodal |
|
|
| 2 |
A Large Language Model Powered Integrated Circuit Footprint Geometry Understanding |
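Proposes the LLM4-IC8K framework, which uses large language models to address the problem of understanding integrated circuit footprint geometry. |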
large language model multimodal |
|
|
| 3 |
Zero-Shot Image Anomaly Detection Using Generative Foundation Models |
Uses generative pretrained models for zero-shot image anomaly detection. |
foundation model |
|
|
| 4 |
Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards |
Proposes U3-Attack, a universal, input-agnostic multimodal adversarial attack that bypasses the safeguards of text-to-image models. |
multimodal |
|
|
| 5 |
GEMS: Group Emotion Profiling Through Multimodal Situational Understanding |
GEMS: group emotion analysis through multimodal situational understanding. |
multimodal |
✅ |
|
| 6 |
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception |
DeltaVLM: interactive remote sensing image change analysis via instruction-guided difference perception. |
large language model multimodal instruction following |
✅ |
|
| 7 |
What is Beneath Misogyny: Misogynous Memes Classification and Explanation |
Proposes the MM-Misogyny model for detecting, classifying, and explaining misogynous memes online. |
large language model multimodal |
✅ |
|
| 8 |
Goal-Based Vision-Language Driving |
NovaDrive: a single-branch autonomous driving architecture based on a vision-language model that improves safety and efficiency. |
embodied AI |
|
|
| 9 |
Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model |
Proposes E-FineR, a vocabulary-free fine-grained image recognition method based on a contextually enriched vision-language model. |
large language model |
✅ |
|
| 10 |
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation |
Proposes the OmniAVS dataset and the OISA model for referring audio-visual segmentation with multimodal fusion. |
multimodal |
|
|
| 11 |
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention |
Proposes MoCHA to address the training and inference costs of vision-language models. |
large language model |
|
|
| 12 |
Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings |
Proposes FetalCLIP$_{CLS}$, which uses a fetal ultrasound foundation model to improve image quality assessment in low-resource settings. |
foundation model |
✅ |
|
| 13 |
Segment Anything for Video: A Comprehensive Review of Video Object Segmentation and Tracking from Past to Future |
Surveys SAM-based video object segmentation and tracking methods and outlines future trends. |
foundation model |
|
|
| 14 |
A Linear N-Point Solver for Structure and Motion from Asynchronous Tracks |
Proposes a linear N-point solver for estimating structure and motion from asynchronous tracks. |
TAMP |
✅ |
|