| 1 | LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation | Proposes LongVie to address controllability and consistency in ultra-long video generation | multimodal | | |
| 2 | SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks | Proposes SAM2-UNeXT to improve the performance of foundation models on downstream segmentation tasks | foundation model | ✅ | |
| 3 | Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection | Proposes ARAS to address the structural shortcomings of existing anomaly synthesis methods | language conditioned | | |
| 4 | Semantic Mosaicing of Histo-Pathology Image Fragments using Visual Foundation Models | Proposes SemanticStitcher to address the stitching (mosaicing) of histopathology image fragments | foundation model | | |
| 5 | MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis | Proposes MedCAL-Bench to evaluate cold-start active learning in medical image analysis | foundation model | ✅ | |
| 6 | Zero-shot Shape Classification of Nanoparticles in SEM Images using Vision Foundation Models | Proposes a zero-shot classification method to recognize nanoparticle shapes | foundation model | | |
| 7 | Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching | Examines visual similarity measures for meme matching beyond template-based matching | large language model multimodal | | |
| 8 | CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation | Proposes CoEmoGen to address semantic inconsistency in emotional image generation | large language model multimodal | ✅ | |
| 9 | R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation | Proposes R2GenKG to address hallucination and weak diagnostic capability in X-ray report generation | large language model foundation model | ✅ | |
| 10 | Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration | Proposes adaptive frame pruning and semantic graph integration to reduce redundancy in video question answering | large language model multimodal | | |
| 11 | Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts | Proposes using LLM-generated visual concepts to enhance continual learning of diseases | large language model multimodal | | |
| 12 | Enhancing Long Video Question Answering with Scene-Localized Frame Grouping | Proposes SLFG to improve information extraction in long video question answering | large language model multimodal | | |
| 13 | ParticleSAM: Small Particle Segmentation for Material Quality Monitoring in Recycling Processes | Proposes ParticleSAM to address small-particle segmentation in construction material recycling | foundation model | | |
| 14 | VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation | Proposes VLMQ for post-training quantization of vision-language models | large language model | | |
| 15 | Bias Beyond Demographics: Probing Decision Boundaries in Black-Box LVLMs via Counterfactual VQA | Proposes a counterfactual visual question answering benchmark to audit decision biases in black-box LVLMs | multimodal | | |
| 16 | Multi-Granularity Feature Calibration via VFM for Domain Generalized Semantic Segmentation | Proposes a multi-granularity feature calibration method for domain-generalized semantic segmentation | foundation model | | |