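| # | Title | Summary | Keywords | |
|---|---|---|---|---|
| 1 | Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras | Proposes using multimodal large language models to diagnose shoulder disorders. | large language model, multimodal | |
| 2 | PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs | PhysToolBench: the first benchmark for evaluating how well MLLMs understand physical tools. | embodied AI, vision-language-action, VLA | |
| 3 | Goal-oriented Backdoor Attack against Vision-Language-Action Models via Physical Objects | Proposes GoBA, a backdoor attack on vision-language-action models triggered by physical objects, which induces goal-oriented malicious behavior. | embodied AI, vision-language-action, VLA | ✅ |
| 4 | BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception | BLINK-Twice: a visual-perception reasoning benchmark that emphasizes fine-grained observation and analysis to challenge multimodal large language models. | large language model, foundation model, multimodal | ✅ |
| 5 | Task-Aware Resolution Optimization for Visual Large Language Models | Proposes a task-aware resolution optimization method that improves the performance of visual large language models across different tasks. | large language model | |
| 6 | Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning | Studies how foreign-language learners reason to resolve word-sense ambiguity in multimodal contexts. | multimodal | |
| 7 | Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models | Proposes a dynamic chain-of-thought method that improves vision-language model performance on multimodal keyphrase prediction. | chain-of-thought | ✅ |
| 8 | Tag-Enriched Multi-Attention with Large Language Models for Cross-Domain Sequential Recommendation | Proposes TEMA-LLM, which applies an LLM-enhanced multi-attention mechanism to cross-domain sequential recommendation. | large language model | |
| 9 | Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition | Cattle-CLIP: a multimodal learning framework for cattle behaviour recognition that improves performance in data-scarce settings. | multimodal | |
| 10 | MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation | Proposes MSDM, a multimodal conditioned diffusion model that generates pathology images for cell and nuclei segmentation tasks. | multimodal | |
| 11 | Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping | Proposes AttWarp, which uses attention-guided image warping to improve the performance of multimodal large language models. | large language model, multimodal | |
| 12 | CapGeo: A Caption-Assisted Approach to Geometric Reasoning | CapGeo: a geometric reasoning method assisted by image captions. | large language model, multimodal | |
| 13 | HandEval: Taking the First Step Towards Hand Quality Evaluation in Generated Images | Proposes HandEval for evaluating the quality of hands in generated images, improving AIGC (AI-generated content) applications. | large language model, multimodal | |
| 14 | Hierarchical Scheduling for Multi-Vector Image Retrieval | HiMIR: a hierarchical scheduling framework for multi-vector image retrieval that improves both accuracy and efficiency. | large language model, multimodal | |
| 15 | Cluster-Aware Prompt Ensemble Learning for Few-Shot Vision-Language Model Adaptation | Proposes a cluster-aware prompt ensemble learning framework that improves few-shot adaptation of vision-language models. | zero-shot transfer | |
| 16 | On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models | Proposes a method for mitigating object hallucination in large vision-language models based on the epistemic uncertainty of visual tokens. | large language model | |
| 17 | RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos | Proposes RO-Bench for large-scale evaluation of MLLM robustness on text-driven counterfactual videos. | large language model | |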