| 1 |
LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models |
Proposes LUQ, a layerwise ultra-low bit quantization method for multimodal large language models that reduces memory footprint. |
large language model multimodal |
|
|
| 2 |
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications |
Proposes the PCRI metric to evaluate the robustness of multimodal models to visual context in enterprise applications. |
large language model multimodal |
|
|
| 3 |
Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models |
Proposes a visual privacy taxonomy and evaluates the limitations of vision-language models in privacy understanding. |
large language model multimodal |
|
|
| 4 |
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks |
Proposes the RCI metric to evaluate how much multimodal benchmarks depend on global versus local reasoning. |
large language model multimodal |
|
|
| 5 |
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training |
LLaVA-OneVision-1.5: a fully open multimodal training framework that lowers training cost and improves performance. |
multimodal chain-of-thought |
|
|
| 6 |
Uncovering Grounding IDs: How External Cues Shape Multimodal Binding |
Proposes the concept of Grounding IDs, revealing how external cues shape multimodal binding. |
multimodal |
|
|
| 7 |
HunyuanImage 3.0 Technical Report |
Tencent Hunyuan releases HunyuanImage 3.0, open-sourcing the largest-scale image generation MoE model. |
foundation model multimodal chain-of-thought |
✅ |
|
| 8 |
Adapting Large Language Models to Mitigate Skin Tone Biases in Clinical Dermatology Tasks: A Mixed-Methods Study |
Adapts large language models to mitigate skin tone bias in clinical dermatology tasks. |
large language model |
|
|
| 9 |
ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation |
ColLab: a collaborative spatial progressive data engine for referring expression comprehension and generation. |
large language model multimodal |
|
|
| 10 |
HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling |
HiDe: rethinks the Zoom-IN method in high-resolution MLLMs through hierarchical decoupling. |
large language model multimodal |
✅ |
|
| 11 |
SVAC: Scaling Is All You Need For Referring Video Object Segmentation |
SVAC: improves referring video object segmentation performance by scaling up the input and the segmentation tokens. |
large language model |
✅ |
|
| 12 |
Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis |
Addresses optimization imbalance in multi-task learning by adjusting loss weights according to gradient norms. |
foundation model |
|
|
| 13 |
HIVTP: A Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score |
Proposes HIVTP, a training-free hierarchical visual token pruning method that improves VLM inference efficiency. |
multimodal |
|
|
| 14 |
RIV: Recursive Introspection Mask Diffusion Vision Language Model |
Proposes the Recursive Introspection Mask Diffusion Vision Language Model (RIV), giving the model a self-correction ability. |
multimodal |
|
|
| 15 |
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data |
StolenLoRA: proposes a LoRA extraction attack method based on synthetic data. |
large language model |
|
|