| 1 |
Learning from Limited and Incomplete Data: A Multimodal Framework for Predicting Pathological Response in NSCLC |
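Proposes a multimodal deep learning framework for predicting pathological response after neoadjuvant therapy in non-small cell lung cancer. |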
foundation model multimodal |
|
|
| 2 |
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal |
Proposes MER-Bench, a comprehensive benchmark for multimodal meme reappraisal |
large language model multimodal |
✅ |
|
| 3 |
DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery |
DamageArbiter: a CLIP-enhanced multimodal arbitration framework for hurricane damage assessment from street-view imagery |
large language model multimodal |
|
|
| 4 |
Multimodal Connectome Fusion via Cross-Attention for Autism Spectrum Disorder Classification Using Graph Learning |
Proposes a cross-attention-based multimodal graph learning framework for autism spectrum disorder classification. |
multimodal |
|
|
| 5 |
VAREX: A Benchmark for Multi-Modal Structured Extraction from Documents |
VAREX: a benchmark for evaluating multimodal structured information extraction from documents |
foundation model multimodal instruction following |
|
|
| 6 |
Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation |
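Proposes the TAEMI framework, which uses text anchoring and cross-modal attention to make emotional mimicry intensity estimation more robust in noisy environments. |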
multimodal |
|
|
| 7 |
Evaluating Time Awareness and Cross-modal Active Perception of Large Models via 4D Escape Room Task |
Proposes the EscapeCraft-4D environment to evaluate large models' temporal awareness and cross-modal active perception |
large language model multimodal |
|
|
| 8 |
GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents |
Proposes GUI-CEval to address the lack of evaluation for Chinese mobile GUI agents |
large language model multimodal |
|
|
| 9 |
A Skill-augmented Agentic Framework and Benchmark for Multi-Video Understanding |
Proposes the SAMA framework and the MVX-Bench benchmark to improve cross-video reasoning in multi-video understanding. |
large language model multimodal |
|
|
| 10 |
Severe Domain Shift in Skeleton-Based Action Recognition: A Study of Uncertainty Failure in Real-World Gym Environments |
Addresses severe domain shift in skeleton-based action recognition with a calibration method based on a fine-tuned gating mechanism. |
zero-shot transfer |
|
|
| 11 |
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning |
HalDec-Bench: a comprehensive benchmark platform for hallucination detection in image captioning |
multimodal |
✅ |
|
| 12 |
HYDRA: Unifying Multi-modal Generation and Understanding via Representation-Harmonized Tokenization |
Proposes HYDRA, which unifies multimodal generation and understanding through representation-harmonized tokenization. |
multimodal |
|
|
| 13 |
Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding |
Proposes the QViC-MF framework, which uses memory feedback to improve the modeling of temporal events in long video understanding. |
multimodal |
|
|
| 14 |
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models |
MMSpec: a benchmark for speculative decoding in vision-language models, together with the ViSkip acceleration method |
multimodal |
|
|
| 15 |
GT-PCQA: Geometry-Texture Decoupled Point Cloud Quality Assessment with MLLM |
Proposes GT-PCQA, which uses an MLLM to address insufficient sensitivity to geometric structure in point cloud quality assessment. |
large language model |
|
|
| 16 |
Balancing Saliency and Coverage: Semantic Prominence-Aware Budgeting for Visual Token Compression in VLMs |
Proposes PromPrune, which adaptively compresses visual tokens in VLMs through semantic prominence-aware budget allocation. |
multimodal |
|
|
| 17 |
Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection |
Proposes Two Birds, One Projection, which harmonizes the safety and utility of LVLMs through inference-time feature projection. |
large language model |
|
|