| 1 |
Explaining multimodal LLMs via intra-modal token interactions |
Improves the interpretability of multimodal LLMs through intra-modal token interactions |
large language model multimodal |
|
|
| 2 |
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation |
JanusVLN: decouples semantic and spatial information with dual implicit memory to improve vision-language navigation performance |
VLN large language model multimodal |
✅ |
|
| 3 |
Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model |
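Proposes a new sleep-staging paradigm based on a general-purpose multimodal model, improving the accuracy and robustness of PSG analysis |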
multimodal |
|
|
| 4 |
Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental Results |
Uses the GPT-4o model with optimized prompt engineering to tackle the detection of multimodal disinformation |
multimodal |
|
|
| 5 |
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning |
Proposes MILR, a test-time latent reasoning method that improves the quality of multimodal image generation. |
multimodal |
|
|
| 6 |
DeHate: A Stable Diffusion-based Multimodal Approach to Mitigate Hate Speech in Images |
Proposes DeHate, a Stable Diffusion-based multimodal method for mitigating hate speech in images. |
multimodal |
|
|
| 7 |
DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation |
DynaNav: a dynamic feature and layer selection framework for efficient visual navigation |
embodied AI foundation model |
|
|
| 8 |
FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning |
FishAI 2.0: marine fish image classification with multimodal few-shot learning |
large language model multimodal |
|
|
| 9 |
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision |
Proposes Labeling Copilot, a deep research agent for automated data annotation in computer vision. |
foundation model multimodal |
|
|
| 10 |
UML-CoT: Structured Reasoning and Planning with Unified Modeling Language for Robotic Room Cleaning |
Proposes the UML-CoT framework, which uses UML for structured reasoning and planning in robotic room-cleaning tasks |
large language model chain-of-thought |
|
|
| 11 |
Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation |
EAGLE: a lightweight black-box framework for explaining autoregressive token generation in multimodal large language models. |
large language model multimodal |
|
|
| 12 |
Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language Models |
Proposes Adaptive Global Context Injection (AGCI) to address spatial bias in large vision-language models |
large language model multimodal |
|
|
| 13 |
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process |
CircuitSense: proposes a circuit-system benchmark that bridges visual comprehension and symbolic reasoning in engineering design. |
large language model |
|
|