| 1 |
Vision Foundation Model Embedding-Based Semantic Anomaly Detection |
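Proposes a semantic anomaly detection framework built on vision foundation model embeddings to improve the safety of autonomous driving systems. |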
foundation model |
|
|
| 2 |
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning |
Proposes Skywork-VL Reward, a reward model for multimodal understanding and reasoning tasks. |
multimodal |
|
|
| 3 |
Visually Interpretable Subtask Reasoning for Visual Question Answering |
VISTAR: improves visual question answering through visually interpretable subtask reasoning. |
large language model multimodal |
✅ |
|
| 4 |
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning |
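Proposes the Re-Critic framework, which mitigates hallucination in multimodal large models by augmenting the reasoning chain with rationales. |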
multimodal chain-of-thought |
|
|
| 5 |
Gameplay Highlights Generation |
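Proposes a multimodal method for automatically generating gameplay highlights using a fine-tuned X-CLIP, requiring neither game-engine integration nor OCR. |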
multimodal |
|
|
| 6 |
Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs |
Proposes a self-supervised event representation method that enables accurate, real-time event-camera perception on SoC FPGAs. |
TAMP |
✅ |
|
| 7 |
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs |
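Proposes a graph neural network-based document layout analysis method that improves the accuracy of public-affairs document understanding. |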
multimodal |
|
|
| 8 |
L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers |
Proposes L-SWAG to address the challenges of applying zero-cost neural architecture search to Vision Transformers. |
large language model |
|
|
| 9 |
Synthetic Similarity Search in Automotive Production |
Proposes a similarity search approach based on synthetic data for visual quality inspection in automotive production. |
foundation model |
|
|