Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Authors: Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr

Categories: cs.CV, cs.AI, cs.CL, cs.MA

Published: 2025-05-27 (updated: 2025-10-30)

Note: Project Page: https://github.com/Paper2Poster/Paper2Poster

🔗 Code/Project: GitHub

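💡 One-Sentence Summary

Paper2Poster proposes a multimodal framework for automatic poster generation, addressing the difficulty of producing posters from scientific papers.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: poster generation, multimodal learning, vision-language models, multi-agent systems, paper summarization, automatic layout, scientific communication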

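📋 Key Points

Making academic posters is an important part of scientific communication, but compressing a long paper into a visually coherent poster is highly challenging.
Paper2Poster proposes PosterAgent, a multi-agent framework that generates posters automatically through a Parser, a Planner, and a Painter-Commenter loop.
Experiments show that the open-source Paper2Poster variants outperform GPT-4o-driven multi-agent systems on nearly all metrics while using 87% fewer tokens.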

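📝 Summary

This paper proposes a framework for automatically generating multimodal posters from scientific papers, targeting the challenge of compressing a long paper into a single, visually coherent page. To this end, the authors build the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates generated posters along four dimensions: (i) Visual Quality: semantic alignment with human-designed posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a vision-language model (VLM) acting as judge; and (iv) PaperQuiz: how well the poster conveys the paper's core content, measured by VLMs answering generated quizzes. Building on this benchmark, the authors propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline. The Parser distills the paper into a structured asset library; the Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. The evaluation finds that GPT-4o outputs, although visually appealing at first glance, often exhibit noisy text and low PaperQuiz scores. It also finds that reader engagement is the primary aesthetic bottleneck, since human-designed posters rely largely on visual semantics to convey meaning. Fully open-source variants (e.g., based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems on nearly all metrics while using 87% fewer tokens. The method turns a 22-page paper into a finalized yet editable .pptx poster for just $0.005. The code and datasets are open source.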

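🔬 Method Details

Problem definition: Making an academic poster means compressing a long paper into a single page that is visually coherent and information-rich. Today this is usually done by hand, which is time-consuming and makes quality hard to guarantee. Existing automatic approaches, such as GPT-4o-based multi-agent systems, produce visually appealing output but often suffer from noisy text and inaccurate content, so they fail to convey the paper's core content effectively.

Core idea: Paper2Poster decomposes poster generation into several controllable subtasks and uses cooperating agents to generate the poster in a top-down, visual-in-the-loop manner. By structuring the paper's information, optimizing the layout, and applying visual feedback, it aims for posters that are both attractive and faithful to the paper's core content.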

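Technical framework: Paper2Poster uses PosterAgent, a three-stage multi-agent pipeline: (a) Parser: parses the paper into a structured asset library, extracting key text and figures. (b) Planner: aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance. (c) Painter-Commenter loop: the Painter renders each poster panel and the Commenter gives visual feedback using a VLM; the two iterate to refine the panel, eliminating overflow and ensuring alignment.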
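The binary-tree layout used by the Planner can be illustrated with a minimal sketch. This is not the authors' implementation: the `Leaf` and `Split` classes, the `weight` field (a stand-in for how much content a panel holds), and the proportional splitting rule are all assumptions made for illustration. The sketch only shows how a binary tree can carve a page into panels while keeping a left-to-right, top-to-bottom order.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Leaf:
    name: str
    weight: float  # hypothetical measure of content size for this panel


@dataclass
class Split:
    left: "Node"
    right: "Node"
    horizontal: bool  # True: children side by side; False: stacked vertically


Node = Union[Leaf, Split]


def total_weight(node: Node) -> float:
    """Sum of the leaf weights under this node."""
    if isinstance(node, Leaf):
        return node.weight
    return total_weight(node.left) + total_weight(node.right)


def layout(node: Node, rect: Rect) -> Dict[str, Rect]:
    """Assign each leaf a rectangle, splitting space in proportion to weight."""
    if isinstance(node, Leaf):
        return {node.name: rect}
    x, y, w, h = rect
    share = total_weight(node.left) / total_weight(node)
    if node.horizontal:
        first = (x, y, w * share, h)
        second = (x + w * share, y, w * (1 - share), h)
    else:
        first = (x, y, w, h * share)
        second = (x, y + h * share, w, h * (1 - share))
    boxes = layout(node.left, first)  # left subtree comes first in reading order
    boxes.update(layout(node.right, second))
    return boxes


if __name__ == "__main__":
    tree = Split(
        Leaf("Introduction", 1.0),
        Split(Leaf("Method", 2.0), Leaf("Results", 1.0), horizontal=False),
        horizontal=True,
    )
    for name, box in layout(tree, (0.0, 0.0, 48.0, 36.0)).items():
        print(name, box)
```

Because Python dictionaries preserve insertion order, iterating over the result visits panels in the tree's left-to-right order, which is one simple way to keep a reading order attached to the layout.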

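Key innovations: Paper2Poster's main contributions are its multi-agent framework and its visual-in-the-loop refinement. The Parser structures the paper, the Planner produces a sound layout, and the Painter-Commenter loop uses a VLM's visual understanding for feedback and refinement, which together yield high-quality posters. Paper2Poster also builds the first benchmark and metric suite for poster generation, giving later work a standardized way to evaluate.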
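The visual-in-the-loop idea can be sketched as a small render-and-check loop. This is a toy under stated assumptions, not PosterAgent's code: `paint` replaces real rendering with a line-count estimate, `comment` replaces the VLM with a rule-based overflow check, and `shorten` replaces text condensation with simple truncation. All names, fields, and parameters here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Panel:
    text: str
    width_chars: int  # hypothetical: characters that fit on one line
    max_lines: int    # hypothetical: lines that fit inside the panel


def paint(panel: Panel) -> int:
    """Stand-in for executing rendering code: return the lines the text needs."""
    return math.ceil(len(panel.text) / panel.width_chars)


def comment(lines_used: int, panel: Panel) -> str:
    """Stand-in for VLM feedback: report 'overflow' or 'ok'."""
    return "overflow" if lines_used > panel.max_lines else "ok"


def shorten(text: str, keep_ratio: float = 0.8) -> str:
    """Stand-in for condensing text: keep only the leading share of the words."""
    words = text.split()
    kept = max(1, int(len(words) * keep_ratio))
    return " ".join(words[:kept])


def refine(panel: Panel, max_rounds: int = 10) -> Panel:
    """Alternate painting and commenting until the panel fits or rounds run out."""
    for _ in range(max_rounds):
        if comment(paint(panel), panel) == "ok":
            break
        panel = Panel(shorten(panel.text), panel.width_chars, panel.max_lines)
    return panel


if __name__ == "__main__":
    draft = Panel("word " * 100, width_chars=20, max_lines=5)
    final = refine(draft)
    print(paint(draft), "lines before,", paint(final), "lines after")
```

The `max_rounds` bound is a design choice in this sketch: it guarantees termination even when the feedback never reports "ok".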

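Key design: The Planner uses a binary-tree layout to preserve reading order and spatial balance. In the Painter-Commenter loop, the Commenter evaluates each rendered panel with a VLM and returns feedback that guides the Painter's next revision. Paper2Poster also introduces the PaperQuiz metric, which measures how well a poster conveys the paper's core content by having VLMs answer generated quizzes.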
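A quiz-based metric of this kind can be sketched as follows. The summary above does not specify the quiz format or the scoring rule, so the multiple-choice `QuizItem` structure and plain accuracy are assumptions, and `reader` is a hypothetical stand-in for a VLM that sees only the poster.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class QuizItem:
    question: str
    options: Sequence[str]
    answer: str  # the correct option, derived from the paper


def quiz_score(items: List[QuizItem], reader: Callable[[QuizItem], str]) -> float:
    """Fraction of quiz items the reader answers correctly."""
    if not items:
        raise ValueError("quiz must contain at least one item")
    correct = sum(1 for item in items if reader(item) == item.answer)
    return correct / len(items)


if __name__ == "__main__":
    quiz = [
        QuizItem("Which layout does the Planner use?", ["grid", "binary tree"], "binary tree"),
        QuizItem("What is the output file format?", [".pptx", ".pdf"], ".pptx"),
    ]

    def first_option_reader(item: QuizItem) -> str:
        # Toy reader that always picks the first option; a real reader
        # would be a VLM answering from the poster image.
        return item.options[0]

    print(quiz_score(quiz, first_option_reader))
```

Under this reading, a poster that omits key content scores lower because the reader cannot recover the answers from the poster alone.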

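📊 Experimental Highlights

The experiments show that fully open-source variants based on the Qwen-2.5 series outperform existing GPT-4o-driven multi-agent systems on nearly all metrics, which span visual quality, textual coherence, holistic assessment, and PaperQuiz, while using 87% fewer tokens. The method turns a 22-page paper into a finalized, editable .pptx poster for just $0.005.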

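🎯 Application Scenarios

Paper2Poster can be used at academic conferences and in research institutions, substantially reducing the time and effort researchers spend making posters and helping research results spread more efficiently. The approach could later be extended to automatic layout and visualization of other document types, such as reports and slide decks.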

📄 Abstract (Original)

Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters, (ii) Textual Coherence: language fluency, (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv) PaperQuiz: the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g., based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. It transforms a 22-page paper into a finalized yet editable .pptx poster, all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.
