PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
Authors: Shangrui Nie, Kian Omoomi, Lucie Flek, Zhixue Zhao, Charles Welch
Categories: cs.CL
Published: 2026-02-09
Comments: 15 pages, 1 figure
💡 One-sentence takeaway
Proposes PERSPECTRA, a benchmark for evaluating how well language models handle multiple perspectives.
🎯 Matched area: Pillar 9: Embodied Foundation Models
Keywords: pluralism, language models, debate analysis, opinion evaluation, natural language processing
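📋 Key points
- Pluralism research lacks effective evaluation benchmarks: existing debate sources fall short on either argumentative structure or linguistic diversity.
- PERSPECTRA combines the structure of Kialo with the linguistic diversity of Reddit to build a scalable pluralist benchmark in which each opinion is expanded into multiple naturalistic variants.
- Experiments show that current state-of-the-art LLMs make systematic errors in identifying and classifying opinions, which highlights how hard pluralism-aware understanding is.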
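📝 Abstract (summary)
Pluralism, the ability to engage with diverse perspectives without collapsing them into a single viewpoint, is essential for developing language models that faithfully reflect human diversity. Yet this property remains underexplored in large language model (LLM) research. The paper presents PERSPECTRA, a pluralist benchmark that combines the structural clarity of Kialo debate graphs with the linguistic diversity of Reddit discussions. It comprises 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Experiments on the three tasks initialised with the benchmark reveal systematic failures of current models in pluralism-aware understanding and reasoning, underscoring how challenging the area remains.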
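As a quick sanity check on the figures above, the counts work out to an average of five enriched arguments per stance and about 7.6 stances per topic. The sketch below only performs that arithmetic. Whether every stance has exactly five variants is an assumption that the abstract does not confirm.

```python
# Counts quoted in the abstract.
NUM_ARGUMENTS = 3810
NUM_STANCES = 762
NUM_TOPICS = 100

# Average arguments per stance; a zero remainder means the division is exact.
args_per_stance, remainder = divmod(NUM_ARGUMENTS, NUM_STANCES)

# Average stances per topic (not an integer, so topics must differ in size).
stances_per_topic = NUM_STANCES / NUM_TOPICS

print(args_per_stance, remainder, stances_per_topic)
```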
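🔬 Method details
Problem definition: The paper addresses the shortcomings of current LLMs in understanding and reasoning over multiple perspectives, in particular the lack of effective evaluation benchmarks. Debate platforms such as Reddit and Kialo offer rich material, but Reddit lacks clear argumentative structure and Kialo is overly concise.
Core idea: PERSPECTRA combines the structure of debate graphs with the diversity of natural language, building a rich pool of arguments to support the evaluation of pluralism. A controlled retrieval-and-expansion pipeline expands each opinion into multiple naturalistic variants, which makes the evaluation more robust.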
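One way to picture the output of such a pipeline is a record per stance that keeps the concise claim alongside its expanded, naturalistic variants. The sketch below is illustrative only: the class name `Stance`, its fields, the helper `total_arguments`, and the sample texts are all assumptions, not the schema released with the benchmark.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stance:
    """One pro/con stance on a topic with its variants (hypothetical schema)."""
    topic: str
    claim: str      # concise, Kialo-style statement of the opinion
    polarity: str   # "pro" or "con"
    variants: List[str] = field(default_factory=list)  # naturalistic rewrites

    def __post_init__(self) -> None:
        if self.polarity not in ("pro", "con"):
            raise ValueError(f"polarity must be 'pro' or 'con', got {self.polarity!r}")


def total_arguments(stances: List[Stance]) -> int:
    """Count enriched arguments across stances, one per variant."""
    return sum(len(s.variants) for s in stances)


# Made-up example data, for illustration only.
stances = [
    Stance("school uniforms", "Uniforms reduce peer pressure.", "pro",
           ["Honestly, uniforms meant nobody got judged on clothes.",
            "With uniforms there is less pressure to keep up with trends."]),
    Stance("school uniforms", "Uniforms suppress self-expression.", "con",
           ["Kids should be able to show who they are through what they wear."]),
]

print(total_arguments(stances))
```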
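Technical framework: The overall pipeline has three main stages: retrieving data from the debate platforms, expanding the retrieved arguments, and constructing the evaluation tasks (opinion counting, opinion matching, and polarity check) so that pluralism can be evaluated systematically.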
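To make the three tasks concrete, the following sketch derives a gold label for each from a toy set of opinions. Everything here is hypothetical: the `OPINIONS` table, the function names, and in particular the majority-vote rule used for the polarity check, which is an assumed stand-in for however the benchmark defines aggregate stance.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: opinion id -> (polarity, naturalistic variants).
OPINIONS: Dict[str, Tuple[str, List[str]]] = {
    "o1": ("pro", ["Uniforms reduce peer pressure.",
                   "Nobody gets judged on clothes when everyone wears the same."]),
    "o2": ("con", ["Uniforms suppress self-expression."]),
}


def opinion_counting_gold(opinion_ids: List[str]) -> int:
    """Opinion counting: number of distinct source opinions in a passage."""
    return len(set(opinion_ids))


def opinion_matching_gold(variant: str) -> str:
    """Opinion matching: id of the source opinion a variant was expanded from."""
    for opinion_id, (_, variants) in OPINIONS.items():
        if variant in variants:
            return opinion_id
    raise KeyError(variant)


def polarity_gold(opinion_ids: List[str]) -> str:
    """Polarity check: majority polarity of a mixed passage (assumed rule)."""
    pro = sum(1 for o in opinion_ids if OPINIONS[o][0] == "pro")
    con = len(opinion_ids) - pro
    if pro == con:
        return "tie"
    return "pro" if pro > con else "con"


# A passage built from two variants of o1 and one variant of o2.
passage = ["o1", "o1", "o2"]
print(opinion_counting_gold(passage), polarity_gold(passage))
```

Note that the passage contains three text segments but only two distinct opinions, which is the distinction the counting task is meant to test.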
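Key innovation: PERSPECTRA is presented as the first benchmark to combine structured debate graphs with natural-language discussion in a scalable, configurable form, so that it can test how well models represent, distinguish, and reason over multiple perspectives.
Key design: For its settings, PERSPECTRA uses a diversified opinion expansion strategy to ensure that the generated arguments are linguistically natural and varied. It also designs task-specific loss functions to optimise model performance on opinion identification and classification.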
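📊 Experimental highlights
Experiments show that current state-of-the-art open-source and proprietary LLMs make pronounced systematic errors on the opinion identification and classification tasks, such as overestimating the number of viewpoints and misclassifying concessive structures. These findings underscore the difficulty of pluralism-aware understanding and reasoning and motivate further research in the area.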
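A systematic tendency to overestimate the number of viewpoints can be made visible with a signed error rather than plain accuracy. The sketch below is a minimal illustration under assumed names (`count_bias`, `overcount_rate`) and made-up numbers. It is not the metric reported in the paper.

```python
from statistics import mean
from typing import Sequence


def count_bias(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Mean signed error; a positive value means the model overcounts on average."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return mean([p - g for p, g in zip(predicted, gold)])


def overcount_rate(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of examples where the predicted count exceeds the gold count."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(1 for p, g in zip(predicted, gold) if p > g) / len(gold)


# Made-up predictions and gold counts, for illustration only.
pred = [3, 4, 2]
gold = [2, 3, 2]
print(count_bias(pred, gold), overcount_rate(pred, gold))
```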
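🎯 Application scenarios
The results of PERSPECTRA are relevant to natural language processing, social network analysis, and human-computer interaction. By improving how well models understand multiple perspectives, the benchmark can support the development of fairer and more comprehensive AI systems, and it may eventually have an impact in areas such as education, law, and public policy.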
📄 Abstract (original)
Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human heterogeneity. Yet this characteristic has not been carefully examined in the LLM research community and remains absent from most alignment studies. Debate-oriented sources provide a natural entry point for pluralism research. Previous work builds on online debate sources but remains constrained by costly human validation. Other debate-rich platforms such as Reddit and Kialo also offer promising material: Reddit provides linguistic diversity and scale but lacks clear argumentative structure, while Kialo supplies explicit pro/con graphs but remains overly concise and detached from natural discourse. We introduce PERSPECTRA, a pluralist benchmark that integrates the structural clarity of Kialo debate graphs with the linguistic diversity of real Reddit discussions. Using a controlled retrieval-and-expansion pipeline, we construct 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Each opinion is expanded to multiple naturalistic variants, enabling robust evaluation of pluralism. We initialise three tasks with PERSPECTRA: opinion counting (identifying distinct viewpoints), opinion matching (aligning supporting stances and discourse to source opinions), and polarity check (inferring aggregate stance in mixed discourse). Experiments with state-of-the-art open-source and proprietary LLMs highlight systematic failures, such as overestimating the number of viewpoints and misclassifying concessive structures, underscoring the difficulty of pluralism-aware understanding and reasoning. By combining diversity with structure, PERSPECTRA establishes the first scalable, configurable benchmark for evaluating how well models represent, distinguish, and reason over multiple perspectives.