PURE: A Dataset of Public Requirements Documents

Ferrari A.; Spagnolo G. O.; Gnesi S.
2017

Abstract

This paper presents PURE (PUblic REquirements dataset), a dataset of 79 publicly available natural language requirements documents collected from the Web. The dataset includes 34,268 sentences and can be used for natural language processing tasks that are typical in requirements engineering, such as model synthesis, abstraction identification and document structure assessment. It can be further annotated to work as a benchmark for other tasks, such as ambiguity detection, requirements categorisation and identification of equivalent requirements. In the paper, we present the dataset and we compare its language with generic English texts, showing the peculiarities of the requirements jargon, made of a restricted vocabulary of domain-specific acronyms and words, and long sentences. We also present the common XML format to which we have manually ported a subset of the documents, with the goal of facilitating replication of NLP experiments.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
25th IEEE International Requirements Engineering Conference
502
505
9781538631911
https://ieeexplore.ieee.org/document/8049173/?reload=true
Yes, but type not specified
04/09/2017
Lisbon, Portugal
Empirical Software Engine
Empirical Studies
Model Synthesis
Natural Language Requirements
NLP
NLP Tasks
Public Requirements
PURE
Requirements Abstraction
Requirements Ambiguity Detection
Requirements Categorisation
Requirements Dataset
XML
3
restricted
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/335225