[Context and motivation] The current breakthrough of natural language processing (NLP) techniques can provide the requirements engineering (RE) community with powerful tools that can help addressing specic tasks of natural language (NL) requirements analysis, such as traceability, ambiguity detection and requirements classification, to name a few. [Question/problem] However, modern NLP techniques are mainly statistical, and need large NL requirements datasets, to support appropriate training, test and validation of the techniques. The RE community has experimented with NLP since long time, but datasets were often proprietary, or limited to few software projects for which requirements were publicly available. Hence, replication of the experiments and generalization have always been an issue. [Principal idea/results] Our near future commitment is to provide a publicly available NL requirements dataset. [Contribution] To this end, we are collecting requirements documents from the Web, and we are representing them in a common XML format. In this paper, we present the current version of the dataset, together with our agenda concerning formatting, extension, and annotation of the dataset.
Towards a dataset for natural language requirements processing
Ferrari A;Spagnolo G O;Gnesi S
2017
Abstract
[Context and motivation] The current breakthrough of natural language processing (NLP) techniques can provide the requirements engineering (RE) community with powerful tools that can help addressing specic tasks of natural language (NL) requirements analysis, such as traceability, ambiguity detection and requirements classification, to name a few. [Question/problem] However, modern NLP techniques are mainly statistical, and need large NL requirements datasets, to support appropriate training, test and validation of the techniques. The RE community has experimented with NLP since long time, but datasets were often proprietary, or limited to few software projects for which requirements were publicly available. Hence, replication of the experiments and generalization have always been an issue. [Principal idea/results] Our near future commitment is to provide a publicly available NL requirements dataset. [Contribution] To this end, we are collecting requirements documents from the Web, and we are representing them in a common XML format. In this paper, we present the current version of the dataset, together with our agenda concerning formatting, extension, and annotation of the dataset.| File | Dimensione | Formato | |
|---|---|---|---|
|
prod_382379-doc_132986.pdf
solo utenti autorizzati
Descrizione: Towards a Dataset for Natural Language Requirements Processing
Tipologia:
Versione Editoriale (PDF)
Dimensione
327.77 kB
Formato
Adobe PDF
|
327.77 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


