CNR Institutional Research Information System

Flaky tests are tests whose outcomes are non-deterministic. Despite the recent research activity on this topic, no effort has been made on understanding the vocabulary of flaky tests. This work proposes to automatically classify tests as flaky or not based on their vocabulary. Static classification of flaky tests is important, for example, to detect the introduction of flaky tests and to search for flaky tests after they are introduced in regression test suites. We evaluated performance of various machine learning algorithms to solve this problem. We constructed a data set of flaky and non-flaky tests by running every test case, in a set of 64k tests, 100 times (6.4 million test executions). We then used machine learning techniques on the resulting data set to predict which tests are flaky from their source code. Based on features, such as counting stemmed tokens extracted from source code identifiers, we achieved an F-measure of 0.95 for the identification of flaky tests. The best prediction performance was obtained when using Random Forest and Support Vector Machines. In terms of the code identifiers that are most strongly associated with test flakiness, we noted that job, action, and services are commonly associated with flaky tests. Overall, our results provides initial yet strong evidence that static detection of flaky tests is effective.

What is the Vocabulary of Flaky Tests?

Pinto G;Miranda B;Dissanayake S;d'Amorim M;Treude C;Bertolino A

2020

Abstract

Flaky tests are tests whose outcomes are non-deterministic. Despite the recent research activity on this topic, no effort has been made on understanding the vocabulary of flaky tests. This work proposes to automatically classify tests as flaky or not based on their vocabulary. Static classification of flaky tests is important, for example, to detect the introduction of flaky tests and to search for flaky tests after they are introduced in regression test suites. We evaluated performance of various machine learning algorithms to solve this problem. We constructed a data set of flaky and non-flaky tests by running every test case, in a set of 64k tests, 100 times (6.4 million test executions). We then used machine learning techniques on the resulting data set to predict which tests are flaky from their source code. Based on features, such as counting stemmed tokens extracted from source code identifiers, we achieved an F-measure of 0.95 for the identification of flaky tests. The best prediction performance was obtained when using Random Forest and Support Vector Machines. In terms of the code identifiers that are most strongly associated with test flakiness, we noted that job, action, and services are commonly associated with flaky tests. Overall, our results provides initial yet strong evidence that static detection of flaky tests is effective.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2020
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Codice ISBN
	
				978-1-4503-7517-7
			
	Parole chiave
	
				Test flakiness
Regression testing
Text classification
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_435436-doc_155726.pdf solo utenti autorizzati Descrizione: What is the Vocabulary of Flaky Tests? Tipologia: Versione Editoriale (PDF) Dimensione 176.84 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	176.84 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/377583

Citazioni

ND

88

52

social impact