
PyDRO: A Python reimplementation of the Distributional Random Oversampling method for binary text classification

Moreo Fernandez AD
2020

Abstract

This repo is a stand-alone Python (re)implementation of the Distributional Random Oversampling (DRO) method presented at SIGIR'16. The original implementation was part of the JaTeCs framework for Java. DRO is an oversampling method that counters data imbalance in binary text classification. It generates new synthetic minority-class documents at random by exploiting the distributional properties of the terms in the collection. The variability introduced by the oversampling is confined to a latent space; the original feature space is replicated and left untouched.
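The idea described above can be illustrated with a small NumPy sketch. It is not the PyDRO API and not the exact procedure from the SIGIR'16 paper: the function name `dro_sketch`, its parameters, and the specific choice of a multinomial over training documents (with probabilities obtained by projecting a document's term weights onto the term-by-document matrix) are all assumptions made for illustration. What the sketch does show is the structure stated in the abstract: each oversampled document keeps an untouched copy of its original features, and all randomness lives in an appended latent block.

```python
import numpy as np

def dro_sketch(X, y, n_copies=3, n_trials=50, seed=0):
    """Hypothetical illustration: replicate minority rows, append a random latent block.

    X: dense (n_docs, n_terms) non-negative term-weight matrix, no empty rows.
    y: binary labels, 1 = minority class.
    Returns (X_new, y_new); X_new has n_terms + n_docs columns.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_docs, _ = X.shape

    # Assumption: a document's latent distribution is its term weights
    # projected onto the training documents, normalized to sum to 1.
    scores = X @ X.T                                   # (n_docs, n_docs)
    probs = scores / scores.sum(axis=1, keepdims=True)

    rows, labels = [], []
    for i in range(n_docs):
        reps = n_copies if y[i] == 1 else 1            # oversample only the minority class
        for _ in range(reps):
            # Random latent features: a normalized multinomial draw.
            latent = rng.multinomial(n_trials, probs[i]) / n_trials
            # The original features are copied unchanged; variability is only in `latent`.
            rows.append(np.concatenate([X[i], latent]))
            labels.append(y[i])
    return np.vstack(rows), np.array(labels)

# Toy collection: 4 documents, 3 terms, one minority-class document.
X = np.array([[1., 0., 2.], [0., 1., 1.], [1., 1., 0.], [0., 2., 1.]])
y = np.array([1, 0, 0, 0])
X_new, y_new = dro_sketch(X, y, n_copies=3)
print(X_new.shape)  # (6, 7)
```

In this toy run the single minority document appears three times. The first three columns of those rows are identical, while the four latent columns differ from copy to copy depending on the random draw.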
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Python
Distributional Random Oversampling
Imbalanced Classification
Binary Classification

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/374206