Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review

Molinari A.
2019

Abstract

MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, MINECORE determines, for each document and for each of the two classification criteria, whether the class assigned by an automated classifier should be manually reviewed. This determination depends heavily on the posterior probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (≈ 800K) dataset of textual documents.
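
To make the EMQ step named in the abstract more concrete, here is a minimal sketch of the expectation-maximization prior-adjustment procedure that usually goes by that name (Saerens, Latinne and Decaestecker). It is an illustration under stated assumptions, not the implementation used in the paper: the function name `emq`, its parameters, and the toy numbers are hypothetical, and the classifier posteriors are assumed to be given as plain lists with strictly positive training priors.

```python
def emq(posteriors, train_priors, max_iter=1000, tol=1e-8):
    """Rescale classifier posteriors for an unlabelled set (illustrative sketch).

    posteriors   : list of rows, one per unlabelled document; each row holds the
                   classifier's posterior for every class (rows sum to 1).
    train_priors : class prevalences observed in the training set (all > 0).
    Returns (adjusted_posteriors, estimated_priors_on_the_unlabelled_set).
    """
    n_classes = len(train_priors)
    priors = list(train_priors)              # start from the training priors
    adjusted = [list(row) for row in posteriors]

    for _ in range(max_iter):
        # E-step: reweight the ORIGINAL posteriors by the ratio between the
        # current prior estimate and the training prior, then renormalise.
        updated = []
        for row in posteriors:
            weighted = [p * priors[c] / train_priors[c] for c, p in enumerate(row)]
            total = sum(weighted)
            updated.append([w / total for w in weighted])

        # M-step: the new prior of each class is the mean adjusted posterior.
        new_priors = [
            sum(row[c] for row in updated) / len(updated) for c in range(n_classes)
        ]

        shift = max(abs(a - b) for a, b in zip(new_priors, priors))
        priors, adjusted = new_priors, updated
        if shift < tol:                      # stop once the priors stabilise
            break

    return adjusted, priors


if __name__ == "__main__":
    # Toy posteriors for four unlabelled documents (hypothetical numbers).
    toy = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.6, 0.4]]
    new_posteriors, new_priors = emq(toy, train_priors=[0.5, 0.5])
    print(new_priors)
    print(new_posteriors)
```

In a pipeline like the one the abstract describes, the adjusted posteriors would replace the raw classifier outputs before the cost-sensitive review decision is made; how that decision is computed is specific to MINECORE and is not sketched here.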
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Automated classifiers
Classification criterion
Cost-sensitive
Expected costs
Misclassifications
Posterior probability
Textual documents
Training time
Files in this item:
File: prod_442377-doc_158747.pdf
Access: open access
Description: Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review
Type: Published version (PDF)
License: Creative Commons
Size: 455.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/424409