CNR Institutional Research Information System

MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, for each such document and for both classification criteria MINECORE determines whether the class assigned by an automated classifier should be manually reviewed or not. This determination is heavily dependent on the ("posterior") probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (? 800K) dataset of textual documents.

Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review

Molinari A.

2019

Abstract

MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, for each such document and for both classification criteria MINECORE determines whether the class assigned by an automated classifier should be manually reviewed or not. This determination is heavily dependent on the ("posterior") probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (? 800K) dataset of textual documents.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Parole chiave
	
				Automated classifiers
Classification criterion
Cost-sensitive
Expected costs
Misclassifications
Posterior probability
Textual documents
Training time
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_442377-doc_158747.pdf accesso aperto Descrizione: Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review Tipologia: Versione Editoriale (PDF) Licenza: Creative commons Dimensione 455.37 kB Formato Adobe PDF Visualizza/Apri	455.37 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/424409

Citazioni

ND

2

ND

social impact