Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review
Molinari A.
2019
Abstract
MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, MINECORE determines, for each document and for each of the two classification criteria, whether the class assigned by an automated classifier should be manually reviewed. This determination depends heavily on the ("posterior") probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (≈800K) dataset of textual documents.
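The abstract does not spell out either algorithm, so the following Python sketch is only an illustration under stated assumptions. `emq_binary` follows the EM-based prior adjustment of Saerens, Latinne and Decaestecker (2002), the procedure commonly called EMQ, for a single binary criterion such as responsiveness: it alternately rescales the classifier's posteriors by the ratio between the current and the training prevalence, and re-estimates the prevalence as the mean of the rescaled posteriors. `review_decision` is a hypothetical, deliberately simplified single-criterion cost rule and is not MINECORE, which handles responsiveness and privilege jointly; it only shows why a review decision depends on the posteriors, the review cost and the misclassification costs. All function names, parameters and cost values are illustrative assumptions.

```python
from typing import List, Tuple


def emq_binary(posteriors: List[float], train_prior: float,
               eps: float = 1e-6, max_iter: int = 1000) -> Tuple[List[float], float]:
    """EM prior/posterior adjustment (Saerens et al., 2002), binary case.

    posteriors: P(positive | x) for each unlabelled document, from a classifier
    trained on data whose positive prevalence is train_prior (assumed known).
    Returns the adjusted posteriors and the estimated prevalence.
    """
    if not 0.0 < train_prior < 1.0:
        raise ValueError("train_prior must lie strictly between 0 and 1")
    prior = train_prior
    adjusted = list(posteriors)
    for _ in range(max_iter):
        # E-step: reweight each posterior by (current prior / training prior)
        # for the positive class and likewise for the negative class, then
        # renormalise so the two class probabilities sum to 1.
        w_pos = prior / train_prior
        w_neg = (1.0 - prior) / (1.0 - train_prior)
        adjusted = [w_pos * p / (w_pos * p + w_neg * (1.0 - p)) for p in posteriors]
        # M-step: the new prevalence estimate is the mean adjusted posterior.
        new_prior = sum(adjusted) / len(adjusted)
        converged = abs(new_prior - prior) < eps
        prior = new_prior
        if converged:
            break
    return adjusted, prior


def review_decision(p_pos: float, cost_fp: float, cost_fn: float,
                    cost_review: float) -> Tuple[int, bool]:
    """Hypothetical single-criterion rule (NOT MINECORE itself).

    Chooses the automatic label with the lower expected misclassification
    cost, and flags the document for manual review when that expected cost
    exceeds the review cost (manual review is assumed to be error-free).
    """
    cost_if_pos = (1.0 - p_pos) * cost_fp  # labelled positive, actually negative
    cost_if_neg = p_pos * cost_fn          # labelled negative, actually positive
    auto_label = 1 if cost_if_pos <= cost_if_neg else 0
    return auto_label, min(cost_if_pos, cost_if_neg) > cost_review


if __name__ == "__main__":
    # Toy posteriors and costs, purely illustrative.
    raw = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1]
    adjusted, prior = emq_binary(raw, train_prior=0.5)
    print(f"estimated prevalence: {prior:.3f}")
    for p_raw, p_adj in zip(raw, adjusted):
        print(p_raw, round(p_adj, 3),
              review_decision(p_raw, 10, 10, 1),
              review_decision(p_adj, 10, 10, 1))
```

Running the file prints the estimated prevalence and, for each toy document, the (label, review) pair obtained from the raw and from the adjusted posterior. Because EMQ shifts the posteriors toward the prevalence it estimates on the unlabelled set, some documents can cross the review threshold; this mirrors the intended effect on the review decisions, although the effect on MINECORE proper depends on its joint cost model.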
Full text: prod_442377-doc_158747.pdf (published version; Adobe PDF, 455.37 kB; open access under a Creative Commons license).