BO-ECLI parser engine: The extensible European solution for the automatic extraction of legal links

Agnoloni T.; Bacci L.
2017

Abstract

This paper presents the BO-ECLI Parser Engine, an open-source Java framework for the automatic extraction of case-law and legislation references from case-law texts issued in the European context. Differences in language and jurisdiction are handled by an extensible design that guides and facilitates the development of pluggable national extensions, which requires considerably less effort than developing a full national legal link extractor from scratch. Thanks to a well-defined pipeline of services that together cover the whole extraction process, and to an internal annotation system that conveys information along the pipeline, the software ensures both overall efficiency and flexibility in performing language- and jurisdiction-dependent tasks. Services can be provided either by the common part of the software or by a national extension. For services that perform rule-based textual analysis (such as entity identification), JFlex is used in the common part and recommended for the national extensions. Finally, through identifier generation services, the BO-ECLI Parser Engine can produce standard identifiers, such as ECLI or CELEX, for each recognized legal reference. Starting from a Template project, two national extensions have been successfully developed and tested to support the extraction of legal links from case-law texts written in Italian and Spanish.
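
The pipeline-of-services idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Service`, `Document`, `Annotation`, `ToyReferenceFinder` and `ToyEcliGenerator` are hypothetical and are not the BO-ECLI Parser Engine API; a plain regular expression stands in for a JFlex-generated scanner; the citation pattern "Court A, decision N/YYYY" and the codes `XX` and `CA` are invented placeholders. Only the general ECLI layout (`ECLI:country:court:year:ordinal`) is taken from the standard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    /** A text span plus the data that services attach to it along the pipeline. */
    static final class Annotation {
        final String text;
        final String year;
        final String number;
        String identifier; // set by the identifier generation step

        Annotation(String text, String year, String number) {
            this.text = text;
            this.year = year;
            this.number = number;
        }
    }

    /** Shared state handed from one service to the next. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    /** One pipeline step; an implementation could live in the common part or in an extension. */
    interface Service {
        void process(Document doc);
    }

    /** Toy stand-in for a national entity-identification service (a regex, not JFlex). */
    static final class ToyReferenceFinder implements Service {
        private static final Pattern REF =
                Pattern.compile("Court A, decision (\\d+)/(\\d{4})");

        @Override
        public void process(Document doc) {
            Matcher m = REF.matcher(doc.text);
            while (m.find()) {
                // group(1) is the decision number, group(2) the year
                doc.annotations.add(new Annotation(m.group(), m.group(2), m.group(1)));
            }
        }
    }

    /** Toy identifier generation following the layout ECLI:country:court:year:ordinal. */
    static final class ToyEcliGenerator implements Service {
        @Override
        public void process(Document doc) {
            for (Annotation a : doc.annotations) {
                // "XX" and "CA" are placeholder country and court codes.
                a.identifier = "ECLI:XX:CA:" + a.year + ":" + a.number;
            }
        }
    }

    /** Runs the services in order and returns the generated identifiers. */
    static List<String> run(String text, List<Service> pipeline) {
        Document doc = new Document(text);
        for (Service s : pipeline) {
            s.process(doc);
        }
        List<String> ids = new ArrayList<>();
        for (Annotation a : doc.annotations) {
            ids.add(a.identifier);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Service> pipeline = List.of(new ToyReferenceFinder(), new ToyEcliGenerator());
        System.out.println(run("As held in Court A, decision 12/2015, the claim fails.", pipeline));
    }
}
```

In this sketch, supporting another language or jurisdiction would amount to replacing `ToyReferenceFinder` with a different `Service` implementation while the rest of the pipeline stays unchanged, which is the kind of separation between common part and national extension that the abstract describes.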
Istituto di Teoria e Tecniche dell'Informazione Giuridica - ITTIG - Sede Firenze
Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG
legal references
information extraction
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/426226