This paper presents the BO-ECLI Parser Engine, an open-source Java framework for the automatic extraction of case-law and legislation references from case-law texts issued in the European context. Differences in language and jurisdiction are handled by an extensible design that guides and facilitates the development of pluggable national extensions, which takes considerably less effort than developing a full national legal link extractor from scratch. A well-defined pipeline of services covers the whole extraction process, and an internal annotation system conveys information along the pipeline; together they give the software both overall efficiency and flexibility in carrying out language- and jurisdiction-dependent tasks. Services can be provided either by the common part of the software or by a national extension. For services that perform rule-based textual analysis (such as entity identification), JFlex is used in the common part and recommended for the national extensions. Finally, through identifier generation services, the BO-ECLI Parser Engine can produce standard identifiers, such as ECLI or CELEX, for each recognized legal reference. Starting from a Template project, two national extensions have been developed and tested to support the extraction of legal links from case-law texts written in Italian and Spanish.
BO-ECLI parser engine: The extensible European solution for the automatic extraction of legal links
Agnoloni T;Bacci L;
2017
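The abstract describes a pipeline of services that exchange information through annotations, with some services supplied by the common part and others by a national extension. The following is only a minimal sketch of that idea, not the BO-ECLI Parser Engine API: every name here (`PipelineSketch`, `Annotation`, `Service`, `YearService`, `ItalianAuthorityService`, `process`) is hypothetical, and `java.util.regex` stands in for the JFlex-generated scanners that the abstract mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    /** Hypothetical annotation: a typed span of the input text, shared by all services. */
    static final class Annotation {
        final String type;
        final int start;
        final int end;
        final String text;

        Annotation(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ") \"" + text + "\"";
        }
    }

    /** Hypothetical pipeline step: reads the text and earlier annotations, may add new ones. */
    interface Service {
        void run(String text, List<Annotation> annotations);
    }

    /** Assumed language-independent service (common part): marks four-digit years. */
    static final class YearService implements Service {
        private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

        @Override
        public void run(String text, List<Annotation> annotations) {
            Matcher m = YEAR.matcher(text);
            while (m.find()) {
                annotations.add(new Annotation("YEAR", m.start(), m.end(), m.group()));
            }
        }
    }

    /** Assumed national-extension service: marks one Italian court name. */
    static final class ItalianAuthorityService implements Service {
        private static final Pattern AUTHORITY = Pattern.compile("Corte di Cassazione");

        @Override
        public void run(String text, List<Annotation> annotations) {
            Matcher m = AUTHORITY.matcher(text);
            while (m.find()) {
                annotations.add(new Annotation("AUTHORITY", m.start(), m.end(), m.group()));
            }
        }
    }

    /** Runs the services in order over the same text and returns all annotations produced. */
    static List<Annotation> process(String text, List<Service> pipeline) {
        List<Annotation> annotations = new ArrayList<>();
        for (Service service : pipeline) {
            service.run(text, annotations);
        }
        return annotations;
    }

    public static void main(String[] args) {
        List<Service> pipeline = new ArrayList<>();
        pipeline.add(new ItalianAuthorityService()); // supplied by a national extension
        pipeline.add(new YearService());             // supplied by the common part
        String text = "Corte di Cassazione, sentenza n. 1234 del 2015";
        for (Annotation a : process(text, pipeline)) {
            System.out.println(a);
        }
    }
}
```

In this sketch a national extension only has to contribute its own `Service` implementations; the pipeline loop and the annotation type stay shared, which is the kind of reuse the abstract attributes to the extensible design.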