WLI: a Web application for Language Identification and evaluation of available tools

A. Marchetti; C. Bacciu; M. Abrate
2012

Abstract

Web Language Identifier (WLI) is a service that takes either the URL of a Web page or plain text and, drawing on a pool of language identification tools, returns a set of candidate languages, each with a confidence score. The tools currently embedded are the Chromium Compact Language Detector, Lingua::Identify, and a simple identifier based on HTML attributes. The service can be used through a Web application or via an API. To evaluate the identifiers globally, we constructed a test set of Web pages extracted from 146 Wikipedia projects. This also allows WLI to serve as a platform for comparing language identification tools in terms of supported languages and precision of results. Charts summarizing the comparison can be viewed in the WLI Web application. We plan to extend the service so that users can add their own identifiers.
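
As an illustration of the idea described in the abstract, the following is a minimal Python sketch of an identifier based on HTML attributes, together with one possible way of pooling the scores of several tools. It is not the WLI implementation: the function names, the choice of the `lang` attribute and the `Content-Language` meta tag as signals, the even split of confidence, and the averaging rule are all assumptions made for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LangAttributeParser(HTMLParser):
    """Collects the lang attribute of <html> and any Content-Language meta tag."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.languages.append(attributes["lang"])
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            if http_equiv == "content-language" and attributes.get("content"):
                self.languages.append(attributes["content"])


def identify_from_html_attributes(html):
    """Hypothetical identifier: returns {language: confidence} from markup alone."""
    parser = LangAttributeParser()
    parser.feed(html)
    codes = set()
    for value in parser.languages:
        # Content-Language may list several tags separated by commas.
        for part in value.split(","):
            # Keep only the primary subtag, e.g. "en-US" -> "en".
            code = part.split("-")[0].strip().lower()
            if code:
                codes.add(code)
    if not codes:
        return {}
    # Assumption: confidence is split evenly among the declared languages.
    return {code: 1.0 / len(codes) for code in codes}


def pool_candidates(results):
    """Hypothetical pooling rule: average each language's score over all tools.

    A tool that does not report a language contributes 0 for that language.
    Returns (language, confidence) pairs sorted by decreasing confidence.
    """
    totals = defaultdict(float)
    for scores in results:
        for language, confidence in scores.items():
            totals[language] += confidence
    pooled = {language: total / len(results) for language, total in totals.items()}
    return sorted(pooled.items(), key=lambda item: item[1], reverse=True)
```

Each tool in the pool would be wrapped so that it returns a dictionary of the same shape, for example `pool_candidates([identify_from_html_attributes(page), other_tool_scores])`, where `other_tool_scores` stands for the output of any other identifier. Averaging is only one option; a weighted vote based on each tool's measured precision would be an equally plausible design.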
Istituto di informatica e telematica - IIT
Multilingual Web


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/312895