WEIR-P: An Information Extraction Pipeline for the Wastewater Domain

Francesca Frontini (co-first author)
2021

Abstract

We present the MeDO project, which aims to develop resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform), which identifies the entities and relations to be extracted from texts, pertaining to information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system, which was developed to automate the extraction of the aforementioned annotations from texts and their integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. Because French resources are less developed in the NLP community than English ones, the datasets obtained in this project are another original aspect of this work.
DC field Value Language
dc.authority.anceserie LECTURE NOTES IN BUSINESS INFORMATION PROCESSING en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Nanée Chahinian en
dc.authority.people Thierry Bonnabaud La Bruyère en
dc.authority.people Francesca Frontini en
dc.authority.people Carole Delenne en
dc.authority.people Marin Julien en
dc.authority.people Rachel Panckhurst en
dc.authority.people Mathieu Roche en
dc.authority.people Lucile Sautot en
dc.authority.people Laurent Deruelle en
dc.authority.people Maguelonne Teisseire en
dc.collection.id.s 8c50ea44-be95-498f-946e-7bb5bd666b7c *
dc.collection.name 02.01 Contributo in volume (Capitolo o Saggio) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Not assigned *
dc.date.accessioned 2024/02/20 18:45:59 -
dc.date.available 2024/02/20 18:45:59 -
dc.date.firstsubmission 2025/01/24 17:17:00 *
dc.date.issued 2021 -
dc.date.submission 2025/01/30 15:25:15 *
dc.description.affiliations HSM, Univ. Montpellier, CNRS, IRD, Montpellier, France Istituto di Linguistica Computazionale "A. Zampolli" - CNR, Pisa, Italy Inria Lemon, CRISAM - Inria Sophia Antipolis - Méditerranée, Montpellier, France Dipralang, UPVM, Montpellier, France TETIS, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France Berger Levrault, Perols, France -
dc.description.allpeople Chahinian, Nanée; Bonnabaud La Bruyère, Thierry; Frontini, Francesca; Delenne, Carole; Julien, Marin; Panckhurst, Rachel; Roche, Mathieu; Sautot, Lucile; Deruelle, Laurent; Teisseire, Maguelonne -
dc.description.allpeopleoriginal Nanée Chahinian, Thierry Bonnabaud La Bruyère, Francesca Frontini, Carole Delenne, Marin Julien, Rachel Panckhurst, Mathieu Roche, Lucile Sautot, Laurent Deruelle, Maguelonne Teisseire en
dc.description.fulltext restricted en
dc.description.numberofauthors 10 -
dc.identifier.isbn 978-3-030-75017-6 en
dc.identifier.isi WOS:000886549300011 en
dc.identifier.scopus 2-s2.0-85111129575 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/394922 -
dc.identifier.url https://www.springer.com/gp/book/9783030750176 en
dc.language.iso eng en
dc.miur.last.status.update 2025-01-22T11:18:33Z *
dc.publisher.country CHE en
dc.publisher.name Springer Nature Switzerland en
dc.publisher.place Basel en
dc.relation.allauthors Samira Cherfi, Anna Perini, Selmin Nurcan en
dc.relation.firstpage 171 en
dc.relation.ispartofbook Research Challenges in Information Science - 15th International Conference, RCIS 2021, Limassol, Cyprus, May 11-14, 2021, Proceedings en
dc.relation.lastpage 188 en
dc.relation.numberofpages 18 en
dc.subject.keywordseng Wastewater -
dc.subject.keywordseng text mining -
dc.subject.keywordseng Information extraction -
dc.subject.keywordseng NLP -
dc.subject.keywordseng NER -
dc.subject.keywordseng Domain adapted systems -
dc.subject.singlekeyword Wastewater *
dc.subject.singlekeyword text mining *
dc.subject.singlekeyword Information extraction *
dc.subject.singlekeyword NLP *
dc.subject.singlekeyword NER *
dc.subject.singlekeyword Domain adapted systems *
dc.title WEIR-P: An Information Extraction Pipeline for the Wastewater Domain en
dc.type.driver info:eu-repo/semantics/bookPart -
dc.type.full 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) it
dc.type.miur 268 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 453819 -
iris.isi.metadataErrorDescription 0 -
iris.isi.metadataErrorType ERROR_NO_MATCH -
iris.isi.metadataStatus ERROR -
iris.mediafilter.data 2025/04/04 04:17:23 *
iris.orcid.lastModifiedDate 2025/03/05 11:10:10 *
iris.orcid.lastModifiedMillisecond 1741169410024 *
iris.scopus.extIssued 2021 -
iris.scopus.extTitle WEIR-P: An Information Extraction Pipeline for the Wastewater Domain -
iris.sitodocente.maxattempts 1 -
scopus.authority.anceserie LECTURE NOTES IN BUSINESS INFORMATION PROCESSING###1865-1348 *
scopus.category 1404 *
scopus.category 2207 *
scopus.category 1403 *
scopus.category 1710 *
scopus.category 2611 *
scopus.category 1802 *
scopus.contributor.affiliation IRD -
scopus.contributor.affiliation IRD -
scopus.contributor.affiliation CLARIN ERIC -
scopus.contributor.affiliation CRISAM - Inria Sophia Antipolis – Méditerranée -
scopus.contributor.affiliation IRD -
scopus.contributor.affiliation UPVM -
scopus.contributor.affiliation INRAE -
scopus.contributor.affiliation INRAE -
scopus.contributor.affiliation Berger Levrault -
scopus.contributor.affiliation INRAE -
scopus.contributor.afid 60108488 -
scopus.contributor.afid 60108488 -
scopus.contributor.afid 117760649 -
scopus.contributor.afid 60032385 -
scopus.contributor.afid 60108488 -
scopus.contributor.afid 60009278 -
scopus.contributor.afid 60103240 -
scopus.contributor.afid 60103240 -
scopus.contributor.afid 60273169 -
scopus.contributor.afid 60103240 -
scopus.contributor.auid 8625087900 -
scopus.contributor.auid 57226305812 -
scopus.contributor.auid 55162070400 -
scopus.contributor.auid 25222360500 -
scopus.contributor.auid 57226320959 -
scopus.contributor.auid 16408440500 -
scopus.contributor.auid 15136810000 -
scopus.contributor.auid 56326932800 -
scopus.contributor.auid 6506080527 -
scopus.contributor.auid 6601930149 -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country Netherlands -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.name Nanée -
scopus.contributor.name Thierry -
scopus.contributor.name Francesca -
scopus.contributor.name Carole -
scopus.contributor.name Marin -
scopus.contributor.name Rachel -
scopus.contributor.name Mathieu -
scopus.contributor.name Lucile -
scopus.contributor.name Laurent -
scopus.contributor.name Maguelonne -
scopus.contributor.subaffiliation HSM;Univ. Montpellier;CNRS; -
scopus.contributor.subaffiliation HSM;Univ. Montpellier;CNRS; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Inria Lemon; -
scopus.contributor.subaffiliation HSM;Univ. Montpellier;CNRS; -
scopus.contributor.subaffiliation Dipralang; -
scopus.contributor.subaffiliation TETIS;AgroParisTech;CIRAD;CNRS; -
scopus.contributor.subaffiliation TETIS;AgroParisTech;CIRAD;CNRS; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation TETIS;AgroParisTech;CIRAD;CNRS; -
scopus.contributor.surname Chahinian -
scopus.contributor.surname Bonnabaud La Bruyère -
scopus.contributor.surname Frontini -
scopus.contributor.surname Delenne -
scopus.contributor.surname Julien -
scopus.contributor.surname Panckhurst -
scopus.contributor.surname Roche -
scopus.contributor.surname Sautot -
scopus.contributor.surname Deruelle -
scopus.contributor.surname Teisseire -
scopus.date.issued 2021 *
scopus.description.allpeopleoriginal Chahinian N.; Bonnabaud La Bruyere T.; Frontini F.; Delenne C.; Julien M.; Panckhurst R.; Roche M.; Sautot L.; Deruelle L.; Teisseire M. *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.identifier.doi *
scopus.differences scopus.relation.volume *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.doi 10.1007/978-3-030-75018-3_11 *
scopus.identifier.eissn 1865-1356 *
scopus.identifier.isbn 9783030750176 *
scopus.identifier.pui 635559327 *
scopus.identifier.scopus 2-s2.0-85111129575 *
scopus.journal.sourceid 17500155101 *
scopus.language.iso eng *
scopus.publisher.name Springer Science and Business Media Deutschland GmbH *
scopus.relation.conferencedate 2021 *
scopus.relation.conferencename 15th International Conference on Research Challenges in Information Science, RCIS 2021 *
scopus.relation.firstpage 171 *
scopus.relation.lastpage 188 *
scopus.relation.volume 415 *
scopus.subject.keywords Domain adapted systems; Information extraction; NER; NLP; Text mining; Wastewater; *
scopus.title WEIR-P: An Information Extraction Pipeline for the Wastewater Domain *
scopus.titleeng WEIR-P: An Information Extraction Pipeline for the Wastewater Domain *
Appears in types: 02.01 Contributo in volume (Capitolo o Saggio)
Files in this item:
File: prod_453819-doc_174566.pdf (authorized users only)
Description: WEIR-P: An Information Extraction Pipeline for the Wastewater Domain
Type: Publisher's version (PDF)
License: Not public - private/restricted access
Size: 2.42 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/394922
Citations
  • PMC: not available
  • Scopus: 1
  • ISI: 1