WEIR-P: An Information Extraction Pipeline for the Wastewater Domain
Francesca Frontini (co-first author)
2021
Abstract
We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work.

| DC field | Value | Language |
|---|---|---|
| dc.authority.anceserie | LECTURE NOTES IN BUSINESS INFORMATION PROCESSING | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Nanée Chahinian | en |
| dc.authority.people | Thierry Bonnabaud La Bruyère | en |
| dc.authority.people | Francesca Frontini | en |
| dc.authority.people | Carole Delenne | en |
| dc.authority.people | Marin Julien | en |
| dc.authority.people | Rachel Panckhurst | en |
| dc.authority.people | Mathieu Roche | en |
| dc.authority.people | Lucile Sautot | en |
| dc.authority.people | Laurent Deruelle | en |
| dc.authority.people | Maguelonne Teisseire | en |
| dc.collection.id.s | 8c50ea44-be95-498f-946e-7bb5bd666b7c | * |
| dc.collection.name | 02.01 Contributo in volume (Capitolo o Saggio) | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2024/02/20 18:45:59 | - |
| dc.date.available | 2024/02/20 18:45:59 | - |
| dc.date.firstsubmission | 2025/01/24 17:17:00 | * |
| dc.date.issued | 2021 | - |
| dc.date.submission | 2025/01/30 15:25:15 | * |
| dc.description.abstracteng | We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work. | - |
| dc.description.affiliations | HSM, Univ. Montpellier, CNRS, IRD, Montpellier, France Istituto di Linguistica Computazionale "A. Zampolli" - CNR, Pisa, Italy Inria Lemon, CRISAM - Inria Sophia Antipolis - Méditerranée, Montpellier, France Dipralang, UPVM, Montpellier, France TETIS, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France Berger Levrault, Perols, France | - |
| dc.description.allpeople | Chahinian, Nanée; Bonnabaud La Bruyère, Thierry; Frontini, Francesca; Delenne, Carole; Julien, Marin; Panckhurst, Rachel; Roche, Mathieu; Sautot, Lucile; Deruelle, Laurent; Teisseire, Maguelonne | - |
| dc.description.allpeopleoriginal | Nanée Chahinian, Thierry Bonnabaud La Bruyère, Francesca Frontini, Carole Delenne, Marin Julien, Rachel Panckhurst, Mathieu Roche, Lucile Sautot, Laurent Deruelle, Maguelonne Teisseire | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 10 | - |
| dc.identifier.isbn | 978-3-030-75017-6 | en |
| dc.identifier.isi | WOS:000886549300011 | en |
| dc.identifier.scopus | 2-s2.0-85111129575 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/394922 | - |
| dc.identifier.url | https://www.springer.com/gp/book/9783030750176 | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2025-01-22T11:18:33Z | * |
| dc.publisher.country | CHE | en |
| dc.publisher.name | Springer Nature Switzerland | en |
| dc.publisher.place | Basel | en |
| dc.relation.allauthors | Samira Cherfi, Anna Perini, Selmin Nurcan | en |
| dc.relation.firstpage | 171 | en |
| dc.relation.ispartofbook | Research Challenges in Information Science - 15th International Conference, RCIS 2021, Limassol, Cyprus, May 11-14, 2021, Proceedings | en |
| dc.relation.lastpage | 188 | en |
| dc.relation.numberofpages | 18 | en |
| dc.subject.keywordseng | Wastewater | - |
| dc.subject.keywordseng | text mining | - |
| dc.subject.keywordseng | Information extraction | - |
| dc.subject.keywordseng | NLP | - |
| dc.subject.keywordseng | NER | - |
| dc.subject.keywordseng | Domain adapted systems | - |
| dc.subject.singlekeyword | Wastewater | * |
| dc.subject.singlekeyword | text mining | * |
| dc.subject.singlekeyword | Information extraction | * |
| dc.subject.singlekeyword | NLP | * |
| dc.subject.singlekeyword | NER | * |
| dc.subject.singlekeyword | Domain adapted systems | * |
| dc.title | WEIR-P: An Information Extraction Pipeline for the Wastewater Domain | en |
| dc.type.driver | info:eu-repo/semantics/bookPart | - |
| dc.type.full | 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) | it |
| dc.type.miur | 268 | - |
| dc.type.referee | Yes, but type not specified | en |
| dc.ugov.descaux1 | 453819 | - |
| iris.isi.metadataErrorDescription | 0 | - |
| iris.isi.metadataErrorType | ERROR_NO_MATCH | - |
| iris.isi.metadataStatus | ERROR | - |
| iris.mediafilter.data | 2025/04/04 04:17:23 | * |
| iris.orcid.lastModifiedDate | 2025/03/05 11:10:10 | * |
| iris.orcid.lastModifiedMillisecond | 1741169410024 | * |
| iris.scopus.extIssued | 2021 | - |
| iris.scopus.extTitle | WEIR-P: An Information Extraction Pipeline for the Wastewater Domain | - |
| iris.sitodocente.maxattempts | 1 | - |
| scopus.authority.anceserie | LECTURE NOTES IN BUSINESS INFORMATION PROCESSING###1865-1348 | * |
| scopus.category | 1404 | * |
| scopus.category | 2207 | * |
| scopus.category | 1403 | * |
| scopus.category | 1710 | * |
| scopus.category | 2611 | * |
| scopus.category | 1802 | * |
| scopus.contributor.affiliation | IRD | - |
| scopus.contributor.affiliation | IRD | - |
| scopus.contributor.affiliation | CLARIN ERIC | - |
| scopus.contributor.affiliation | CRISAM - Inria Sophia Antipolis – Méditerranée | - |
| scopus.contributor.affiliation | IRD | - |
| scopus.contributor.affiliation | UPVM | - |
| scopus.contributor.affiliation | INRAE | - |
| scopus.contributor.affiliation | INRAE | - |
| scopus.contributor.affiliation | Berger Levrault | - |
| scopus.contributor.affiliation | INRAE | - |
| scopus.contributor.afid | 60108488 | - |
| scopus.contributor.afid | 60108488 | - |
| scopus.contributor.afid | 117760649 | - |
| scopus.contributor.afid | 60032385 | - |
| scopus.contributor.afid | 60108488 | - |
| scopus.contributor.afid | 60009278 | - |
| scopus.contributor.afid | 60103240 | - |
| scopus.contributor.afid | 60103240 | - |
| scopus.contributor.afid | 60273169 | - |
| scopus.contributor.afid | 60103240 | - |
| scopus.contributor.auid | 8625087900 | - |
| scopus.contributor.auid | 57226305812 | - |
| scopus.contributor.auid | 55162070400 | - |
| scopus.contributor.auid | 25222360500 | - |
| scopus.contributor.auid | 57226320959 | - |
| scopus.contributor.auid | 16408440500 | - |
| scopus.contributor.auid | 15136810000 | - |
| scopus.contributor.auid | 56326932800 | - |
| scopus.contributor.auid | 6506080527 | - |
| scopus.contributor.auid | 6601930149 | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | Netherlands | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | France | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Nanée | - |
| scopus.contributor.name | Thierry | - |
| scopus.contributor.name | Francesca | - |
| scopus.contributor.name | Carole | - |
| scopus.contributor.name | Marin | - |
| scopus.contributor.name | Rachel | - |
| scopus.contributor.name | Mathieu | - |
| scopus.contributor.name | Lucile | - |
| scopus.contributor.name | Laurent | - |
| scopus.contributor.name | Maguelonne | - |
| scopus.contributor.subaffiliation | HSM;Univ. Montpellier;CNRS; | - |
| scopus.contributor.subaffiliation | HSM;Univ. Montpellier;CNRS; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Inria Lemon; | - |
| scopus.contributor.subaffiliation | HSM;Univ. Montpellier;CNRS; | - |
| scopus.contributor.subaffiliation | Dipralang; | - |
| scopus.contributor.subaffiliation | TETIS;AgroParisTech;CIRAD;CNRS; | - |
| scopus.contributor.subaffiliation | TETIS;AgroParisTech;CIRAD;CNRS; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | TETIS;AgroParisTech;CIRAD;CNRS; | - |
| scopus.contributor.surname | Chahinian | - |
| scopus.contributor.surname | Bonnabaud La Bruyère | - |
| scopus.contributor.surname | Frontini | - |
| scopus.contributor.surname | Delenne | - |
| scopus.contributor.surname | Julien | - |
| scopus.contributor.surname | Panckhurst | - |
| scopus.contributor.surname | Roche | - |
| scopus.contributor.surname | Sautot | - |
| scopus.contributor.surname | Deruelle | - |
| scopus.contributor.surname | Teisseire | - |
| scopus.date.issued | 2021 | * |
| scopus.description.abstracteng | We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work. | * |
| scopus.description.allpeopleoriginal | Chahinian N.; Bonnabaud La Bruyere T.; Frontini F.; Delenne C.; Julien M.; Panckhurst R.; Roche M.; Sautot L.; Deruelle L.; Teisseire M. | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.identifier.isbn | * |
| scopus.differences | scopus.identifier.doi | * |
| scopus.differences | scopus.relation.volume | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.doi | 10.1007/978-3-030-75018-3_11 | * |
| scopus.identifier.eissn | 1865-1356 | * |
| scopus.identifier.isbn | 9783030750176 | * |
| scopus.identifier.pui | 635559327 | * |
| scopus.identifier.scopus | 2-s2.0-85111129575 | * |
| scopus.journal.sourceid | 17500155101 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Springer Science and Business Media Deutschland GmbH | * |
| scopus.relation.conferencedate | 2021 | * |
| scopus.relation.conferencename | 15th International Conference on Research Challenges in Information Science, RCIS 2021 | * |
| scopus.relation.firstpage | 171 | * |
| scopus.relation.lastpage | 188 | * |
| scopus.relation.volume | 415 | * |
| scopus.subject.keywords | Domain adapted systems; Information extraction; NER; NLP; Text mining; Wastewater; | * |
| scopus.title | WEIR-P: An Information Extraction Pipeline for the Wastewater Domain | * |
| scopus.titleeng | WEIR-P: An Information Extraction Pipeline for the Wastewater Domain | * |
| Appears in types: | 02.01 Contributo in volume (Capitolo o Saggio) | |

| File | Size | Format |
|---|---|---|
| prod_453819-doc_174566.pdf (authorized users only). Description: WEIR-P: An Information Extraction Pipeline for the Wastewater Domain. Type: Publisher's version (PDF). License: not public, private/restricted access | 2.42 MB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


