Novel benchmark for NER in the wastewater and stormwater domain

Cardillo F. A.; Debole F.; Frontini F.; Aelami M.; Chahinian N.; Conrad S.
2025

Abstract

Efficient wastewater and stormwater management is essential for sustainable cities. Extracting structured knowledge from reports and regulations is challenging because of domain-specific terminology and multilingual contexts. This work focuses on domain-specific Named Entity Recognition (NER) as a first step towards effective relation and information extraction to support decision making. A multilingual benchmark is crucial for evaluating such methods. This study develops a French-Italian domain-specific text corpus for wastewater management. It evaluates state-of-the-art NER methods, including LLM-based approaches, to provide a reliable baseline for future strategies, and it explores automated annotation projection with a view to extending the corpus to new languages.
DC Field Value Language
dc.authority.anceserie COLLOQUIUM IN INFORMATION SCIENCE AND TECHNOLOGY en
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Cardillo F. A. en
dc.authority.people Debole F. en
dc.authority.people Frontini F. en
dc.authority.people Aelami M. en
dc.authority.people Chahinian N. en
dc.authority.people Conrad S. en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 973 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2026/01/15 12:16:26 -
dc.date.available 2026/01/15 12:16:26 -
dc.date.firstsubmission 2026/01/13 22:43:58 *
dc.date.issued 2025 -
dc.date.submission 2026/01/13 22:43:58 *
dc.description.allpeople Cardillo, F. A.; Debole, F.; Frontini, F.; Aelami, M.; Chahinian, N.; Conrad, S. -
dc.description.allpeopleoriginal Cardillo F.A.; Debole F.; Frontini F.; Aelami M.; Chahinian N.; Conrad S. en
dc.description.fulltext restricted en
dc.description.numberofauthors 6 -
dc.identifier.doi 10.1109/cist65886.2025.11224095 en
dc.identifier.isbn 979-8-3315-4384-6 en
dc.identifier.scopus 2-s2.0-105024952471 en
dc.identifier.source orcid *
dc.identifier.uri https://hdl.handle.net/20.500.14243/562981 -
dc.identifier.url https://ieeexplore.ieee.org/document/11224095 en
dc.language.iso eng en
dc.publisher.country USA en
dc.publisher.name Institute of Electrical and Electronics Engineers en
dc.relation.conferencedate 2025 en
dc.relation.conferencename Cist 2025 - 8th IEEE International Congress on Information Science and Technology en
dc.relation.conferenceplace Marrakech, Morocco en
dc.relation.firstpage 226 en
dc.relation.ispartofbook Cist 2025 proceedings en
dc.relation.lastpage 231 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 6 en
dc.subject.keywordseng Annotation projection -
dc.subject.keywordseng Domain-specific corpus -
dc.subject.keywordseng LLMs for NER -
dc.subject.keywordseng Multilingual NLP -
dc.subject.keywordseng Named Entity Recognition -
dc.subject.singlekeyword Annotation projection *
dc.subject.singlekeyword Domain-specific corpus *
dc.subject.singlekeyword LLMs for NER *
dc.subject.singlekeyword Multilingual NLP *
dc.subject.singlekeyword Named Entity Recognition *
dc.title Novel benchmark for NER in the wastewater and stormwater domain en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
iris.mediafilter.data 2026/01/16 02:41:38 *
iris.orcid.lastModifiedDate 2026/01/15 12:27:48 *
iris.orcid.lastModifiedMillisecond 1768476468876 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle Novel Benchmark for NER in the Wastewater and Stormwater Domain -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1109/cist65886.2025.11224095 *
iris.unpaywall.isoa false *
iris.unpaywall.metadataCallLastModified 16/01/2026 03:34:06 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1768530846275 -
iris.unpaywall.oastatus closed *
scopus.category 1711 *
scopus.category 1706 *
scopus.category 1803 *
scopus.category 1802 *
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.affiliation Inria -
scopus.contributor.affiliation Cnrs -
scopus.contributor.affiliation Cnrs -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60085207 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60108488 -
scopus.contributor.afid 60108488 -
scopus.contributor.afid 60108488 -
scopus.contributor.auid 57191090133 -
scopus.contributor.auid 22333451000 -
scopus.contributor.auid 55162070400 -
scopus.contributor.auid 60012687800 -
scopus.contributor.auid 8625087900 -
scopus.contributor.auid 58672437800 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.country France -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Franco Alberto -
scopus.contributor.name Franca -
scopus.contributor.name Francesca -
scopus.contributor.name Mitra -
scopus.contributor.name Nanee -
scopus.contributor.name Serge -
scopus.contributor.subaffiliation Ist. di Linguistica Computazionale; -
scopus.contributor.subaffiliation Ist. di Scienza e Tecnologie Dell'Informazione; -
scopus.contributor.subaffiliation Ist. di Linguistica Computazionale; -
scopus.contributor.subaffiliation Hsm Univ. Montpellier;Cnrs;Ird; -
scopus.contributor.subaffiliation Hsm Univ Montpellier;Ird; -
scopus.contributor.subaffiliation Hsm Univ Montpellier;Ird; -
scopus.contributor.surname Cardillo -
scopus.contributor.surname Debole -
scopus.contributor.surname Frontini -
scopus.contributor.surname Aelami -
scopus.contributor.surname Chahinian -
scopus.contributor.surname Conrad -
scopus.date.issued 2025 *
scopus.description.allpeopleoriginal Cardillo F.A.; Debole F.; Frontini F.; Aelami M.; Chahinian N.; Conrad S. *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.funding.funders 501100014596 - Istituto di Scienza e Tecnologie dell'Informazione; 501100000780 - European Commission; 501100001665 - Agence Nationale de la Recherche; *
scopus.funding.ids GA 101086252; GA ANR-21-CE23-0004; *
scopus.identifier.doi 10.1109/CiSt65886.2025.11224095 *
scopus.identifier.eissn 2327-1884 *
scopus.identifier.isbn 9798331543846 *
scopus.identifier.pui 649556157 *
scopus.identifier.scopus 2-s2.0-105024952471 *
scopus.journal.sourceid 21100400809 *
scopus.language.iso eng *
scopus.publisher.name Institute of Electrical and Electronics Engineers Inc. *
scopus.relation.conferencedate 2025 *
scopus.relation.conferencename 8th IEEE International Congress on Information Science and Technology, CiSt 2025 *
scopus.relation.conferenceplace mar *
scopus.relation.firstpage 226 *
scopus.relation.lastpage 231 *
scopus.subject.keywords Annotation projection; Domain-specific corpus; LLMs for NER; Multilingual NLP; Named Entity Recognition; *
scopus.title Novel Benchmark for NER in the Wastewater and Stormwater Domain *
scopus.titleeng Novel Benchmark for NER in the Wastewater and Stormwater Domain *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
File: main.pdf
Access: authorized users only
Description: Novel Benchmark for NER in the Wastewater and Stormwater Domain
Type: Pre-print document
License: NOT PUBLIC - Private/restricted access
Size: 112.99 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/562981