
Infrequent forms: Noise or not?

Martijn Wieling, Simonetta Montemagni
2016

Abstract

This study asks whether simplifying dialectometrical data by removing infrequent forms helps to uncover the geographical structure in dialect data. We investigate lexical variation in a large corpus of Tuscan dialect data using hierarchical bipartite spectral graph partitioning, which identifies the main geographical areas together with their linguistic basis. To assess the influence of infrequent forms, we conduct two analyses: one that includes only lexical variants used by at least 0.5% of the informants, and one that includes all lexical variants in the data. The comparison shows that using all data yields a geographical characterization with a more adequate linguistic basis than using the trimmed data.
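The trimming criterion mentioned in the abstract (keeping only lexical variants used by at least 0.5% of the informants) can be illustrated with a minimal sketch. The flat list of (informant, variant) pairs, the function name `trim_variants`, and the choice to count distinct informants per variant across the whole data set are assumptions made for illustration; they are not taken from the chapter and may differ from how the authors computed the threshold.

```python
from collections import defaultdict


def trim_variants(records, min_share=0.005):
    """Return the variants used by at least `min_share` of all informants.

    `records` is an iterable of (informant_id, variant) pairs. This layout
    is hypothetical; it is not the format of the Tuscan corpus.
    """
    users = defaultdict(set)   # variant -> set of informants who used it
    informants = set()         # every informant seen in the data
    for informant, variant in records:
        users[variant].add(informant)
        informants.add(informant)

    # Minimum number of distinct informants a variant needs to be kept.
    threshold = min_share * len(informants)
    return {variant for variant, who in users.items() if len(who) >= threshold}


# Toy data: 400 informants, so 0.5% corresponds to 2 informants.
records = (
    [(i, "a") for i in range(400)]   # used by everyone
    + [(i, "b") for i in range(3)]   # used by 3 informants: kept
    + [(0, "c")]                     # used by 1 informant: trimmed
)
print(sorted(trim_variants(records)))  # ['a', 'b']
```

Running the same function with `min_share=0.0` keeps every variant, which corresponds to the second analysis described in the abstract (all lexical variants included).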
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Martijn Wieling it
dc.authority.people Simonetta Montemagni it
dc.collection.id.s 8c50ea44-be95-498f-946e-7bb5bd666b7c *
dc.collection.name 02.01 Contributo in volume (Capitolo o Saggio) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/16 02:55:54 -
dc.date.available 2024/02/16 02:55:54 -
dc.date.issued 2016 -
dc.description.affiliations University of Groningen, CLCG; Istituto di Linguistica Computazionale "Antonio Zampolli", ILC-CNR -
dc.description.allpeople Wieling, Martijn; Montemagni, Simonetta -
dc.description.allpeopleoriginal Martijn Wieling, Simonetta Montemagni. -
dc.description.fulltext none en
dc.description.numberofauthors 2 -
dc.identifier.doi 10.17169/langsci.b81.78 -
dc.identifier.isbn 978-3-946234-18-0 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/353731 -
dc.identifier.url http://langsci-press.org/catalog/view/81/78/367-1 -
dc.language.iso eng -
dc.publisher.country DEU -
dc.publisher.name Language Science Press -
dc.publisher.place Berlin -
dc.relation.alleditors Marie-Hélène Côté, Remco Knooihuizen, John Nerbonne -
dc.relation.firstpage 215 -
dc.relation.ispartofbook The Future of Dialects -
dc.relation.lastpage 224 -
dc.relation.numberofpages 415 -
dc.subject.keywords dialectometrical studies -
dc.subject.keywords dialectology -
dc.subject.keywords dialect data -
dc.subject.keywords lexical variation -
dc.subject.keywords Tuscan -
dc.subject.singlekeyword dialectometrical studies *
dc.subject.singlekeyword dialectology *
dc.subject.singlekeyword dialect data *
dc.subject.singlekeyword lexical variation *
dc.subject.singlekeyword Tuscan *
dc.title Infrequent forms: Noise or not? en
dc.type.driver info:eu-repo/semantics/bookPart -
dc.type.full 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) it
dc.type.miur 268 -
dc.ugov.descaux1 367813 -

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/353731