Infrequent forms: Noise or not?
Martijn Wieling; Simonetta Montemagni
2016
Abstract
In this study we ask whether simplifying the data in dialectometrical studies by removing infrequent forms helps to uncover the geographical structure in dialect data. By investigating lexical variation in a large corpus of Tuscan dialect data via hierarchical bipartite spectral graph partitioning, we are able to identify the main geographical areas together with their linguistic basis. To assess the influence of infrequent forms, we conduct two analyses: one that includes only lexical variants used by at least 0.5% of the informants, and another that includes all lexical variants in the data. We show that using all data yields a geographical characterization with a more adequate linguistic basis than using the trimmed data.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Martijn Wieling | it |
| dc.authority.people | Simonetta Montemagni | it |
| dc.collection.id.s | 8c50ea44-be95-498f-946e-7bb5bd666b7c | * |
| dc.collection.name | 02.01 Contributo in volume (Capitolo o Saggio) | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/16 02:55:54 | - |
| dc.date.available | 2024/02/16 02:55:54 | - |
| dc.date.issued | 2016 | - |
| dc.description.abstracteng | In this study we ask the question whether simplifying the data in dialectometrical studies by removing infrequent forms is advantageous to uncovering the geographical structure in dialect data. By investigating lexical variation in a large corpus of Tuscan dialect data via hierarchical bipartite spectral graph partitioning, we are able to identify the main geographical areas together with their linguistic basis. In order to assess the influence of infrequent forms, we conduct two analyses: one which includes only lexical variants used by at least 0.5% of the informants, and another which includes all lexical variants in the data. Using this approach we show that using all data enables us to find a geographical characterization with a more adequate linguistic basis than by using the trimmed data. | - |
| dc.description.affiliations | University of Groningen, CLCG; Istituto di Linguistica Computazionale "Antonio Zampolli", ILC-CNR | - |
| dc.description.allpeople | Wieling, Martijn; Montemagni, Simonetta | - |
| dc.description.allpeopleoriginal | Martijn Wieling, Simonetta Montemagni. | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 2 | - |
| dc.identifier.doi | 10.17169/langsci.b81.78 | - |
| dc.identifier.isbn | 978-3-946234-18-0 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/353731 | - |
| dc.identifier.url | http://langsci-press.org/catalog/view/81/78/367-1 | - |
| dc.language.iso | eng | - |
| dc.publisher.country | DEU | - |
| dc.publisher.name | Language Science Press | - |
| dc.publisher.place | Berlin | - |
| dc.relation.alleditors | Marie-Hélène Côté, Remco Knooihuizen, John Nerbonne | - |
| dc.relation.firstpage | 215 | - |
| dc.relation.ispartofbook | The Future of Dialects | - |
| dc.relation.lastpage | 224 | - |
| dc.relation.numberofpages | 415 | - |
| dc.subject.keywords | dialectometrical studies | - |
| dc.subject.keywords | dialectology | - |
| dc.subject.keywords | dialect data | - |
| dc.subject.keywords | lexical variation | - |
| dc.subject.keywords | Tuscan | - |
| dc.subject.singlekeyword | dialectometrical studies | * |
| dc.subject.singlekeyword | dialectology | * |
| dc.subject.singlekeyword | dialect data | * |
| dc.subject.singlekeyword | lexical variation | * |
| dc.subject.singlekeyword | Tuscan | * |
| dc.title | Infrequent forms: Noise or not? | en |
| dc.type.driver | info:eu-repo/semantics/bookPart | - |
| dc.type.full | 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) | it |
| dc.type.miur | 268 | - |
| dc.ugov.descaux1 | 367813 | - |
| Appears in types: | 02.01 Contributo in volume (Capitolo o Saggio) | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


