
The hidden dimension: a paradigmatic view of data-driven NLP.

Vito Pirrelli
1999

Abstract

Many tasks in language analysis can be described as the maximally economical mapping of one level of linguistic representation onto another. Over the past decade, many different machine-learning strategies have been developed to induce such mappings automatically from data. In this paper, we contend that the way most learning algorithms have been applied to problems of language analysis reflects a strong bias towards a compositional (or biunique) model of inter-level mapping. Although this bias is justified in some cases, biunique inter-level mapping is not a jack of all trades. A model of analogical learning, based on a paradigmatic reanalysis of memorized data, is presented here. The methodological pros and cons of this approach are discussed in relation to a number of germane linguistic issues and illustrated through three case studies: word pronunciation, word analysis, and word sense disambiguation. The evidence presented here suggests that the brain is not designed to carry out the logically simplest and maximally economical way of relating form and function in language. Rather, we propose a radical shift of emphasis in language learning from syntagmatic inter-level mapping to paradigmatically constrained intra-level mapping.
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Keywords: data-driven NLP; memory-based machine learning; analogical language learning

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/264349