Processing compounds: what frequency (alone) cannot explain

Pirrelli, V; Ferro, M; Marzi, C; Gagné, C; Spalding, T; Marelli, M

The observed elevation in typing latency for the initial letter of the second constituent of an English compound, compared with the typing time of the final letter of the first constituent (Gagné & Spalding 2016), suggests that both compounds (snowball) and pseudo-compounds (carpet) are decomposed, but also that full-form representations are available in the lexical store. To gain further insight into the lexical representations underlying typing, we used computational modelling. In particular, we used superpositional models of word memory, based on Self-Organising Recurrent Maps (TSOMs) (Ferro et al. 2016; Marzi et al. 2016), where both simple and compound words are processed (and stored) using the same pool of processing (and memory) resources, to model the elevation in typing time at the constituent boundary and the rate of typing. In addition, we considered models based on the Compositional Distributional Semantics framework (CAOSS, Marelli et al. 2017) to simulate independent effects of semantic transparency on compound typing (Gagné & Spalding 2016).

Due to co-activation and competition between compounds and their constituent words in TSOMs, levels of activation of processing nodes per letter position appear to reflect degrees of context-sensitive predictability: the higher the level, the more expected the letter in that position. In English compounds, activation levels exhibited a characteristically U-shaped pattern, with minimum values centred on the constituent boundary. A similar pattern was found for pseudo-compounds, which nonetheless present a less pronounced U-shaped pattern and a higher activation value at the morpheme boundary than compounds do. The difference is in line with the higher speed-up rate in typing pseudo-compounds than compounds reported in Gagné and Spalding (2016).

TSOMs were trained on letter-based representations, so computer experiments could simulate peripheral effects of serial processing of compound structure before lexical access.
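As a minimal sketch of how the U-shaped activation profiles described above might be summarised, the Python snippet below locates the minimum of a per-letter activation sequence and measures how far activation at the constituent boundary dips below the word edges. The function name `boundary_dip`, the depth measure, and all activation values are hypothetical illustrations; they are not TSOM output and not the measures used in the study.

```python
from typing import Dict, Sequence, Union


def boundary_dip(
    activations: Sequence[float], boundary: int
) -> Dict[str, Union[int, bool, float]]:
    """Summarise a per-letter activation profile around a constituent boundary.

    `boundary` is the index of the first letter of the second constituent
    (an assumed convention for this sketch).
    """
    if len(activations) < 3:
        raise ValueError("need at least three letter positions")
    if not 0 < boundary < len(activations) - 1:
        raise ValueError("boundary must be a word-internal position")

    # Position of the lowest activation (first one if there are ties).
    min_pos = min(range(len(activations)), key=lambda i: activations[i])
    # Depth of the dip: mean activation at the two word edges minus
    # activation at the boundary letter.
    edge_mean = (activations[0] + activations[-1]) / 2.0
    depth = edge_mean - activations[boundary]
    return {
        "min_pos": min_pos,
        "min_at_boundary": min_pos == boundary,
        "depth": depth,
    }


# Invented profiles, for illustration only.
snowball = [0.9, 0.8, 0.6, 0.5, 0.3, 0.5, 0.7, 0.9]  # s-n-o-w | b-a-l-l
carpet = [0.9, 0.8, 0.7, 0.6, 0.7, 0.85]             # c-a-r | p-e-t

compound = boundary_dip(snowball, boundary=4)
pseudo = boundary_dip(carpet, boundary=3)
print(compound["min_at_boundary"], pseudo["min_at_boundary"])
print(compound["depth"] > pseudo["depth"])
```

With these invented values, both profiles reach their minimum at the boundary, and the pseudo-compound shows the shallower dip, which mirrors the qualitative pattern reported above.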
To investigate post-lexical issues, we also tested computational models of the generation of novel compound meanings based on CAOSS, which proved able to account for well-established relational effects in compound processing (Gagné 2001; Gagné & Shoben 1997) within an unsupervised, data-driven framework (Marelli et al. 2017). We ran a mixed-effects regression analysis of the data in Gagné and Spalding (2016), using vector-semantics estimates and TSOM activation levels to predict typing time for the initial letter of the second constituent. There was a negative effect of TSOM letter activation levels: the more active a letter node is, the faster a participant types the letter (t = -2.7, p = .007). There was also a positive effect of CAOSS-based compositionality estimates: the more easily a compound's lexicalized meaning can be obtained through compositional operations on single constituent vectors, the slower participants were at typing the first letter of the second constituent (t = 2.4, p = .017).

These results have interesting implications for an integrative computational architecture accounting for the whole range of experimental evidence reported by Gagné and Spalding (2016). In particular, we will focus on evidence of stronger competition (and longer typing time) in Transparent-Transparent and Transparent-Opaque compounds than in Opaque-Transparent compounds, which indicates a non-trivial interaction between semantic compositionality and serial processing effects.
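The notion of a compositionality estimate can be illustrated with a small sketch. It assumes, as a simplification not spelled out in this abstract, that a compound vector is composed as c = M·u + H·v, where u and v are the constituent vectors and M and H are position-specific weight matrices, and that compositionality is scored as the cosine between the composed vector and the compound's lexicalized vector. The function names, the identity matrices, and the toy two-dimensional vectors are all hypothetical; in a real model the matrices would be estimated from corpus data.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def compose(m_mod: Matrix, u: Vector, m_head: Matrix, v: Vector) -> List[float]:
    """Compose a compound vector as M*u + H*v (assumed form)."""
    return [a + b for a, b in zip(matvec(m_mod, u), matvec(m_head, v))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def compositionality(
    m_mod: Matrix, u: Vector, m_head: Matrix, v: Vector, lexicalized: Vector
) -> float:
    """Similarity between the composed and the lexicalized compound vector."""
    return cosine(compose(m_mod, u, m_head, v), lexicalized)


# Toy setup, for illustration only: identity matrices and 2-d vectors.
identity = [[1.0, 0.0], [0.0, 1.0]]
modifier, head = [1.0, 0.0], [0.0, 1.0]

transparent = compositionality(identity, modifier, identity, head, [1.0, 1.0])
opaque = compositionality(identity, modifier, identity, head, [1.0, -1.0])
print(transparent > opaque)
```

In this toy setup, the compound whose lexicalized vector lies close to the composed vector receives the higher estimate. Under the regression result above, a higher estimate would go with a slower first keystroke of the second constituent.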