Answering crossword clues is a challenging retrieval task that requires matching linguistically rich and often ambiguous clues with appropriate solutions. Although traditional retrieval-based strategies are commonly used for this task, wordplay and other forms of lateral thinking limit the effectiveness of conventional lexical and semantic approaches. In this work, we frame clue answering as an information retrieval problem, exploiting the potential of encoder-based Transformer models to learn a shared latent space between clues and solutions. In particular, we propose, for the first time, a collection of siamese and asymmetric dual encoder architectures trained to capture the complex properties and relations that characterize crossword clues and their solutions in Italian. After comparing various architectures for this task, we show that the strong retrieval capabilities of these systems extend to neologisms and dictionary terms, suggesting their potential use in linguistic analyses beyond the scope of language games.

Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond

Cristiano Ciaccio; Gabriele Sarti; Alessio Miaschi; Felice Dell’Orletta
2025

DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Cristiano Ciaccio en
dc.authority.people Gabriele Sarti en
dc.authority.people Alessio Miaschi en
dc.authority.people Felice Dell’Orletta en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Conference proceedings contribution *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Not assigned *
dc.date.accessioned 2026/03/03 16:53:32 -
dc.date.available 2026/03/03 16:53:32 -
dc.date.firstsubmission 2026/03/03 15:46:47 *
dc.date.issued 2025 -
dc.date.submission 2026/03/03 15:46:47 *
dc.description.allpeople Ciaccio, Cristiano; Sarti, Gabriele; Miaschi, Alessio; Dell’Orletta, Felice -
dc.description.allpeopleoriginal Cristiano Ciaccio, Gabriele Sarti, Alessio Miaschi, Felice Dell’Orletta en
dc.description.fulltext open en
dc.description.numberofauthors 4 -
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/570745 -
dc.language.iso eng en
dc.relation.ispartofbook Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) en
dc.subject.keywordseng Language Games, Crosswords, Semantic Similarity, Embeddings, Natural Language Processing, Information Retrieval -
dc.title Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Conference proceedings contribution it
dc.type.miur 273 -
Appears in types: 04.01 Conference proceedings contribution
Files in this item:
File: crci.pdf
Access: open access
License: Creative Commons
Size: 2.54 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570745