Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond
Cristiano Ciaccio; Gabriele Sarti; Alessio Miaschi; Felice Dell’Orletta
2025
Abstract
Answering crossword puzzle clues is a challenging retrieval task that requires matching linguistically rich and often ambiguous clues with appropriate solutions. While traditional retrieval-based strategies are commonly used to address this task, wordplay and other lateral thinking strategies limit the effectiveness of conventional lexical and semantic approaches. In this work, we address the clue answering task as an information retrieval problem, exploiting the potential of encoder-based Transformer models to learn a shared latent space between clues and solutions. In particular, we propose for the first time a collection of siamese and asymmetric dual encoder architectures trained to capture the complex properties and relations characterizing crossword clues and their solutions for the Italian language. After comparing various architectures for this task, we show that the strong retrieval capabilities of these systems extend to neologisms and dictionary terms, suggesting their potential use in linguistic analyses beyond the scope of language games.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Cristiano Ciaccio | en |
| dc.authority.people | Gabriele Sarti | en |
| dc.authority.people | Alessio Miaschi | en |
| dc.authority.people | Felice Dell’Orletta | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.date.accessioned | 2026/03/03 16:53:32 | - |
| dc.date.available | 2026/03/03 16:53:32 | - |
| dc.date.firstsubmission | 2026/03/03 15:46:47 | * |
| dc.date.issued | 2025 | - |
| dc.date.submission | 2026/03/03 15:46:47 | * |
| dc.description.allpeople | Ciaccio, Cristiano; Sarti, Gabriele; Miaschi, Alessio; Dell’Orletta, Felice | - |
| dc.description.allpeopleoriginal | Cristiano Ciaccio, Gabriele Sarti, Alessio Miaschi, Felice Dell’Orletta | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 4 | - |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/570745 | - |
| dc.language.iso | eng | en |
| dc.relation.ispartofbook | Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) | en |
| dc.subject.keywordseng | Language Games, Crosswords, Semantic Similarity, Embeddings, Natural Language Processing, Information Retrieval | - |
| dc.subject.singlekeyword | Language Games | * |
| dc.subject.singlekeyword | Crosswords | * |
| dc.subject.singlekeyword | Semantic Similarity | * |
| dc.subject.singlekeyword | Embeddings | * |
| dc.subject.singlekeyword | Natural Language Processing | * |
| dc.subject.singlekeyword | Information Retrieval | * |
| dc.title | Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| iris.mediafilter.data | 2026/03/04 02:52:08 | * |
| iris.orcid.lastModifiedDate | 2026/03/03 16:53:32 | * |
| iris.orcid.lastModifiedMillisecond | 1772553212836 | * |
| iris.sitodocente.maxattempts | 1 | - |

Appears in types: 04.01 Contributo in Atti di convegno (contribution in conference proceedings)

| File | Size | Format | Access | License |
|---|---|---|---|---|
| crci.pdf | 2.54 MB | Adobe PDF | Open access | Creative Commons |


