Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification
Federico Giulio; Amato Giuseppe; Carrara Fabio; Gennaro Claudio; Di Benedetto Marco
2026
Abstract
Human sketches exhibit substantial variability across individuals in terms of line style, abstraction level, and drawing conventions. Unlike realistic images, they provide limited contextual information and rely on highly simplified concept representations. Recognizing and generating sketches therefore requires efficient use of the available information: identification of the most informative local features, interpretation of their meaning within a minimal context, and understanding of the spatial relationships that define the overall structure. In this study, we introduce Vi-SketchGPT, a representation and model that extracts these local features, contextualizes them within the sketch, and encodes spatial relationships, thereby enabling a deeper understanding of sketch structure. Guided by the intuition of the void as information, we leverage Signed Distance Functions (SDFs) to reveal this potentially hidden information, organizing it via quadtree decomposition and processing it with a hierarchical Transformer to capture multi-scale dependencies. This structured representation allows the model to support both high-fidelity generation and accurate classification. Experiments on the QuickDraw and TU-Berlin datasets demonstrate that the model classifies sketches with high accuracy while generating outputs that preserve structural coherence, respect part relationships, and capture essential conceptual patterns despite the scarcity of information in the original sketches.

| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Vi-SketchGPT_A_Novel_Multi-Scale_and_Context-Aware_Representation_for_Sketch_Generation_and_Classification.pdf | Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification | Post-print document | Creative Commons | 6.5 MB | Adobe PDF | Open access (View/Open) |
| Federico et al_Vi-SketchGPT_A_Novel_Multi-Scale_and_Context-Aware_Representation_VoR.pdf | Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification | Publisher's version (PDF) | Creative Commons | 4.11 MB | Adobe PDF | Open access (View/Open) |
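The abstract's pipeline (a distance field around the strokes, organized by a quadtree so that regions near the strokes receive finer cells) can be illustrated with a minimal, self-contained sketch. This is a hypothetical toy version, not the paper's implementation: `distance_field`, `quadtree`, the grid size, and the split tolerance `tol` are all illustrative choices, and the field is an unsigned distance rather than a true SDF.

```python
# Toy illustration of "distance field + quadtree" over a sketch.
# All names and parameters are illustrative, not from the paper.
import math

def distance_field(strokes, size):
    """Brute-force distance from each grid cell to the nearest stroke
    pixel (a simplified stand-in for a signed distance function)."""
    return [[min(math.hypot(x - sx, y - sy) for sx, sy in strokes)
             for x in range(size)] for y in range(size)]

def quadtree(field, x, y, size, tol=1.0, min_size=2):
    """Recursively split a square cell while the distance values inside
    it vary by more than `tol`; returns leaf cells as (x, y, size)."""
    values = [field[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(values) - min(values) <= tol:
        return [(x, y, size)]
    h = size // 2
    leaves = []
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        leaves += quadtree(field, x + dx, y + dy, h, tol, min_size)
    return leaves

# A diagonal stroke on an 8x8 grid: the field varies fastest near the
# stroke, so the quadtree refines there into small multi-scale cells.
strokes = [(i, i) for i in range(8)]
field = distance_field(strokes, 8)
leaves = quadtree(field, 0, 0, 8)
```

In the paper's setting, the resulting multi-scale cells would then be fed to a hierarchical Transformer; here the leaves simply tile the grid, with cell size reflecting local detail.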
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


