Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification
Federico Giulio; Amato Giuseppe; Carrara Fabio; Gennaro Claudio; Di Benedetto Marco
2026
Abstract
Human sketches exhibit substantial variability across individuals in terms of line style, abstraction level, and drawing conventions. Unlike realistic images, they provide limited contextual information and rely on highly simplified concept representations. Recognizing and generating sketches therefore requires efficient use of the available information: identification of the most informative local features, interpretation of their meaning within a minimal context, and understanding of the spatial relationships that define the overall structure. In this study, we introduce Vi-SketchGPT, a representation and model that extracts these local features, contextualizes them within the sketch, and encodes spatial relationships, thereby enabling a deeper understanding of sketch structure. Guided by the intuition of the void as information, we leverage Signed Distance Functions (SDFs) to reveal this potentially hidden information, organizing it via quadtree decomposition and processing it with a hierarchical Transformer to capture multi-scale dependencies. This structured representation allows the model to support both high-fidelity generation and accurate classification. Experiments on the QuickDraw and TU-Berlin datasets demonstrate that the model classifies sketches with high accuracy while generating outputs that preserve structural coherence, respect part relationships, and capture essential conceptual patterns despite the scarcity of information in the original sketches.

| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Vi-SketchGPT_A_Novel_Multi-Scale_and_Context-Aware_Representation_for_Sketch_Generation_and_Classification.pdf | Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification | Post-print document | Creative Commons | 6.5 MB | Adobe PDF | Open access (View/Open) |
| Federico et al_Vi-SketchGPT_A_Novel_Multi-Scale_and_Context-Aware_Representation_VoR.pdf | Vi-SketchGPT: A Novel Multi-Scale and Context-Aware Representation for Sketch Generation and Classification | Publisher's version (PDF) | Creative Commons | 4.11 MB | Adobe PDF | Open access (View/Open) |
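The abstract's pipeline (a distance field around the strokes, organized by a quadtree so that regions near the strokes receive finer cells) can be illustrated with a minimal, self-contained sketch. This is a hypothetical toy version, not the paper's implementation: `distance_field`, `quadtree`, the grid size, and the split tolerance `tol` are all illustrative choices, and the field is an unsigned distance rather than a true SDF.

```python
# Toy illustration of "distance field + quadtree" over a sketch.
# All names and parameters are illustrative, not from the paper.
import math

def distance_field(strokes, size):
    """Brute-force distance from each grid cell to the nearest stroke
    pixel (a simplified stand-in for a signed distance function)."""
    return [[min(math.hypot(x - sx, y - sy) for sx, sy in strokes)
             for x in range(size)] for y in range(size)]

def quadtree(field, x, y, size, tol=1.0, min_size=2):
    """Recursively split a square cell while the distance values inside
    it vary by more than `tol`; returns leaf cells as (x, y, size)."""
    values = [field[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(values) - min(values) <= tol:
        return [(x, y, size)]
    h = size // 2
    leaves = []
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        leaves += quadtree(field, x + dx, y + dy, h, tol, min_size)
    return leaves

# A diagonal stroke on an 8x8 grid: the field varies fastest near the
# stroke, so the quadtree refines there into small multi-scale cells.
strokes = [(i, i) for i in range(8)]
field = distance_field(strokes, 8)
leaves = quadtree(field, 0, 0, 8)
```

In the paper's setting, the resulting multi-scale cells would then be fed to a hierarchical Transformer; here the leaves simply tile the grid, with cell size reflecting local detail.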
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


