Days of Future Past: Towards Robust Detection of Malware Variants via LLM-Based Embedding Generation
Benedetti G.; Caviglione L.; Guarascio M.; Liguori A.; Manco G.; Rullo A.
2025
Abstract
The evolution of a parent malware into a family of slightly different mutations may hinder detection mechanisms based on signatures, while the limited number of training examples may reduce the effectiveness of machine learning methods in the early stages of the infection. To address these challenges, we define a framework to improve the ability to generalize the detection of 'evolving' malware samples. Specifically, we leverage a Large Language Model (LLM) to map malware instructions into a latent space. The obtained embeddings are then used to train a Variational Autoencoder for generating realistic variants. Experimental results obtained by training a detector on both real and synthetic embeddings demonstrate the effectiveness of our approach, especially when facing three real malware families. Our LLM-based feature extraction approach should therefore be considered a promising mechanism for pursuing robust malware detection in dynamic threat environments.

| File | Access | License | Size | Format |
|---|---|---|---|---|
| Days_of_Future_Past_Towards_Robust_Detection_of_Malware_Variants_via_LLM-Based_Embedding_Generation.pdf | Open access | Public domain | 443.54 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


