Deobfuscation of JavaScript Code and Identification of Security Weaknesses Through Large Language Models

Giacomo Benedetti;Luca Caviglione;Carmela Comito;Alberto Falcone;Massimo Guarascio
2026

Abstract

Advancements in Large Language Models (LLMs) make it possible to solve many challenging software-security tasks automatically, e.g., the generation of test cases. An important aspect concerns the deobfuscation of source code, especially for improving its readability or preventing the evasion of signature-based countermeasures. Although LLMs are increasingly deployed to reveal the presence of malicious payloads within obfuscated software components, a comprehensive understanding of their potential and limitations is still missing. In this work, we evaluate the effectiveness of deobfuscating JavaScript code through an LLM-based pipeline. In more detail, we investigate whether LLMs can preserve structural properties of the software, especially to enhance the identification of weaknesses. Compared to two standard tools (i.e., JSNice and js-deobfuscator), our approach produces more readable JavaScript code according to several metrics, while retaining information on the Common Weakness Enumeration (CWE) entries plaguing the software. To support the process of explaining issues within code, we performed tests on the use of two general-purpose LLMs, i.e., ChatGPT and Google Gemini. Results indicate that advancing the security of JavaScript through LLMs requires facing several challenges, which can be largely addressed via ad-hoc models.
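As a concrete illustration of the task the abstract describes, the toy snippet below (not taken from the paper; all identifiers such as `_0x1f` and `greet` are invented for this example) contrasts an obfuscated JavaScript fragment with the readable, structure-preserving rewrite a deobfuscation pipeline is expected to produce:

```javascript
// Toy illustration only: the kind of rewrite an LLM-based
// deobfuscation pipeline aims to produce.

// Obfuscated form: hex-escaped string array and opaque identifiers,
// a pattern typical of common JavaScript obfuscators.
var _0x1f = ["\x48\x65\x6c\x6c\x6f\x2c\x20"]; // decodes to "Hello, "
function _0x2a(_0x3b) { return _0x1f[0] + _0x3b; }

// Deobfuscated form: the string literal is inlined and names are restored,
// while the function's structure is preserved so that weakness
// identification (e.g., CWE mapping) can still operate on the code.
function greet(name) { return "Hello, " + name; }

// Both forms behave identically:
console.log(_0x2a("world")); // prints "Hello, world"
console.log(greet("world")); // prints "Hello, world"
```

The key property the paper investigates is visible even in this toy case: the rewrite improves readability without altering the control and data flow that security analyses depend on.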
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Istituto di Matematica Applicata e Tecnologie Informatiche - IMATI - Sede Secondaria Genova
Code deobfuscation
Large language models
Cybersecurity
Threat analysis
Files in this product:

File: 1-s2.0-S0167739X25006120-main.pdf
Description: Deobfuscation of JavaScript code and identification of security weaknesses through large language models
Access: open access
Type: Published version (PDF)
License: Creative Commons
Size: 7.74 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/561157
Citations
  • PMC: ND
  • Scopus: 0
  • Web of Science: 0