A Dataset for the Fine-tuning of LLM for the NER Task in the Cyber Security Domain
Stefano Silvestri; Giuseppe Felice Russo; Giuseppe Tricomi; Mario Ciampi
2025
Abstract
The increasing complexity of cyber threats necessitates robust cyber security measures. Effective threat detection and mitigation depend on Cyber Threat Intelligence, which includes both structured and unstructured data critical for proactive defense strategies. While databases like the NVD and ExploitDB offer structured security information, a significant amount of vital intelligence initially appears in unstructured formats, such as blogs, mailing lists, and news sites. Extracting meaningful information from these sources is particularly challenging in cyber security and requires specialized Named Entity Recognition (NER) tools to identify domain-specific entities. This paper presents a NER dataset obtained by merging two cyber security domain datasets, CyNER and APTNER, creating a unified resource that enhances NER model training. Experimental results with advanced NER models show significant performance gains, underscoring the value of the proposed dataset in advancing cyber security practices and highlighting the need for such resources.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| paper_170.pdf | Open access | Publisher's version (PDF) | Creative Commons | 180.32 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


