In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework's potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner.
Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques
Folino F.;Folino G.;Pisani F. S.;Pontieri L.;Sabatino P.
2024
Abstract
In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework's potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner.File | Dimensione | Formato | |
---|---|---|---|
s40537-024-00933-6.pdf
accesso aperto
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
3.96 MB
Formato
Adobe PDF
|
3.96 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.