A method for real-world privacy-preserving Android malware detection through Federated Machine Learning

Ciaramella G.;Martinelli F.;Peluso C.;Mercaldo F.
2026

Abstract

Privacy is one of the most critical issues associated with the spread of Internet of Things and Internet of Everything devices. Over the years, several methods have been introduced to address this issue. In 2017, Google introduced the concept of Federated Machine Learning. This paradigm allows models to be trained collaboratively across multiple decentralized devices or servers that hold local data samples, without exchanging them. This approach enhances data privacy and security by ensuring that raw data remains on local devices while only model updates are shared and aggregated. This paper presents a privacy-preserving Android malware detector based on Federated Machine Learning. As a first step, we built a dataset comprising over 40,000 Android applications, including both trusted and malicious samples, the latter belonging to 71 malware families. Afterward, we conducted experiments with three different architectures, using weights derived from the CIFAR-10 and ImageNet datasets and hyperparameters selected through a Grid Search, with 40 clients. Moreover, the experimental analysis considers two data distributions: independent and identically distributed (IID) and non-independent and identically distributed (non-IID). To conclude the Federated Machine Learning experiments, we trained models for each architecture, with both weight types and both data distributions, applying the Clipping Norm aggregator. With IID data, the results are encouraging: accuracy reaches 0.873 without normalization and 0.877 with the Clipping Norm aggregator. With non-IID data, accuracy is 0.865 without normalization and 0.864 with the Clipping Norm aggregator, using Custom MobileNet 2.
Finally, to compare Federated Machine Learning with a centralized training approach, we trained several models using the same dataset, dataset split, and architectures, achieving an accuracy of 0.944 with InceptionV3. The outcomes show that the proposed method can deliver promising performance in privacy-preserving Android malware detection.
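
The aggregation step described in the abstract, where only model updates leave the clients and are combined on a server, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `clip_update` and `aggregate`, the flat-list weight representation, and the choice to clip each client update by its L2 norm before a size-weighted average are all assumptions made for illustration.

```python
import math


def clip_update(update, max_norm):
    """Scale an update so that its L2 norm is at most max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm == 0.0 or norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate(global_weights, client_updates, client_sizes, max_norm=None):
    """One round of size-weighted averaging of client updates (weight deltas).

    If max_norm is given, each client update is norm-clipped first, which
    bounds how much any single client can move the global model.
    """
    total = sum(client_sizes)
    new_weights = list(global_weights)
    for update, size in zip(client_updates, client_sizes):
        if max_norm is not None:
            update = clip_update(update, max_norm)
        for i, delta in enumerate(update):
            new_weights[i] += (size / total) * delta
    return new_weights


# Two clients with equally sized local datasets; only their updates are shared.
updates = [[3.0, 4.0], [0.0, 0.0]]
plain = aggregate([0.0, 0.0], updates, [1, 1])
clipped = aggregate([0.0, 0.0], updates, [1, 1], max_norm=1.0)
```

In this toy round the first client's update has L2 norm 5, so clipping to a norm of 1 shrinks its contribution before averaging, while the raw data of both clients never reaches the server.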
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Android
Federated Machine Learning
Machine learning
Malware
Mobile
Privacy
Security
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559501