Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be, among others, generators of huge amounts of data. To provide value-add services in these environments, data will have to be analysed to extract knowledge. Currently, this is typically achieved through centralised cloud-based data analytics services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, and even wireless network capacity. One possibility to cope with these shortcomings is to move data analytics closer to where data is generated. In this paper we tackle this issue by proposing and analysing a distributed learning framework, whereby data analytics are performed at the edge of the network, i.e., on locations very close to where data is generated. Specifically, in our framework, partial data analytics are performed directly on the nodes that generate the data, or on nodes close by (e.g., some of the data generators can take this role on behalf of subsets of other nodes nearby). Then, nodes exchange partial models and refine them accordingly. Our framework is general enough to host different analytics services. In the specific case analysed in the paper we focus on a learning task, considering two distributed learning algorithms. Using an activity recognition and a pattern recognition task, both on reference datasets, we compare the two learning algorithms between each other and with a central cloud solution (i.e., one that has access to the complete datasets). Our results show that using distributed machine learning techniques, it is possible to drastically reduce the network overhead, while obtaining performance comparable to the cloud solution in terms of learning accuracy. The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.

A communication efficient distributed learning framework for smart environments

Valerio L;Passarella A;Conti M
2017

Abstract

Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be, among others, generators of huge amounts of data. To provide value-add services in these environments, data will have to be analysed to extract knowledge. Currently, this is typically achieved through centralised cloud-based data analytics services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, and even wireless network capacity. One possibility to cope with these shortcomings is to move data analytics closer to where data is generated. In this paper we tackle this issue by proposing and analysing a distributed learning framework, whereby data analytics are performed at the edge of the network, i.e., on locations very close to where data is generated. Specifically, in our framework, partial data analytics are performed directly on the nodes that generate the data, or on nodes close by (e.g., some of the data generators can take this role on behalf of subsets of other nodes nearby). Then, nodes exchange partial models and refine them accordingly. Our framework is general enough to host different analytics services. In the specific case analysed in the paper we focus on a learning task, considering two distributed learning algorithms. Using an activity recognition and a pattern recognition task, both on reference datasets, we compare the two learning algorithms between each other and with a central cloud solution (i.e., one that has access to the complete datasets). Our results show that using distributed machine learning techniques, it is possible to drastically reduce the network overhead, while obtaining performance comparable to the cloud solution in terms of learning accuracy. The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.
2017
Istituto di informatica e telematica - IIT
Big data
Communications efficiency
Distributed learning
Iot
Smart cities
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/339537
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact