Social media as seismic networks for the earthquake damage assessment

Meletti, C; Cresci, S; La Polla, Mariantonietta Noemi; Marchetti, A; Tesconi, M

The growing popularity of online platforms based on user-generated content is gradually creating a digital world that mirrors the physical one. In the crowdsensing paradigm, the crowd becomes a distributed network of sensors that lets us understand real-life events in quasi-real time. The SoS - Social Sensing project (http://socialsensing.eu/) exploits opportunistic crowdsensing, which involves users in the sensing process only minimally, for social media emergency management, with the aim of obtaining a very fast but still reliable estimate of the scale of the emergency to be faced. As a first step, we designed and implemented a decision support system for earthquake detection and damage assessment. Our system exploits the messages shared in real time on Twitter. In the detection phase, data mining and natural language processing techniques are first used to select meaningful and comprehensive sets of tweets. A burst detection algorithm is then applied to promptly identify emerging seismic events. Georeferenced tweets and reported locality names also make a rough determination of the epicenter possible. Results were compared with official reports from the Italian INGV and show that the system can detect, within seconds, events of magnitude around 3.5 with a precision of 75% and a recall of 81.82%. We then focused on the damage assessment phase, investigating whether social media data can be used to determine earthquake intensity. We designed a set of predictive linear models and evaluated their ability to map the intensity of worldwide earthquakes. The models are built on a dataset of almost 5 million tweets, used to compute our earthquake features, and on data for more than 7,000 globally distributed earthquakes, acquired semi-automatically from USGS and serving as ground truth. We extracted 45 distinct features falling into four categories: profile, tweet, time and linguistic.
We ran diagnostic tests and simulations on the generated models to assess their significance and avoid overfitting. Overall, the results show a correlation between the messages shared on social media and intensity estimates based on online survey data (CDI).
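
The abstract does not specify which burst detection algorithm the system uses. As an illustration of the general idea only, the following minimal Python sketch flags a time window as a burst when its count of earthquake-related tweets rises well above a trailing baseline. The function name `detect_bursts`, the parameters `history`, `k` and `min_count`, and the mean-plus-k-standard-deviations rule are all assumptions made for this example, not the authors' method.

```python
from collections import deque
from statistics import mean, pstdev


def detect_bursts(counts, history=10, k=3.0, min_count=5):
    """Return the indices of windows whose tweet count is anomalously high.

    counts    -- tweets per fixed-length time window, oldest first
    history   -- number of past windows forming the baseline (assumed value)
    k         -- how many standard deviations above the baseline mean
                 a window must be to count as a burst (assumed value)
    min_count -- absolute floor, so tiny fluctuations on a quiet
                 baseline are never reported (assumed value)
    """
    baseline = deque(maxlen=history)
    bursts = []
    for i, count in enumerate(counts):
        if len(baseline) == history:
            threshold = mean(baseline) + k * pstdev(baseline)
            if count >= min_count and count > threshold:
                bursts.append(i)
                # Keep burst windows out of the baseline so an ongoing
                # event does not raise the threshold against itself.
                continue
        baseline.append(count)
    return bursts
```

For example, `detect_bursts([2, 3, 1, 2, 2, 3, 2, 1, 2, 3, 40, 55, 3, 2])` returns `[10, 11]`: the first ten windows establish the baseline, and the two windows with 40 and 55 tweets exceed it. A production system would also need the tweet filtering and geolocation steps described above, which this sketch omits.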

CNR Institutional Research Information System