Recently, user-generated content in social media opened up new alluring possibilities for understanding the geospatial aspects of many real-world phenomena. Yet, the vast majority of such content lacks explicit, structured geographic information. Here, we describe the design and implementation of a novel approach for associating geographic information to text documents. GSP exploits powerful machine learning algorithms on top of the rich, interconnected Linked Data in order to overcome limitations of previous state-of-the-art approaches. In detail, our technique performs semantic annotation to identify relevant tokens in the input document, traverses a sub-graph of Linked Data for extracting possible geographic information related to the identified tokens and optimizes its results by means of a Support Vector Machine classifier. We compare our results with those of 4 state-of-the-art techniques and baselines on ground-truth data from 2 evaluation datasets. Our GSP technique achieves excellent performances, with the best F 1 = 0.91, sensibly outperforming benchmark techniques that achieve F 1 <= 0.78.

GSP (Geo-Semantic-Parsing): Geoparsing and Geotagging with Machine Learning on top of Linked Data

S Cresci;L Nizzoli;M Tesconi
2018

Abstract

Recently, user-generated content in social media opened up new alluring possibilities for understanding the geospatial aspects of many real-world phenomena. Yet, the vast majority of such content lacks explicit, structured geographic information. Here, we describe the design and implementation of a novel approach for associating geographic information to text documents. GSP exploits powerful machine learning algorithms on top of the rich, interconnected Linked Data in order to overcome limitations of previous state-of-the-art approaches. In detail, our technique performs semantic annotation to identify relevant tokens in the input document, traverses a sub-graph of Linked Data for extracting possible geographic information related to the identified tokens and optimizes its results by means of a Support Vector Machine classifier. We compare our results with those of 4 state-of-the-art techniques and baselines on ground-truth data from 2 evaluation datasets. Our GSP technique achieves excellent performances, with the best F 1 = 0.91, sensibly outperforming benchmark techniques that achieve F 1 <= 0.78.
2018
Istituto di informatica e telematica - IIT
Data Mining
information retrieval
Social Media Analysis
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/347877
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact