#tweeTag: a web-based annotation tool for Twitter data

Cresci S;La Polla M N;Tesconi M
2016

Abstract

The manual annotation of tweets is an essential task for training new algorithms on Twitter data. Unfortunately, established automatic annotation systems are built and trained on documents whose text is longer than 140 characters (the maximum length of a tweet) and grammatically correct. Besides their length, tweets must be treated differently because they have peculiar characteristics, such as the presence of links, abbreviations, hashtags and mentions. To address the challenge of the linguistic annotation of tweets, we have created #tweeTag, a web tool designed specifically for tweets. The tool supports three kinds of annotation: (i) global annotation, to bring out macro characteristics of the tweet in a given context (e.g., whether it is related to a certain phenomenon, whether it is positive or negative, etc.); (ii) textual annotation, to annotate the text of a tweet in order to identify information about its topic or content (e.g., whether it describes various types of damage to property and/or people, etc.); (iii) timeline annotation, to evaluate the credibility of a specific user through the analysis of their timeline. #tweeTag has been developed to make it possible to run multiple annotation campaigns with different purposes, so as to meet a range of needs. Results show a usable and effective tool, with great potential for the near future. As this is a web application, #tweeTag could also be deployed as a crowdsourcing system.
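The three kinds of annotation and the tweet-specific elements named in the abstract can be illustrated with a minimal sketch. This is not #tweeTag's actual data model or code: every class, field and function name below (`tweet_entities`, `GlobalAnnotation`, `TextualAnnotation`, `TimelineAnnotation`, `Campaign`) is hypothetical, and the regular expressions are deliberate simplifications rather than Twitter's official entity rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified patterns for the tweet-specific elements
# listed in the abstract (links, hashtags, mentions).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def tweet_entities(text: str) -> dict:
    """Return the links, hashtags and mentions found in a tweet's text."""
    # Blank out URLs first so a '#fragment' inside a link is not read as a hashtag.
    without_urls = URL_RE.sub(" ", text)
    return {
        "links": URL_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(without_urls),
        "mentions": MENTION_RE.findall(without_urls),
    }


@dataclass
class GlobalAnnotation:
    """(i) A label for the tweet as a whole, e.g. polarity or relevance."""
    tweet_id: str
    label: str


@dataclass
class TextualAnnotation:
    """(ii) A label attached to a character span of the tweet text."""
    tweet_id: str
    start: int
    end: int
    label: str


@dataclass
class TimelineAnnotation:
    """(iii) A credibility judgement on a user, based on their timeline."""
    user_id: str
    credible: bool


@dataclass
class Campaign:
    """One annotation campaign collecting annotations of any of the three kinds."""
    name: str
    annotations: list = field(default_factory=list)

    def add(self, annotation) -> None:
        self.annotations.append(annotation)
```

Under these assumptions, a campaign would hold one list mixing the three annotation kinds, so separate campaigns with different purposes stay independent of each other. A textual annotation stores character offsets rather than the annotated words, which keeps the original tweet text untouched.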
Istituto di informatica e telematica - IIT
Social Media Analysis
Twitter annotation

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/324689