This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant to filter financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining an 80% and 71% discrimination level when tested with 11,081 and 3000 manually labeled tweets. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained a predictive user discrimination level of 80% and 75% when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).
Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
Zola P;
2020
Abstract
This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant to filter financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining an 80% and 71% discrimination level when tested with 11,081 and 3000 manually labeled tweets. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained a predictive user discrimination level of 80% and 75% when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users).File | Dimensione | Formato | |
---|---|---|---|
prod_437415-doc_156730.pdf
accesso aperto
Descrizione: Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers
Tipologia:
Versione Editoriale (PDF)
Licenza:
Creative commons
Dimensione
761.34 kB
Formato
Adobe PDF
|
761.34 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.