Visualization and Multiclass Classification of Complaints to Official Organisms on Twitter

Keywords: Text Mining, Multiclass Classification, Social Networks, Twitter

Abstract

Social networks generate massive amounts of information. Current Natural Language techniques allow the automatic processing of that information, and Data Mining enables the automatic extraction of useful info. However, a state-of-the-art review reveals that many classification methods only distinguish two classes. This paper presents a procedure to automatically classify tweets into several classes (more than two). The steps of the procedure are described in detail so that any researcher can follow them. The accuracy and coverage (instead of only coverage as usual in the literature) of two automatic classifiers (SVM and Random Forests) were analyzed in a comparative study. The procedure was applied to automatically identify more than two types of complaint from 190,000 tweets. According to the results, Random Forests should be used because they achieve an average accuracy of 81.46 % and an average coverage of 59.88 %.

Author Biographies

Beatriz Hernández-Pajares, Centro de Inteligencia Artificial, Wavespace, España

MSc. en Ingeniería de Computación, Centro de Inteligencia Artificial, Wavespace, Madrid-España, beatriz.hernandezpajares@ey.es

Diana Pérez-Marín*, Universidad Rey Juan Carlos, España

PhD. en Ingeniería de Computación, Departamento de Ingeniería de Sistemas, Universidad Rey Juan Carlos, Madrid-España, diana.perez@urjc.es

Vanessa Frías-Martínez, Universidad de Maryland, Estados Unidos

PhD. en Ingeniería de Computación, Facultad de Estudios de Información y UMIACS, Universidad de Maryland, College Park-Estados Unidos, vfrias@umd.edu

References

S. Galeano, “Cuáles son las redes sociales con más usuarios del mundo (2019),” M4rketing Ecommerce, 2019. Disponible en: https://marketing4ecommerce.net/cuales-redes-sociales-con-mas-usuarios-mundo-2019-top/, [Accedido: 27-Jan-2020].

K. Smith, “44 estadísticas de Twitter,” Brandwatch, 2016. Disponible en: URL [Accedido: 27-Jan-2020].

C. D. Manning y H. Schiitze, Foundations of Statistical Natural Language Processing: Massachusetts Institute of Technology: MIT Press. Cambridge, 1999. Disponible en: https://www.cs.vassar.edu/~cs366/docs/Manning_Schuetze_StatisticalNLP.pdf

M. Vallez y R. Pedraza-Jimenez, “El Procesamiento del Lenguaje Natural en la Recuperación de Información Textual y áreas afines,” Hipertext.net, vol. 5, 2007. Disponible en: https://www.raco.cat/index.php/Hipertext/article/view/59496

tf-idf, “What does tf-idf mean?”. Disponible en: http://www.tfidf.com/. [Accedido: 27-Jan-2020].

C. C. Aggarwa y C. Zhai, Mining Text Data: Boston, MA: Springer US, 2012. https://doi.org/10.1007/978-1-4614-3223-4

Z. Malkani y E. Gillie, “Supervised Multi-Class Classification of Tweets,” pp. 1–6, Dec. 2012. Disponible en: https://pdfs.semanticscholar.org/bc78/1a147a3fe8477ade06ccf22a3aabe12236ea.pdf

Twitter, “What The Trend,” 2009. Disponible en: https://twitter.com/whatthetrend

K. Lee, D. Palsetia, R. Narayanan, M. M. A. Patwary, A. Agrawal, y A. Choudhary, “Twitter Trending Topic Classification,” en 2011 IEEE 11th International Conference on Data Mining Workshops, Vancouver 2011. pp. 251–258. https://doi.org/10.1109/ICDMW.2011.171

Y. Zhu, X. Shen, y W. Pan, “Network-based support vector machine for classification of microarray samples,” BMC Bioinformatics, vol. 10, no S21, Jan. 2009. https://doi.org/10.1186/1471-2105-10-S1-S21

J. Ramos, “Using tf-idf to determine word relevance in document queries,” en Proceedings of the first instructional conference on machine learning, Piscataway, 2003, pp. 133–142. Disponible en: https://0bc297c6-a-62cb3a1a-s-sites.googlegroups.com/site/caonmsu/ir/UsingTFIDFtoDetermineWordRelevanceinDocumentQueries.pdf?attachauth=ANoY7cqkto1wDdp6Jn46PedfG7tGhGuYmcCduJwLGMhNpvI-5c7t18UboKTmHi_pT-azS_yYTWmZIytOQSEh56v29LLcG8vrrTwNbjXg0c49O-oE2ZpJail3QOfHci1bk-m4oDISHj2AZ9IdBIB3s5Vklxd06ZGZbf-tg3HMDWG3WVoyAEAOR7CU6UQuvJdm1rye6v1KH4fEF29zCvfMigps7R31YDkTepj8GZWeuOUX77R_nUX4E32OeQklG26umoedBM08ee-HmZIm0RNzHg76DslSGl-eiA%3D%3D&attredirects=0

I. Rish, “An empirical study of the naive Bayes classifier,” en IJCAI 2001 workshop on empirical methods in artificial intelligence, 2001, pp. 41–46. Disponible en: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.330.2788

E. Anguiano-Hernández, Naive Bayes Multinomial para clasificación de texto usando un esquema de pesado por clases, pp.1-8, Apr. 2009. Disponible en: http://ccc.inaoep.mx/~esucar/Clases-mgp/Proyectos/MGP_RepProy_Abr_29.pdf

N. Cristianini y J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods Cambridge: University Press, 2000. https://doi.org/10.1017/CBO9780511801389

RuleQuest Research “About us,” 2018. Disponible en: https://rulequest.com/about-us.html. [Accedido: 21-Sep-2019].

B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, y M. Demirbas, “Short text classification in twitter to improve information filtering,” en Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’10, Geneva, 2010, pp. 841–842. https://doi.org/10.1145/1835449.1835643

J. Nazura y B. L. Muralidhara, “Semantic classification of tweets: A contextual knowledge based approach for tweet classification,” en 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, 2017, pp.1-6. https://doi.org/10.1109/IISA.2017.8316358

P. Selvaperumal y A. Suruliandi, “A short message classification algorithm for tweet classification,” en 2014 International Conference on Recent Trends in Information Technology, Chennai, 2014. pp. 1–3. https://doi.org/10.1109/ICRTIT.2014.6996189

R. C. Balabantaray, M. Mohammad, y N. Sharma, “Multi-Class Twitter Emotion Classification: A New Approach,” Int. J. Appl. Inf. Syst., vol. 4, no. 1, pp. 48–53, Sep. 2012. https://doi.org/10.5120/ijais12-450651

E. D’Andrea, P. Ducange, A. Bechini, A. Renda, y F. Marcelloni, “Monitoring the public opinion about the vaccination topic from tweets analysis,” Expert Syst. Appl., vol. 116, pp. 209–226, Feb. 2019. https://doi.org/10.1016/j.eswa.2018.09.009

M. Habdank, N. Rodehutskors, y R. Koch, “Relevancy assessment of tweets using supervised learning techniques: Mining emergency related tweets for automated relevancy classification,” en 2017 4th International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), Münster, 2017, pp. 1–8. https://doi.org/10.1109/ICT-DM.2017.8275670

J. F. Franco-Bermúdez y W. L. Ruiz-Castañeda, “Análisis de redes sociales para un sistema de innovación generado a partir de un modelo de simulación basado en agentes,” TecnoLógicas, vol. 22, no. 44, pp. 21–44, Jan. 2019. https://doi.org/10.22430/22565337.1183

R. S. Ghaly, E. Elabd, y M. A. Mostafa, “Tweets classification, hashtags suggestion and tweets linking in social semantic web,” en 2016 SAI Computing Conference (SAI), London, 2016. pp. 1140–1146. https://doi.org/10.1109/SAI.2016.7556121

E. Yar, I. Delibalta, L. Baruh, y S. S. Kozat, “Online text classification for real life tweet analysis,” en 2016 24th Signal Processing and Communication Application Conference (SIU), Zonguldak, 2016. pp. 1609–1612. https://doi.org/10.1109/SIU.2016.7496063

J. M. Rodriguez, D. Godoy, C. Mateos, y A. Zunino, “A multi-core computing approach for large-scale multi-label classification,” Intell. Data Anal., vol. 21, no. 2, pp. 329–352, Mar. 2017. https://doi.org/10.3233/IDA-150375

Twitter4J.org, “Overview”. Disponible en: http://twitter4j.org/javadoc/index.html

R. Longadge, S. Dongre y L. Malik, “Class Imbalance Problem in Data Mining Review,” Int. J. Comput. Sci. Netw., vol. 2, no. 1, pp. 83–87, May, 2013. Disponible en: http://journaldatabase.info/articles/class_imbalance_problem_data_mining.html

B. Hernández-Pajares, “Clasificación Automática Multiclase de Tweets y su Representación Gráfica,”(Tesis de Maestría), Facultad de ingeniería, Madrid, Universidad Rey Juan Carlos, 2013. Disponible en: https://eciencia.urjc.es/handle/10115/11914

B. Hernández-Pajares, D. Pérez-Marín y V. Frías-Martínez, “TFM_code”, 2013. Disponible en: https://tinyurl.com/y4mnwotv.

How to Cite
[1]
B. Hernández-Pajares, D. Pérez-Marín, and V. Frías-Martínez, “Visualization and Multiclass Classification of Complaints to Official Organisms on Twitter”, TecnoL., vol. 23, no. 47, pp. 109–120, Jan. 2020.

Downloads

Download data is not yet available.
Published
2020-01-30
Section
Research Papers

Altmetric

Crossref Cited-by logo