Comparison of Text Summarization Algorithms for Processing Editorials and News in Spanish

Keywords: Natural language processing, Recall-Oriented Understudy for Gisting Evaluation, Text Analysis, Text Mining, Automatic Summarization

Abstract

Language is shaped not only by grammatical rules but also by context and sociocultural differences. Automatic text summarization, an area of interest in natural language processing (NLP), therefore faces challenges such as identifying the essential fragments of a text according to its context and establishing the type of text under analysis. Previous literature has described several automatic summarization methods; however, no studies so far have examined their effectiveness in specific contexts or on Spanish texts. In this paper, we compare three automatic summarization algorithms using news articles and editorials in Spanish. All three are extractive methods that estimate the importance of a phrase or word based on similarity or word-frequency metrics. A document database was built with 33 editorials and 27 news articles, and three summaries of each text were manually extracted employing the three algorithms. The algorithms were quantitatively compared using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric. We also analyzed the algorithms’ potential to identify the main components of a text: for editorials, the automatic summary should include a problem and the author’s opinion; for news articles, it should describe the temporal and spatial characteristics of an event. In terms of word reduction percentage and accuracy, the method based on the similarity matrix produced the best results and can achieve a 70% reduction for both news articles and editorials. However, semantics and context should be incorporated into the algorithms to improve their accuracy and sensitivity.
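
To make the two quantities named in the abstract concrete, the following is a minimal sketch of a unigram ROUGE recall score and a word reduction percentage. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which ROUGE variant was used, so ROUGE-1 recall is assumed here, and the function names and the simple regex tokenizer are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of word characters.
    # \w is Unicode-aware in Python 3, so accented Spanish letters are kept.
    return re.findall(r"\w+", text.lower())


def rouge1_recall(candidate, reference):
    # Fraction of reference unigrams that also appear in the candidate
    # summary, with counts clipped so repeated words are not over-credited.
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand_counts[w]) for w, n in ref_counts.items())
    return overlap / total


def word_reduction(original, summary):
    # Percentage of words removed when going from the original to the summary.
    n_original = len(tokenize(original))
    if n_original == 0:
        return 0.0
    return 100.0 * (1 - len(tokenize(summary)) / n_original)
```

Under these assumptions, a summary that keeps 3 of every 10 words of its source has a word reduction of about 70 %, which is the order of reduction the abstract reports for the similarity-matrix method.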

Author Biographies

Sebastián López-Trujillo, Instituto Tecnológico Metropolitano, Colombia

Instituto Tecnológico Metropolitano, Medellín-Colombia, Sebastianlopez249178@correo.itm.edu.co

María C. Torres-Madroñero*, Instituto Tecnológico Metropolitano, Colombia

Instituto Tecnológico Metropolitano, Medellín-Colombia, mariatorres@itm.edu.co

How to Cite
S. López-Trujillo and M. C. Torres-Madroñero, “Comparison of Text Summarization Algorithms for Processing Editorials and News in Spanish”, TecnoL., vol. 24, no. 51, p. e1816, Jun. 2021.

Published
2021-06-11
Section
Research Papers