Model for automatic detection of lexical-syntactic errors in texts written in Spanish

  • María D. Bustamante-Rodríguez, Instituto Tecnológico Metropolitano
  • Alberto A. Piedrahita-Ospina, Instituto Tecnológico Metropolitano
  • Iliana M. Ramírez-Velásquez, Instituto Tecnológico Metropolitano
Keywords: Computational linguistics, text analysis, natural language processing, artificial intelligence, syntax


Evaluating written texts mainly involves two aspects: syntax and semantics. The former concerns the form of the text and the latter its meaning. Performing this task manually demands time and resources, which can be reduced if part of the process is automated. The reviewed literature describes several techniques for automatic text correction, among them the linguistic approach, which focuses on syntactic, semantic, and pragmatic elements. In line with that approach, this ongoing research addresses the automatic evaluation of syntactic errors in texts written in Spanish as a starting point for ensuring coherence and cohesion in text composition, which may be useful in academic settings. For this study, a set of texts written by students enrolled in an academic program was collected and analyzed using natural language processing and machine learning techniques. The corpus was also corrected manually so that the results of the two methods could be compared, and a correspondence between them was established. On this basis, it was concluded that the automatic method supports the syntactic correction of texts written in Spanish.
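The comparison between manual and automatic correction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which measure was used to establish the correspondence, so everything below is an assumption made for illustration: the function name `agreement`, the idea of one error flag per sentence, and the choice of precision, recall, and Cohen's kappa are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence


def agreement(manual: Sequence[bool], automatic: Sequence[bool]) -> Dict[str, float]:
    """Compare per-sentence error flags from a human corrector and a detector.

    Hypothetical setup: manual[i] is True when the human marked sentence i
    as containing a syntactic error; automatic[i] is the detector's flag.
    """
    if len(manual) != len(automatic):
        raise ValueError("both annotations must cover the same sentences")
    n = len(manual)
    if n == 0:
        raise ValueError("no sentences to compare")

    pairs = list(zip(manual, automatic))
    tp = sum(1 for m, a in pairs if m and a)          # both flag an error
    fp = sum(1 for m, a in pairs if not m and a)      # only the detector flags
    fn = sum(1 for m, a in pairs if m and not a)      # only the human flags
    tn = n - tp - fp - fn                             # neither flags

    # The manual correction is treated as the reference annotation.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0

    return {"precision": precision, "recall": recall, "kappa": kappa}


if __name__ == "__main__":
    # Made-up flags for six sentences, used only to exercise the function.
    manual_flags = [True, True, False, False, True, False]
    automatic_flags = [True, False, False, False, True, True]
    print(agreement(manual_flags, automatic_flags))
```

Treating the manual correction as the reference is a design choice of this sketch. Kappa is included because raw agreement can look high simply because most sentences contain no error.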



Author Biographies

María D. Bustamante-Rodríguez, Instituto Tecnológico Metropolitano

Master's degree in Education, Facultad de Ciencias Exactas y Aplicadas

Alberto A. Piedrahita-Ospina, Instituto Tecnológico Metropolitano

Master's degree in Systems Engineering, Facultad de Ciencias Exactas y Aplicadas

Iliana M. Ramírez-Velásquez, Instituto Tecnológico Metropolitano

Master's degree in Industrial Automation and Control, Facultad de Ciencias Exactas y Aplicadas


How to Cite
Bustamante-Rodríguez, M., Piedrahita-Ospina, A., & Ramírez-Velásquez, I. (2018, May 14). Model for automatic detection of lexical-syntactic errors in texts written in Spanish. TecnoLógicas, 21(42), 199-209.
Research Papers