Word-Embeddings and Grammar Features to Detect Language Disorders in Alzheimer’s Disease Patients

Keywords: Alzheimer's Disease, Natural Language Processing, Text Mining, Classification, Machine Learning

Abstract

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects patients' language production and thinking capabilities. The integrity of the brain is destroyed over time as the interactions between neurons and the associated cells required for normal brain functioning are interrupted. AD involves a deterioration of communicative skills, reflected in deficient speech that usually lacks coherent information and shows low idea density and poor grammar. Patients also have difficulty finding appropriate words to structure sentences. Multiple ongoing studies aim to detect the disease from this deterioration of language production. Natural Language Processing techniques are used to find patterns that reveal patients' language impairments. This paper covers advances in pattern recognition using word-embedding and word-frequency features, together with a new approach based on grammar features. We processed transcripts of 98 AD patients and 98 healthy controls from the Pitt Corpus of the DementiaBank database. A total of 1200 word-embedding features, 1408 Term Frequency-Inverse Document Frequency (TF-IDF) features, and 8 grammar features were extracted from the selected transcripts. Three models are proposed, one for each feature set, and a fourth model is based on an early fusion of the three feature sets. All models were optimized following a Leave-One-Out cross-validation strategy. Accuracies of up to 81.7% were achieved using the early fusion of the three feature sets. Furthermore, with a small set of grammar features, accuracies of up to 72.8% were obtained. The results show that such features are suitable for classifying AD patients and healthy controls.
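The early fusion and Leave-One-Out evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that early fusion means concatenating the per-transcript feature vectors, in which case the fused vector would have 1200 + 1408 + 8 = 2616 dimensions. The abstract does not name the classifier, so a nearest-centroid rule is used here purely as a stand-in. The data are synthetic, the dimensions are reduced for readability, and all function names are hypothetical. The sketch only evaluates a fixed classifier; it does not reproduce the model optimization described in the paper.

```python
import random


def early_fusion(*feature_sets):
    """Concatenate the per-sample vectors of several feature sets (assumed fusion rule)."""
    return [sum((list(vec) for vec in vecs), []) for vecs in zip(*feature_sets)]


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (an assumption): return the label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [v for v, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, fit on the remaining ones, and average the hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Synthetic toy data: 10 "control" (0) and 10 "patient" (1) samples.
rng = random.Random(0)
labels = [0] * 10 + [1] * 10


def toy_features(dim):
    """Random vectors whose mean depends on the label (illustrative only)."""
    return [[rng.gauss(float(lab), 1.0) for _ in range(dim)] for lab in labels]


embedding_feats = toy_features(6)  # stands in for the 1200 word-embedding features
tfidf_feats = toy_features(8)      # stands in for the 1408 TF-IDF features
grammar_feats = toy_features(3)    # stands in for the 8 grammar features

fused = early_fusion(embedding_feats, tfidf_feats, grammar_feats)
accuracy = leave_one_out_accuracy(fused, labels)
print(f"fused dimension: {len(fused[0])}, LOO accuracy: {accuracy:.3f}")
```

With 196 transcripts, Leave-One-Out yields 196 train/test splits, each testing on a single transcript, which makes the most of a small dataset at the cost of repeated training.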

Author Biographies

Juan S. Guerrero-Cristancho*, Universidad de Antioquia, Colombia

Electronic Engineering student, Research Group in Applied Telecommunications (GITA), Faculty of Engineering, Universidad de Antioquia, Medellín, Colombia, jsebastian.guerrero@udea.edu.co

Juan C. Vásquez-Correa, Universidad de Erlangen, Erlangen, Germany

MSc in Telecommunications Engineering, Research Group in Applied Telecommunications (GITA), Faculty of Engineering, Universidad de Antioquia; Pattern Recognition Lab (LME), Universidad de Erlangen, Erlangen, Germany, jcamilo.vasquez@udea.edu.co

Juan R. Orozco-Arroyave, Universidad de Antioquia, Colombia

PhD in Computer Science, Research Group in Applied Telecommunications (GITA), Faculty of Engineering, Universidad de Antioquia; Pattern Recognition Lab (LME), Universidad de Erlangen, Erlangen, Germany, rafael.orozco@udea.edu.co

How to Cite
Guerrero-Cristancho, J. S., Vásquez-Correa, J. C., & Orozco-Arroyave, J. R. (2020). Word-Embeddings and Grammar Features to Detect Language Disorders in Alzheimer’s Disease Patients. TecnoLógicas, 23(47), 63-75. https://doi.org/10.22430/22565337.1387

Published
2020-01-30
Section
Research Papers