Correlation Between Speech-Related Feature Spaces and Clinical Voice Disorders in Patients with Dysphagia

Keywords: Dysphagia, Speech analysis, Voice analysis, Biosignal processing, Feature extraction, Statistical analysis

Abstract

Dysphagia is defined as difficulty in transporting an alimentary bolus from the oral cavity to the stomach safely and effectively. Current diagnostic methods for dysphagia are invasive and depend heavily on the examiner's experience. Biosignal-based approaches, including the analysis of voice and speech recordings, have been proposed as a basis for complementary diagnostic tools. Along these lines, this study explores whether features extracted from voice and speech signals can discriminate between healthy subjects and patients with swallowing disorders. For this purpose, signals were recorded from 30 healthy individuals and 45 dysphagic patients. The participants performed voice tasks (sustained vowels) and speech tasks (text reading, monologue, and diadochokinetic exercises). The patient recordings were labeled with one of three clinical conditions: wet voice, dysphonic voice, or voice with undetermined alteration. Classical voice- and speech-related feature spaces were assessed using statistical tests. Features related to phonation, prosody, and diadochokinesia were found to have potential as biomarkers for discriminating between different alterations in patients with dysphagia. This is a preliminary study toward a non-invasive and objective diagnosis of dysphagia based on voice and speech signals.

Author Biographies

Andrés Felipe Flórez-Gómez, Instituto Tecnológico Metropolitano, Colombia

Instituto Tecnológico Metropolitano, Medellín-Colombia, andresflorez223360@correo.itm.edu.co

Juan Rafael Orozco-Arroyave, Universidad de Antioquia, Colombia

Universidad de Antioquia, Medellín-Colombia, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, rafael.orozco@udea.edu.co

Sebastián Roldán-Vasco*, Instituto Tecnológico Metropolitano, Colombia

Instituto Tecnológico Metropolitano, Universidad de Antioquia, Medellín-Colombia, sebastianroldan@itm.edu.co

How to Cite
A. F. Flórez-Gómez, J. R. Orozco-Arroyave, and S. Roldán-Vasco, “Correlation Between Speech-Related Feature Spaces and Clinical Voice Disorders in Patients with Dysphagia”, TecnoL., vol. 25, no. 53, p. e2220, Apr. 2022.

Published: 2022-04-05

Section: Research Papers