Artificial intelligence methodologies to enhance visualisation and subsequent analysis of vocal fold dynamics

For the clinical assessment of the voice production process, evaluating both the acoustic voice signal and the laryngeal dynamics (i.e., the vocal fold oscillations that produce the source signal in the larynx) is highly important.

Applying artificial intelligence (AI) methods in clinical environments has become increasingly popular over the last decade. Such AI methods are applied in a variety of clinical domains to enhance data visualisation, separate physiological from pathological data, predict risk factors for diseases and cancer recurrence, or quantify the severity of pathologies and disorders.

Naturally, AI methods such as deep neural networks, recurrent neural networks, support vector machines, and decision trees/stumps have also been suggested for the assessment of the voice production process, with the aim of improving diagnostic and therapeutic approaches.

As with CT and MRI imaging, laryngeal visualisation using endoscopic imaging has been significantly improved by automated, AI-based image processing techniques [1,2]. These AI-based image processing methodologies now allow for more reliable and much faster processing of the vast amounts of data produced by endoscopic high-speed video (HSV) imaging [3]. Endoscopic HSV recordings are performed at frame rates of at least 4000 fps and hence allow for good visualisation of the vocal fold vibrations, whose fundamental frequency typically lies between 100 Hz and 400 Hz during phonation [4]. To objectively evaluate the vocal fold dynamics, the opening and closing glottis is segmented over time [5]. From there, important physiological characteristics of the vocal fold dynamics, such as complete vocal fold closure during each cycle, dynamical left-right symmetry, and periodicity of the vocal fold oscillations, can be computed [6]. To automate, and hence further improve, the quantification of the left-right symmetry of the vocal fold oscillations, additional AI algorithms were introduced [7]. To ensure the clinical applicability of these new techniques, a long-term study was performed in an actual clinical environment, demonstrating their applicability and usefulness [8]. The final AI algorithms were then incorporated into a state-of-the-art hardware platform and can now be employed by other scientists and clinical researchers [9].
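
To make these measures concrete, the following is a minimal Python sketch (not the published pipeline) of how a glottal area waveform (GAW) and simple closure, symmetry, and periodicity measures could be derived from binary glottis segmentation masks; the (frames, height, width) array layout, the manually supplied `midline_col`, and the simple FFT-based frequency estimate are illustrative assumptions.

```python
import numpy as np

def glottal_area_waveform(masks: np.ndarray) -> np.ndarray:
    """Glottal area per frame from binary segmentation masks of shape (T, H, W)."""
    return masks.reshape(masks.shape[0], -1).sum(axis=1).astype(float)

def closure_quotient(gaw: np.ndarray, closed_tol: float = 0.0) -> float:
    """Fraction of frames in which the glottis is (almost) completely closed."""
    return float(np.mean(gaw <= closed_tol * gaw.max()))

def left_right_symmetry(masks: np.ndarray, midline_col: int) -> float:
    """Crude amplitude symmetry index: total left vs. right glottal half area,
    folded into (0, 1], where 1 means perfectly symmetric."""
    left = masks[:, :, :midline_col].sum()
    right = masks[:, :, midline_col:].sum()
    return float(min(left, right) / max(left, right))

def fundamental_frequency(gaw: np.ndarray, fps: float = 4000.0) -> float:
    """Dominant oscillation frequency of the GAW via the largest FFT peak."""
    spectrum = np.abs(np.fft.rfft(gaw - gaw.mean()))
    freqs = np.fft.rfftfreq(len(gaw), d=1.0 / fps)
    return float(freqs[spectrum[1:].argmax() + 1])  # skip the DC bin
```

In a real pipeline, the glottal midline would of course come from an automated detector such as the one described in [7] rather than being supplied by hand.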

Having established reliable data pre-processing as the prerequisite for subsequent data analysis, the next step is to apply these data to actually improve diagnostic and therapeutic processes. This can be done in the traditional way, in which parameters computed from the data are statistically evaluated and interpreted [10]. The other way is to again consult AI methodologies and use these newly provided techniques to better understand voice production physiology [11] and biomechanics [12]. However, the clinical desire is not merely to separate between voice pathologies or disorders; that task is rather straightforward using AI or other mathematical approaches, as shown years ago [13,14,15], and is also straightforward for a speech-language pathologist or a trained physician. The clinically interesting part for the physician and the patient is to actually quantify the degree of disturbance in the voice and hence its consequences for the patient’s life. Such severity estimation is still in its infancy, but first achievements have already been accomplished [16].
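
As a purely illustrative sketch of what such severity estimation could look like (this is not the method used in [16]), one could regress a continuous severity rating from a small set of dynamic and acoustic parameters; the placeholder features, ratings, and model choice below are all assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: one row per recording, columns standing in for
# parameters such as closure quotient, symmetry index, jitter, shimmer.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))        # hypothetical feature matrix
y = rng.uniform(0.0, 3.0, size=60)  # hypothetical severity ratings (0-3 scale)

# Regressing a continuous severity score, rather than classifying a
# pathology label, reflects the clinical goal of quantifying the degree
# of voice disturbance rather than merely detecting its presence.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"Cross-validated mean absolute error: {-scores.mean():.2f}")
```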

In summary, despite all the opportunities, advances, and support that modern AI algorithms can and will provide, the final decision on medical assessment and therapy still rests, and hopefully will remain, with the medical doctor.

References:

  1. P. Gomez, M. Semmler, A. Schützenberger, C. Bohr, M. Döllinger. Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network. Med Biol Eng Comput, 57(7): 1451-63; 2019. doi: 10.1007/s11517-019-01965-4
  2. A.M. Wölfl, A. Schützenberger, K. Breininger, A.M. Kist. Towards image-based laryngeal videostroboscopy using deep learning-enabled compressed sensing. Biomedical Signal Processing and Control, 86, article 105335, 2023. doi: 10.1016/j.bspc.2023.105335
  3. M. Döllinger, T. Schraut, L.A. Henrich, D. Chhetri, M. Echternach, A.M. Johnson, M. Kunduk, Y. Maryn, R.R. Patel, R. Samlan, M. Semmler, A. Schützenberger. Re-training of convolutional neural networks for glottis segmentation in endoscopic high-speed videos. Appl. Sci., 12, 9791, 2022. doi: 10.3390/app12199791
  4. A.M. Kist, P. Gómez, D. Dubrovskiy, P. Schlegel, M. Kunduk, M. Echternach, R. Patel, M. Semmler, C. Bohr, S. Dürr, A. Schützenberger, M. Döllinger. A deep learning enhanced novel software tool for laryngeal dynamics analysis. J Speech Lang Hear R, 64(6):1889-1903; 2021. doi: 10.1044/2021_JSLHR-20-00498
  5. A.M. Kist, K. Breininger, M. Dörrich, S. Dürr, A. Schützenberger, M. Semmler. A single latent channel is sufficient for biomedical glottis segmentation. Scientific Reports, 12(1): 14292; 2022. doi: 10.1038/s41598-022-17764-1
  6. Y. Maryn, M. Verguts, H. Demarsin, J. van Dinther, P. Gomez, P. Schlegel, M. Döllinger. Intersegmenter variability in high-speed laryngoscopy-based glottal area waveform measures. Laryngoscope, 130(11):E654-E661; 2020. doi: 10.1002/lary.28475.
  7. E. Kruse, M. Döllinger, A. Schützenberger, A.M. Kist. GlottisNetV2: Temporal glottal midline detection using deep convolutional neural networks. IEEE J. Transl. Eng. Health Med., 11:137-44; 2023. doi: 10.1109/JTEHM.2023.3237859
  8. R. Grohe, S. Dürr, A. Schützenberger, M. Semmler, A.M. Kist. Long-term performance assessment of fully automatic biomedical glottis segmentation at the point of care. PLoS One, 17, article e0266989, 2022. doi: 10.1371/journal.pone.0266989
  9. A.M. Kist, S. Dürr, A. Schützenberger, M. Döllinger. OpenHSV: An open platform for laryngeal high-speed videoendoscopy. Sci Rep, 11: article no. 13760; 2021. doi: 10.1038/s41598-021-93149-0
  10. E.C. Inwald, M. Döllinger, M. Schuster, U. Eysholdt, C. Bohr. Multi-parametric analysis of vocal fold vibrations in healthy and disordered voices in high-speed imaging. J Voice, 25(5):576-90; 2011. doi: 10.1016/j.jvoice.2010.04.004.
  11. P. Schlegel, S. Kniesburges, S. Dürr, A. Schützenberger, M. Döllinger. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Scientific Reports, 10(1):10517; 2020. doi: 10.1038/s41598-020-66405-y.
  12. P. Gomez, A. Schützenberger, M. Semmler, M. Döllinger. Laryngeal pressure estimation with a recurrent neural network. IEEE J Transl Eng Health Med, 7: article no. 8590726; 2019. doi: 10.1109/JTEHM.2018.2886021
  13. D. Voigt, M. Döllinger, T. Braunschweig, A. Yang, U. Eysholdt, J. Lohscheller. Classification of functional voice disorders based on phonovibrograms. Artif Intell Med, 49(1):51-9; 2010. doi: 10.1016/j.artmed.2010.01.001.
  14. C. Bohr, A. Kräck, U. Eysholdt, A. Ziethe, M. Döllinger. Quantitative analysis of organic vocal fold pathologies in females by high-speed endoscopy. Laryngoscope, 123(7):1686-93; 2013. doi: 10.1002/lary.23783.
  15. C. Bohr, A. Kräck, D. Dubrovskiy, U. Eysholdt, J.G. Svec, G. Psychogios, A. Ziethe, M. Döllinger. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear R, 57(4):1148-61; 2014. doi: 10.1044/2014_JSLHR-S-12-0076.
  16. T. Schraut, A. Schützenberger, T. Arias-Vergara, M. Kunduk, M. Echternach, M. Döllinger. Machine learning based estimation of hoarseness severity using sustained vowels. J Acoust Soc Am, In Press 01/2024.