NCVS Insights – Science that Resonates
Artificial intelligence methodologies to enhance visualisation and subsequent analysis of vocal fold dynamics
April 2, 2024
Volume 6 – April 2024
by Michael Döllinger
For clinical assessment of voice production, it is highly important to evaluate both the acoustic voice signal and the laryngeal dynamics, i.e., the vocal fold oscillations that produce the source signal in the larynx.
Applying artificial intelligence (AI) methods in clinical environments has become increasingly popular over the last decade. Such methods are applied in a variety of clinical domains to enhance data visualisation, separate physiological from pathological data, predict risk factors for diseases and cancer recurrence, or quantify the severity of pathologies and disorders.
Naturally, over the same period, AI methods such as deep neural networks, recurrent neural networks, support vector machines, and decision trees/stumps have been suggested for the assessment of the voice production process, with the aim of improving diagnostics and therapeutic approaches.
As with CT and MRI imaging, laryngeal visualisation using endoscopic imaging has been significantly improved by automated AI-based image processing techniques [1,2]. These methodologies now allow more reliable and much faster processing of the vast amounts of data produced by endoscopic high-speed video (HSV) imaging [3]. Endoscopic HSV recordings are made at 4000 fps or more and hence allow good visualisation of the vocal fold vibrations, which typically lie between 100 Hz and 400 Hz during phonation [4]. To objectively evaluate the vocal fold dynamics, the opening and closing glottis is segmented over time [5]. From this segmentation, important physiological characteristics of the vocal fold dynamics, such as complete vocal fold closure during each cycle, dynamic left-right symmetry, and periodicity of the vocal fold oscillations, can be computed [6]. To automate and hence further improve the quantification of left-right symmetry, additional AI algorithms were introduced [7]. To ensure the clinical applicability of these new techniques, a long-term study was performed in an actual clinical environment, demonstrating their applicability and usefulness [8]. The final AI algorithms were then incorporated into state-of-the-art hardware and can now be employed by other scientists and clinical researchers [9].
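To make the numbers above concrete: at 4000 fps, a 100 Hz vibration is sampled with 40 frames per cycle and a 400 Hz vibration with 10 frames per cycle. The following minimal Python sketch illustrates this arithmetic and shows how simple closure and left-right symmetry measures could, in principle, be derived from a segmented glottal area signal. It is an illustration only: the function names, the 5 % closure threshold, the toy half-wave sine signal, and the two simplified measures are assumptions made for this example and are not the parameters or AI algorithms of the cited studies [5,6,7].

```python
import math


def frames_per_cycle(fps, f0_hz):
    """Number of video frames recorded during one vocal fold oscillation cycle."""
    return fps / f0_hz


def closed_quotient(area, rel_threshold=0.05):
    """Fraction of frames in which the glottis counts as closed.

    A frame is treated as closed when its segmented area is at most
    rel_threshold times the maximum area (an assumed, illustrative criterion).
    """
    limit = rel_threshold * max(area)
    closed = sum(1 for a in area if a <= limit)
    return closed / len(area)


def amplitude_symmetry(left, right):
    """Ratio of the smaller to the larger peak-to-peak amplitude of the
    left and right glottal half areas (1.0 means perfectly symmetric)."""
    amp_left = max(left) - min(left)
    amp_right = max(right) - min(right)
    larger = max(amp_left, amp_right)
    if larger == 0:
        return 1.0  # no motion on either side
    return min(amp_left, amp_right) / larger


def synthetic_half_area(fps, f0_hz, n_frames, amplitude):
    """Toy signal (hypothetical): half-wave rectified sine, so the glottis
    is closed for roughly half of every cycle."""
    return [
        amplitude * max(0.0, math.sin(2 * math.pi * f0_hz * i / fps))
        for i in range(n_frames)
    ]


if __name__ == "__main__":
    fps, f0 = 4000, 100
    left = synthetic_half_area(fps, f0, 400, amplitude=1.0)
    right = synthetic_half_area(fps, f0, 400, amplitude=0.8)
    total = [l + r for l, r in zip(left, right)]

    print("frames per cycle:", frames_per_cycle(fps, f0))
    print("closed quotient:", closed_quotient(total))
    print("amplitude symmetry:", amplitude_symmetry(left, right))
```

In practice, the area signal would come from the segmented HSV frames rather than from a synthetic sine, and clinically validated parameters would replace these simplified stand-ins.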
With reliable data pre-processing now established as the prerequisite for subsequent data analysis, the next step is to apply these data to actually improve diagnostic and therapeutic processes. This can be done in the traditional way, where parameters computed from the data are statistically evaluated and interpreted [10]. The alternative is to consult AI methodologies again and use these newly available techniques to better understand voice production physiology [11] and biomechanics [12]. However, the clinical desire is not merely to distinguish between voice pathologies or disorders, which is rather straightforward using AI or other mathematical approaches, as shown years ago [13,14,15], and is also straightforward for a speech-language pathologist or a trained physician. The clinically interesting part for the physician and the patient is to actually quantify the degree of disturbance in the voice and hence the consequences for the patient’s life. This severity estimation is still in its early stages, but first achievements have already been accomplished [16].
In summary, despite all the opportunities, advances and support that modern AI algorithms can and will provide, the final decision on medical assessment and therapy still rests, and hopefully will remain, with the medical doctor.
REFERENCES
Michael Döllinger
Professor Döllinger graduated from FAU Erlangen-Nürnberg (Germany) in 2000 with a Master's degree in Mathematics. In 2002 he completed his PhD on optimizing a biomechanical model of laryngeal dynamics. From 2003 to 2005 he held a post-doc position at the University of California, Los Angeles (UCLA). Between 2005 and 2008 he was a senior scientist at the Division for Phoniatrics and Pediatric Audiology at FAU Erlangen-Nürnberg. Since June 2008 he has been professor for “Computational Medicine” at FAU.
HOW TO CITE
