ICVPB 2026 Program

The ICVPB 2026 Program brings together voice scientists, clinicians, engineers, and researchers from around the world for two days of presentations on the latest advances in voice production, biomechanics, and clinical voice care. Held October 7–8 in Salt Lake City, the program features four invited keynotes and 46 contributed talks across eight themed sessions.

Wednesday · October 7
Opening Keynote — Introduced by Ingo Titze
Nicole Y. K. Li-Jessen, Ph.D.

McGill University, Canada

Dr. Nicole Y. K. Li-Jessen is Associate Professor at McGill University and Canada Research Chair in Personalized Medicine of Upper Airway Health and Diseases. A speech pathologist and computational biologist by training, she earned her clinical degree at the University of Hong Kong, her PhD in computational biology at the University of Pittsburgh under Prof. Katherine Verdolini-Abbott, and completed postdoctoral training in tissue engineering with Prof. Susan Thibeault at the University of Wisconsin–Madison. She leads McGill’s Voice and Upper Airway Research Laboratory, the only research group combining high-fidelity biocomputing with vocal fold bioengineering, and has trained more than 50 researchers across engineering, computing, and clinical sciences. Her program advances Digital Health Twins — dynamic, patient-specific virtual replicas of vocal fold systems that predict disease progression and guide regenerative therapies — across four integrated themes: computational medicine, tissue engineering, digital wearables, and health stigmatization. She has published 57 peer-reviewed articles and 7 book chapters, serves as Section Editor for PLOS Digital Health and on the editorial board of Scientific Reports, and received McGill’s Principal’s Prize for Excellence in Teaching in 2018.

Session 1 — Cellular Biology · Chair: Nicole Li-Jessen
Establishing Vocal Fold Swabbing as a Microbiome Sampling Technique

Anumitha Venkatraman
University of Wisconsin–Madison
Additional authors: Katelyn Jacobs, Ruth J. Davis, Susan L. Thibeault

Introduction

Chronic laryngeal inflammatory conditions, such as benign vocal fold (VF) lesions, are attributed to one or a combination of immune challenges from environmental, systemic, and mechanical stimuli. The microbiome is responsible for maintaining immune homeostasis in the larynx. Benign VF lesions are associated with an altered VF microbiome, with increased abundance of the pathogen Streptococcus pseudopneumoniae, which disrupts VF epithelial barrier integrity and thereby increases the risk of inflammation. Tracking changes in microbiome composition can enable earlier detection of inflammatory disease states, serve as a prognostic indicator of intervention effectiveness, and inform probiotic therapeutic targets. VF swabs offer a simple way to sample the VF microbiome but remain untested.

Objectives

To establish the feasibility of VF swabbing as a microbiome sampling technique by confirming that swabs yield sufficient bacterial DNA for microbiome compositional analysis (>2 ng/µL). A secondary objective was to compare VF microbiome compositional structure and function obtained via swabs and via a previously established sampling method (tissue biopsies) in the same patients with benign VF lesions, using measures of microbial diversity and abundance.

Methods

A VF swab and a corresponding biopsy of the benign VF lesion were obtained in the operating room from each of five patients (data collection ongoing). Bacterial DNA was extracted from the samples using the PowerSoil Pro Kit (Qiagen, MD). 16S rRNA sequencing was performed to obtain measures of microbial diversity and the abundance of dominant bacterial taxa in swabs and biopsies.
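The abstract does not name the diversity measures it used. One widely used alpha-diversity measure is the Shannon index, H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The sketch below computes it from per-taxon read counts and is offered only as an illustration. The function name and the counts are hypothetical placeholders. They are not study data and do not represent the authors' pipeline.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing (p ln p -> 0)
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per genus for one swab and one biopsy (illustrative only).
swab = [120, 80, 40, 10]
biopsy = [100, 90, 35, 25]
print(f"swab H' = {shannon_index(swab):.3f}")
print(f"biopsy H' = {shannon_index(biopsy):.3f}")
```

Higher H' indicates reads spread more evenly across more taxa. Comparing swab and biopsy values across patients would then call for a paired statistical test.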

Results

All laryngeal swabs yielded sufficient bacterial DNA for microbiome compositional analysis (2.5–5 ng/µL). No significant differences in measures of microbial diversity were found between swabs and biopsies (p > .05). In both swabs and biopsies, Actinobacteria and Proteobacteria were the dominant phyla, with high abundance of the genera Streptococcus, Rothia, Haemophilus, and Prevotella.

Conclusions

VF swabs provide a feasible microbiome sampling technique. Using VF swabs allows microbial sampling in a wider variety of patients, facilitating advances in microbiome-informed diagnostic and therapeutic voice care.

Anumitha Venkatraman, PhD, CCC-SLP, is a postdoctoral fellow at the University of Wisconsin–Madison, working with Susan Thibeault. Her research broadly focuses on how psychosocial stress can alter laryngeal host-microbiome interactions. She received her MS-SLP and PhD degrees from Purdue University.

Low Intensity Pulsed Ultrasound Modulates Wound Healing Response in Vocal Fold Fibroblasts

Junseo Cha
University of Wisconsin–Madison
Additional authors: Susan L. Thibeault

Objectives

Low-intensity pulsed ultrasound (LIPUS) has been widely used in soft and hard tissues of the human body to accelerate wound healing and improve clinical outcomes. The underlying biological process is known to occur via cellular mechanotransduction, which modulates wound-healing processes by altering gene and protein expression. The objective of this study was to determine whether this phenomenon is applicable to vocal fold fibroblasts, which are a key contributing cell type in numerous vocal fold pathologies resulting from dysregulated healing processes.

Methods

Primary vocal fold fibroblast cell lines were cultured in medium supplemented with 5 µg/mL lipopolysaccharide and 10 ng/mL TGF-β to induce a pro-inflammatory and pro-fibrotic response. Cells were then exposed to 20 minutes of LIPUS every 24 hours over a 48-hour period, receiving three treatments in total. Total RNA was extracted at 6, 12, 24, and 48 hours after the initial LIPUS treatment, followed by bulk RNA sequencing for differential gene expression and pathway enrichment analyses.

Results

Although changes at the individual gene level were small, pathway-level analysis showed enrichment of proliferative and anti-inflammatory pathways at 12 hours, predominant enrichment of ECM deposition–related pathways at 24 hours, and homeostatic behavior with suppressed fibrotic and inflammatory pathways at 48 hours after the initial LIPUS treatment.

Conclusions

Although the small sample size limits statistical power for meaningful biological interpretation at the individual gene level, pathway analysis demonstrated an overall biological response to LIPUS in vocal fold fibroblasts. Our results suggest that vocal fold fibroblasts are mechanosensitive, responding to high-frequency oscillatory stimulation with changes in gene expression, and that the enriched pathways may be therapeutically beneficial for vocal fold wound healing.

Junseo is a doctoral candidate in the Department of Communication Sciences and Disorders under the mentorship of Dr. Susan Thibeault. He earned his B.A. in Contemporary Music from Hanyang University and an M.S. in Speech-Language Pathology from Daegu Catholic University. Junseo has sung, composed, engineered, or produced 25 songs during his music career. He also worked at the Medical Voice Otorhinolaryngology Clinic as a speech-language pathologist and a clinical vocologist, where he provided voice rehabilitation services and habilitation techniques to patients with voice disorders and voice professionals. Junseo’s current research interests include vocal fold wound healing in response to high-frequency stimulation and the elucidation of its related mechanotransduction pathways.

Vocal Changes in a Mouse Model During Systemic Dehydration and Post-Rehydration Recovery

Tobias Riede
Midwestern University

Objectives

This study tested the hypothesis that systemic dehydration induces measurable and reversible alterations in vocal behavior in adult California mice (Peromyscus californicus). We further quantified baseline day-to-day variation in vocal performance to distinguish dehydration effects from normal within-individual variability.

Methods

Twenty-six adult mice were assigned to a control group (N = 10; ad libitum food and water) or an experimental group (N = 16). Control animals were recorded for 10 days to estimate natural variation in vocal activity and acoustic structure. Experimental animals were recorded at baseline (Day 1), during three consecutive days of water deprivation (Days 2–4), and after water reintroduction (Day 5). Body mass was measured to estimate dehydration severity. Vocal activity (calls/24 h) and seven acoustic parameters were quantified and analyzed using linear mixed-effects models with planned pairwise comparisons.

Results

In controls, vocal variation was dominated by stable individual differences, with no systematic day effects. Eleven experimental animals vocalized consistently and were included in analyses. Vocal activity did not differ between baseline and peak dehydration (Day 4; p = 1.0), but increased significantly after rehydration (Day 5 vs. Day 4, p = 0.012; Day 5 vs. Day 1, p = 0.021); 6 of 11 animals more than doubled call output after water restoration. Dehydration significantly reduced multiple acoustic parameters at Day 4 relative to baseline, including mean, maximum, and minimum fundamental frequency (all Bonferroni-corrected p ≤ 0.01), mean intensity (p < 0.05), and centroid size (p < 0.05). Syllable repetition rate decreased during dehydration (p = 0.0001), whereas syllable duration was unaffected. No significant differences were observed between baseline and rehydration for any acoustic parameter.
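Bonferroni adjustment, applied to the pairwise comparisons above, multiplies each raw p-value by the number of comparisons m and caps the result at 1. This is equivalent to testing each raw p-value against α/m. The minimal sketch below assumes three comparisons and made-up p-values. The study's actual number of comparisons and its raw values are not given here.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * m) for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three planned comparisons (illustrative only).
raw = [0.003, 0.02, 0.4]
alpha = 0.05
for p_raw, p_adj in zip(raw, bonferroni(raw)):
    print(f"raw p = {p_raw:.3f} -> adjusted p = {p_adj:.3f}, significant: {p_adj < alpha}")
```

The correction is conservative. A comparison that is nominally significant on its own (here raw p = 0.02) can lose significance once adjusted for the number of tests.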

Conclusions

Systemic dehydration induces significant but reversible reductions in fundamental frequency, sound intensity, and repetition rate in a nonhuman mammal. Vocal function recovers rapidly following rehydration, demonstrating the sensitivity of mammalian phonation to hydration state.

Tobias Riede is an Associate Professor of Physiology at Midwestern University and was formerly a Research Assistant Professor in the Department of Biology at the University of Utah. He conducts research on the physiology and functional morphology of sound production in vertebrates. Comparative analysis of living animals provides a useful tool for understanding the mechanisms of vocal communication, including in humans, and the evolution of human speech. Dr. Riede holds a DVM from Free University in Berlin and a Ph.D. from Humboldt University.

Morning Break · 9:30 – 9:45 AM
Session 2 — Muscle Physiology · Chair: Tobias Riede
Neural Spike-Resolved Multiscale Modeling of Vocal Control and Sound Production

Xudong Zheng
Rochester Institute of Technology
Additional authors: Weili Jiang, Iris Adam, Nicholas Gladman, Coen Elemans, Qian Xue

Objectives

Precise vocal control requires the transformation of discrete neural signals into continuous acoustic outputs at millisecond scales. However, most biomechanical models simplify neural input into time-averaged activation signals, failing to capture how the timing of individual action potentials influences the motor control of voice. This study presents a spike-resolved multiscale model to quantify how neural spike trains precisely regulate vocal control and sound production.

Methods

We developed a five-stage chemomechanical framework that models the excitation-contraction coupling (ECC) cascade, bridging molecular calcium dynamics with macroscopic 3D biomechanics. The model incorporates a novel dynamic calcium release mechanism sensitive to inter-spike intervals (ISIs). We validated the model against experimental data from songbird vocal muscles (DTB and VS) using genetic algorithm optimization and 3D finite-element analysis to simulate the physical constraints of vocal production.

Results

The model was validated against experimental measurements of fast syringeal muscles and showed a significant improvement in temporal precision over activation-based models. Integrated fluid-structure interaction (FSI) simulations showed that neural spike trains drive sound production by modulating labial position and tension. Quantitatively, the simulated relationship between muscle stress and fundamental frequency matched experimental measurements.

Conclusions

We have developed and validated a multiscale vocal control model that links temporally varying spike patterns to vocal control and sound production. This approach provides a high-fidelity tool for investigating the neuromuscular strategies and biomechanical relationships underlying vocal communication across species.

Xudong Zheng is a professor in the mechanical engineering department at Rochester Institute of Technology (RIT). Prior to joining RIT, he held faculty positions at the University of Maine for a decade, serving as both an Assistant and Associate Professor. He earned his Ph.D. from George Washington University and completed a two-year postdoctoral fellowship at Johns Hopkins University. His primary research expertise lies in the computational modeling of flow-structure-acoustic interactions in voice production and vocal control. His work focuses on the development of high-fidelity models to investigate the complex neuromuscular and physical mechanisms underlying sound generation. Over the years, his research program has been supported by multiple grants from the National Institutes of Health and the National Science Foundation, as well as various international foundation awards.

Effects of Intramuscular Fiber Orientation in the Thyroarytenoid Muscle on Vocal Fold Adduction

Tsukasa Yoshinaga
University of Osaka
Additional authors: Zhaoyan Zhang

Objectives

Previous studies have suggested that the fiber orientation of the thyroarytenoid (TA) muscle significantly affects the ability to regulate vocal fold vertical thickness and voice quality. However, earlier models assumed a single, uniform muscle fiber direction and did not account for the gradual changes in intramuscular fiber orientation observed in anatomical studies. The present study incorporates a more realistic distribution of TA muscle fiber orientation and investigates its influence on vocal fold shape control during adduction.

Methods

Intrinsic laryngeal muscle activation was modeled using the finite element method. The geometries were reconstructed from one male and one female excised larynx. Each laryngeal muscle was modeled as an anisotropic hyperelastic material with active stress due to muscle activation. Two TA muscle fiber orientation distributions were examined: one with all TA fibers aligned at a uniform angle and one with radially varying fiber orientations. The degree of inferior vocal fold bulging due to TA muscle activation was compared across fiber orientation conditions.

Results

In both the male and female laryngeal models, radially oriented fibers produced stronger inferior medial bulging than fibers at a uniform angle. This effect likely arises because the medial fiber bundles are aligned along the medial edge of the vocal fold, promoting expansion toward the medial side. The more lateral, more obliquely oriented fibers instead induce a strong medial rotation of the arytenoid cartilage. Additionally, the adduction pattern differed between the male and female models, indicating the importance of initial muscle geometry and fiber orientation in determining the degree of inferior bulging.

Conclusions

The intramuscular fiber orientation distribution of the TA muscle has a significant effect on the shape of the vocal fold medial surface and is likely an important factor underlying individual differences in voice characteristics and the ability to control phonation. Acknowledgments: This study was supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (Grant Nos. JP23K17195 and JP22KK0238), and the National Institute on Deafness and Other Communication Disorders, National Institutes of Health (Research Grant No. R01DC020240).

Tsukasa Yoshinaga received his Ph.D. in Engineering from the University of Osaka in 2018 and is currently an Assistant Professor there. His research focuses on the biomechanics and computational modeling of speech production and the aeroacoustics of musical instruments, including the development of large-scale physics-based models to study both normal and disordered voice.

Vocal Fold Injury Induced Muscle Atrophy: Regional Differences in Medial vs Lateral Thyroarytenoid Muscle Cross-Sectional Area Following Injury and Intracordal Intervention

Lizzie Hary
University of Pittsburgh
Additional authors: Chloe Santa Maria, Courtney Edwards, Thiagarajan Meyyappan, Lea Sayce

Objectives

This study investigates thyroarytenoid muscle atrophy following vocal fold injury, focusing on regional differences between the medial and lateral compartments, and evaluates the treatment effects of dexamethasone injection versus a sham injection of PLGA (Poly Lactic-co-Glycolic Acid) microparticles.

Methods

Twenty New Zealand white rabbits were included in this study and randomized to four groups: control (n = 5), vocal fold injury (n = 5), vocal fold injury with intracordal dexamethasone injection (n = 5), and vocal fold injury with intracordal PLGA microparticle injection (sham injection; n = 5). Experimental groups underwent vocal fold injury (a 3 mm incision made with a sickle knife) followed by the assigned intervention. Animals recovered for eight weeks before terminal laryngeal harvest to allow complete wound healing and chronic scar formation. Larynges were formalin-fixed and paraffin-embedded, and histological evaluation was performed using hematoxylin and eosin staining. Within the midmembranous vocal fold, muscle fibers were categorized as medial TA (superficial compartment) or lateral TA (deep compartment) based on anatomical location. Individual muscle fiber cross-sectional area (CSA) was quantified for each region across all groups.

Results

All injured groups demonstrated reduced TA muscle fiber CSA compared with uninjured controls, consistent with VF injury-induced atrophy. Dexamethasone intracordal injection attenuated muscle atrophy relative to both the untreated injury group and the PLGA microparticle (sham) injection group. Regional analysis revealed greater atrophy in the lateral (deep) TA than in the medial (superficial) TA across injury conditions, indicating differential susceptibility within the muscle.

Conclusions

Vocal fold injury results in significant TA muscle atrophy. Dexamethasone injection, the current clinical standard for VF scar treatment, appears to mitigate this atrophic response, while PLGA microparticles (sham injection) do not confer a protective effect. Notably, the lateral TA demonstrates greater vulnerability to atrophy than the medial TA, suggesting region-specific differences in muscle response to injury.

Lizzie Hary, MA, CCC-SLP, is a voice-specialized speech-language pathologist and PhD candidate at the University of Pittsburgh, where her research focuses on vocal fold biology, wound healing, and fibrosis. Her work investigates transcriptomic and molecular mechanisms underlying acute and chronic vocal fold injury, with particular emphasis on the role of the TGF-β signaling pathway in tissue remodeling and scar formation. The long-term mission of her research is the development of targeted, biologically informed therapies for voice disorders.

Lizzie has nine years of clinical experience in voice, upper airway, and swallowing disorders. Her clinical background informs her research approach, bridging mechanistic biology with functional voice outcomes. Lizzie is the recipient of an NIH F31 predoctoral fellowship and multiple institutional research awards, and she is actively involved in student mentorship and graduate teaching.

Respiratory Kinematics Reveal Phenotypic Differences in Essential Vocal Tremor and Laryngeal Dystonia with Vocal Tremor

Amanda Stark
University of Utah
Additional authors: Jenny Pierce, Brad Story, Kaitlyn Dwenger, Sarah McDowell, Hezro Da Silva, Julie Barkmeier-Kraemer

Background

Vocal tremor is a neurogenic voice disorder marked by rhythmic oscillations that can involve multiple speech subsystems, including respiration. Although respiratory contributions have been noted, chest wall kinematic patterns have not been evaluated as a clinical feature differentiating essential vocal tremor (EVT) from laryngeal dystonia with vocal tremor (LDVT). Because EVT typically affects multiple body regions whereas LDVT is focal, we hypothesized respiratory tremor would be present in EVT but not LDVT.

Methods

Group comparisons between EVT and LDVT were conducted using independent-samples t-tests or Wilcoxon rank-sum tests for continuous variables and chi-square or Fisher’s exact tests for categorical variables.

Results

Respiratory tremor rate during sustained /a/ did not differ significantly between groups for the rib cage (LDVT: M = 4.0, SD = 0.52; EVT: M = 3.6, SD = 0.87; p = .06) or abdomen (LDVT: M = 3.5, SD = 1.1; EVT: M = 3.4, SD = 1.3; p = .77). During sustained /i/, EVT showed a higher abdominal tremor rate (M = 3.7, SD = 3.4) than LDVT (M = 2.8, SD = 1.4), p = .001. No group differences were observed for tremor extent. Due to limited sample size, groups were combined for palpation analyses. Palpable respiratory tremor was identified in 46% of EVT and 19% of LDVT participants. Abdominal tremor extent was significantly greater in participants with palpable tremor, regardless of group (M = 0.17, SD = 0.13), p = .001, indicating that palpable tremor is associated with increased abdominal excursion rather than diagnosis.

Conclusion

Respiratory oscillation did not clearly differentiate EVT from LDVT, as both groups demonstrated tremor patterns. However, EVT showed faster abdominal tremor during /i/, consistent with typical essential tremor rates. Because tremor extent did not account for this difference, distinct central versus peripheral mechanisms may be involved. Palpable tremor was linked to greater abdominal excursion, suggesting that clinical detection may primarily reflect abdominal movement. Future research should include larger samples, examine neural versus biomechanical drivers, and assess the contributions of specific muscles, such as the diaphragm and abdominal wall, across tasks to better characterize respiratory tremor and improve differential diagnosis and clinical assessment.

Amanda C. Stark, PhD, CCC-SLP, is a speech-language pathologist, voice scientist, and clinical researcher specializing in voice, upper-airway, and swallowing disorders. She is a postdoctoral research associate at the Utah Center for Vocology at the University of Utah, where her work integrates clinical voice science, biomechanics, and advanced technologies to improve diagnosis and treatment of complex vocal conditions.

Her research focuses on voice production and disorders using multimodal approaches, including acoustics, aerodynamics, imaging, respiratory kinematics, and electromyography. She emphasizes translational science, bridging laboratory methods with clinical application. Her work includes studies on neurologically based voice disorders such as laryngeal dystonia, essential voice tremor, and chronic cough.

Clinically, Dr. Stark evaluates and treats individuals with voice and airway disorders, working within interdisciplinary teams and serving diverse populations, including patients with neurologic conditions and professional voice users. Her clinical and research efforts aim to improve communication outcomes and quality of life.

Functional Perilaryngeal Muscle Networks as an Objective Biomarker of Vocal Hyperfunction

Rory O’Keeffe
NYU Grossman School of Medicine / NYU
Additional authors: Davood Shahrjerdi, Aaron M. Johnson

Objectives

Vocal hyperfunction refers to excessive perilaryngeal muscle activity during voicing and is associated with several common voice disorders. Clinical assessment relies largely on subjective measures such as patient reporting, perceptual evaluation, and manual palpation. Surface electromyography (sEMG) provides a non-invasive method for measuring neck muscle activity. However, conventional spectrotemporal metrics have produced inconsistent results when distinguishing dysphonic from healthy voicing. We hypothesized that vocal hyperfunction reflects altered neuromuscular coordination rather than merely increased muscle activation. This work investigates functional perilaryngeal muscle networks derived from multichannel sEMG as an objective biomarker of vocal coordination and hyperfunction.

Methods

sEMG signals were recorded from twelve sensors across perilaryngeal and cervical muscles while participants performed ten vocal tasks spanning sustained phonation, pitch glide, speech, singing, and reading. Functional synergistic muscle networks were constructed using intermuscular coherence. Network metrics such as mean degree, weighted clustering coefficient, and global efficiency characterized coordination patterns across tasks. Directional connectivity analysis using partial directed coherence examined temporal coordination among muscles. In clinical studies, sagittal network asymmetry was compared between patients with vocal hyperfunction and asymptomatic controls.
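The network metrics named above can be illustrated on a small coherence matrix. The abstract does not specify how coherence was converted into a graph, so this sketch makes three assumptions:

- Edges are defined by a simple fixed threshold.
- Weighted clustering follows the Onnela definition.
- Global efficiency is the standard average inverse shortest-path length on the binary graph.

All function names, the threshold, and the 4-muscle matrix are hypothetical. This is not the authors' pipeline.

```python
from collections import deque
from itertools import combinations


def binarize(coh, threshold):
    """Edge between muscles i and j when coherence exceeds threshold (assumed rule)."""
    n = len(coh)
    return [[1 if i != j and coh[i][j] > threshold else 0 for j in range(n)] for i in range(n)]


def mean_degree(adj):
    """Average number of edges per node."""
    return sum(sum(row) for row in adj) / len(adj)


def global_efficiency(adj):
    """Average inverse shortest-path length over ordered node pairs (unreachable pairs add 0)."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for src in range(n):
        dist = [-1] * n  # -1 marks nodes not yet reached by breadth-first search
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist if d > 0)
    return total / (n * (n - 1))


def weighted_clustering(coh, adj):
    """Mean Onnela weighted clustering; weights normalized by the largest retained coherence."""
    n = len(coh)
    w_max = max((coh[i][j] for i in range(n) for j in range(n) if adj[i][j]), default=0.0)
    if w_max == 0.0:
        return 0.0
    w = [[coh[i][j] / w_max if adj[i][j] else 0.0 for j in range(n)] for i in range(n)]
    values = []
    for i in range(n):
        neighbors = [j for j in range(n) if adj[i][j]]
        k = len(neighbors)
        if k < 2:
            values.append(0.0)
            continue
        # Geometric mean of triangle weights over unordered neighbor pairs (hence the factor 2).
        s = sum((w[i][j] * w[j][h] * w[h][i]) ** (1 / 3) for j, h in combinations(neighbors, 2))
        values.append(2 * s / (k * (k - 1)))
    return sum(values) / n


# Hypothetical 4-muscle coherence matrix (illustrative values, not study data).
coh = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
adj = binarize(coh, threshold=0.3)
print("mean degree:", mean_degree(adj))
print("global efficiency:", round(global_efficiency(adj), 3))
print("weighted clustering:", round(weighted_clustering(coh, adj), 3))
```

Coherence is frequency-dependent, so a real analysis would first need to reduce each coherence spectrum to a single value, for example by averaging over a band of interest. The threshold choice also strongly affects all three metrics.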

Results

Network metrics differentiated vocal behaviors, outperforming traditional spectrotemporal sEMG features and increasing in proportion to vocal pitch and loudness. In patients with voice disorders, the network metrics were decreased and less responsive to task demands. Compared to asymptomatic controls, patients also generally exhibited greater sagittal network asymmetry. The differences in asymmetry were task- and muscle-dependent, with the largest patient-control contrasts observed during reading tasks and at the infrahyoid musculature.

Conclusions

Functional muscle network analysis provides a systems-level representation of perilaryngeal coordination during voice production. The findings suggest that vocal hyperfunction reflects altered neuromuscular coordination, rather than merely increased muscle activation. Therefore, muscle network metrics show promise as objective biomarkers for diagnosing and monitoring vocal hyperfunction and related voice disorders. Acknowledgments: This work was supported by NIH grant R01DC021452.

Rory O’Keeffe is a postdoctoral researcher at the NYU Voice Center at NYU Grossman School of Medicine, with a joint affiliation at the NYU Tandon School of Engineering. His work lies at the intersection of neural engineering, speech science, and clinical voice disorders, with a focus on developing objective biomarkers of vocal function using surface electromyography (sEMG) and advanced signal processing techniques.

Dr. O’Keeffe’s research centers on characterizing neuromuscular coordination in the perilaryngeal musculature during functional vocal tasks. He employs network-based approaches, including intermuscular coherence and functional connectivity analyses, to investigate how muscle coordination patterns are altered in voice disorders such as primary muscle tension dysphonia and lesions related to vocal hyperfunction. His work aims to bridge physiological measurement and clinical assessment by identifying task- and muscle-specific signatures of dysfunction.

Working closely with Dr. Aaron Johnson at the NYU Voice Center and Dr. Davood Shahrjerdi at the NYU Tandon School of Engineering, Dr. O’Keeffe contributes to multidisciplinary efforts that integrate engineering, clinical voice evaluation, and translational research. His current projects include the analysis of asymmetry in muscle network organization and the optimization of perilaryngeal sEMG sensor configurations.

Dr. O’Keeffe’s broader goal is to advance quantitative, clinically interpretable tools that can support diagnosis, treatment monitoring, and rehabilitation in voice disorders. His work aligns closely with the mission of ICVPB in advancing the scientific understanding of voice production and its clinical applications.

Open block · 11:00 – 11:30 AM (panel, extended Q&A)
Lunch · 11:30 AM – 12:30 PM
Post-Lunch Keynote — Introduced by Ingo Titze
Ron Scherer, Ph.D.

Bowling Green State University

Dr. Ron Scherer is Distinguished Research Professor in the Department of Communication Sciences and Disorders at Bowling Green State University, where he teaches voice disorders and voice and speech science. His research spans the physiology, mechanics, and acoustics of normal, abnormal, and performance sound production, as well as the methodologies involved in such work. He previously served as Senior Scientist at the Denver Center for the Performing Arts voice laboratories and taught in the DCPA’s theatre voice and speech trainers program, and held a Research Professor appointment in the Department of Otolaryngology–Head and Neck Surgery at the University of Cincinnati in 2005. Dr. Scherer received his PhD from the University of Iowa, a master’s degree in speech-language pathology from Indiana University, and a BS in mathematics, having also spent two years as a music major at Indiana. He is a Fellow of both the Acoustical Society of America and the American Speech-Language-Hearing Association.

Session 3 — Acoustics & Aerodynamics · Chair: Ron Scherer
Evaluating CPPS and Pitch Strength for Pediatric Voice Assessment

Robert Brinton Fujiki
Indiana University School of Medicine
Additional authors: Charles J. Nudelman

Objectives

Pediatric voice disorders adversely affect quality of life, yet young children may have limited tolerance for diagnostic voice assessments. As such, identifying efficient acoustic measures that can reliably differentiate school-aged children with and without voice disorders is of high clinical import. This study examined the extent to which Cepstral Peak Prominence Smoothed (CPPS) and pitch strength differentiate between healthy and dysphonic pediatric voices.

Methods

Voice samples were collected from 100 children between the ages of 4 and 12 (mean=7.8 years, SD=2.4 years, female=48, male=52). Forty-two children had been diagnosed with benign vocal fold lesions and 58 denied any history of dysphonia. CPPS and pitch strength were calculated on sustained vowels and connected speech using a novel automated analysis framework in Python based on existing open-access acoustic analysis pipelines. Univariable and multivariable logistic regression models assessed associations between the acoustic measures and voice-disorder status. Receiver operating characteristic analyses quantified classification accuracy, and optimal thresholds were identified.
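The abstract does not say how the optimal thresholds were chosen. One common criterion is Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion, and it assumes that lower values indicate a voice disorder, as the results report for both CPPS and pitch strength. The function name and data are illustrative only.

```python
def youden_threshold(scores, has_disorder):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A child is classified as disordered when score <= threshold, matching
    measures (such as CPPS) that are lower in dysphonic voices.
    """
    positives = [s for s, y in zip(scores, has_disorder) if y]
    negatives = [s for s, y in zip(scores, has_disorder) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one case in each group")
    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        sensitivity = sum(s <= t for s in positives) / len(positives)
        specificity = sum(s > t for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Hypothetical CPPS values (dB); True marks a child with a vocal fold lesion.
cpps = [8.0, 9.0, 10.0, 12.0, 13.0, 14.0]
lesion = [True, True, True, False, False, False]
print(youden_threshold(cpps, lesion))  # (10.0, 1.0)
```

The toy groups do not overlap, so J reaches 1. With the substantial within-group variability the results describe, the maximum J would fall well below 1.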

Results

Children with voice disorders demonstrated significantly lower CPPS and pitch strength values on sustained vowels and connected speech than vocally healthy controls. CPPS on sustained vowels was the strongest predictor of voice status; however, both measures effectively separated children with and without voice disorders. Despite significant group-level differences, substantial within-group variability limited the establishment of clear diagnostic cutoffs. Optimal thresholds for differentiating children with and without voice disorders were 11.19 dB for CPPS on sustained vowels, 9.48 dB for CPPS on connected speech, 0.30 for pitch strength on sustained vowels, and 0.28 for pitch strength on connected speech.

Conclusions

Results support the use of CPPS in routine pediatric voice assessment and identify pitch strength as a promising adjunctive metric. Incorporating brief cepstral and pitch-based analyses into standard clinical protocols may enhance objectivity while minimizing testing burden in children.

Robert Brinton Fujiki, PhD, CCC-SLP, is a clinician scientist at Indiana University School of Medicine specializing in the evaluation and treatment of voice, resonance, and upper airway disorders in children. He completed his PhD at Purdue University with Dr. Preeti Sivasankar and a postdoctoral fellowship at the University of Wisconsin-Madison with Dr. Susan Thibeault. His research has been recognized by the American Speech-Language-Hearing Association (ASHA), the American Cleft Palate and Craniofacial Association (ACPA), and The Voice Foundation.

Fast Prediction of Vocal-Fold Medial-Surface Pressure on M5-Derived Geometries Using Diffeomorphic Operator Learning

Qian Xue
Rochester Institute of Technology
Additional authors: Hem Raj Pandeya, Xudong Zheng

Objectives

High-fidelity 3D Navier-Stokes simulations can resolve the intraglottal wall-pressure distributions, but coupling with fluid-structure interactions (FSI) over multiple vibration cycles is computationally prohibitive. A quasi-steady approximation is valid in the normal phonation frequency range, where it can represent an unsteady cycle as a sequence of steady flows on instantaneous geometries. Using this approximation, we aim to develop a fast, geometry-aware surrogate that predicts medial-surface pressure from static glottal geometries for use in surrogate-coupled FSI.

Methods

We will train a Diffeomorphic Mapping Operator Learning (DIMON) model to predict the medial surface pressure field on M5-based continuum vocal-fold geometries. The geometries will be generated by varying length, depth, height, adduction gap, and convergent/divergent taper, with deformation patterns to span normal and diseased phonations. For each geometry, quasi-steady incompressible CFD will provide pressure labels. DIMON maps each geometry to a common reference domain, learns the unified pressure operator, and transfers the solution back to the original geometry. The model’s performance will be evaluated by comparing surrogate-coupled FSI with CFD-coupled FSI using medial-surface pressure and key vibration/flow outputs (vibration frequency, open quotient, and flow rate).
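As a rough illustration of the comparison step, the helpers below compute two quantities such an evaluation could use. The first is the relative L2 error between surrogate and CFD medial-surface pressures sampled at the same points. The second is the open quotient from one period of a sampled glottal area waveform. The function names and the choice of error norm are assumptions, since the abstract does not specify its accuracy metrics.

```python
import math

def relative_l2_error(predicted, reference):
    """||predicted - reference||_2 / ||reference||_2 for matching pressure samples."""
    if len(predicted) != len(reference):
        raise ValueError("pressure fields must be sampled at the same points")
    diff = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    norm = math.sqrt(sum(r * r for r in reference))
    if norm == 0:
        raise ValueError("reference field is identically zero")
    return diff / norm

def open_quotient(glottal_area, closed_tol=0.0):
    """Fraction of one vibration period during which the glottis is open.

    `glottal_area` holds uniformly spaced samples covering exactly one period;
    a sample counts as open when its area exceeds `closed_tol`.
    """
    if not glottal_area:
        raise ValueError("need at least one sample")
    return sum(a > closed_tol for a in glottal_area) / len(glottal_area)

# Hypothetical data: a glottis open for 3 of 10 samples in one cycle.
print(open_quotient([0, 0, 0.2, 0.5, 0.3, 0, 0, 0, 0, 0]))  # 0.3
```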

Results

In preliminary studies, a pressure surrogate trained on surface wave-based medial-surface geometries generalized to held-out POD-sampled shapes, motivating the proposed DIMON extension to M5-based continuum geometries. We will report pressure-field accuracy on unseen geometries, computational speedup relative to CFD, and agreement between surrogate- and CFD-coupled simulations for key FSI outputs.

Conclusions

This study will provide a pathway for rapid intraglottal pressure prediction and will quantify the feasibility of replacing the Navier-Stokes solver with a learned pressure surrogate in FSI-based phonation studies.

Dr. Qian Xue is an Associate Professor of Mechanical Engineering at Rochester Institute of Technology (RIT). Prior to joining RIT, she was an Assistant Professor in the Mechanical Engineering Department at the University of Maine. She received her Ph.D. in Mechanical Engineering from Johns Hopkins University in 2012. Her research focuses on computational modeling of fluid–structure–acoustic interactions in biological systems, with applications in disease diagnosis and treatment, bio-inspired design, and simulation-assisted healthcare. Her work spans voice biomechanics, seal whisker–inspired flow sensing, aquatic and aerial locomotion, and machine-learning-aided modeling.

The Grade of Breathiness Index (GBI): A Fuzzy Logic Model for Vocal Assessment

Leonardo Lopes
Federal University of Paraíba
Additional authors: Giulliana Karla Lacerda Pereira de Queiroz, Ronei Marcos de Moraes, Samuel Ribeiro de Abreu

Objectives

To develop and validate a multiparametric acoustic model, based on fuzzy logic, to predict the grade of breathiness (GB) in dysphonic and non-dysphonic voices of Brazilian Portuguese speakers.

Methods

This is a cross-sectional, retrospective study that included vocal samples of the sustained vowel [a] and connected speech from 300 participants (235 women, 65 men; mean age 36.47 ± 12.07 years), including both dysphonic and non-dysphonic individuals. Five speech-language pathologists specializing in voice performed auditory-perceptual judgment of GB on a visual analog scale from 0 to 100 points, using the VoxMore APJ plugin. Based on these judgments, a Fuzzy Visual Analog Scale (Fuzzy VAS) was developed, which integrated the experts’ scores, weighted by their reliability (ICC), and modeled perceptual uncertainty using fuzzy logic and a k-means algorithm. Subsequently, 47 acoustic measures were extracted from the vocal samples. A multiple linear regression model was constructed using stepwise variable selection, with the Fuzzy VAS as the dependent variable, to identify the acoustic predictors of GB. Model validation included the evaluation of statistical assumptions (normality, homoscedasticity, independence, and multicollinearity) and the calculation of the Mean Absolute Error (MAE) relative to Fuzzy VAS.
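The reliability weighting and the error metric can be sketched as follows. This assumes the ICC-weighted integration is a weighted mean of the raters' VAS scores, with weights proportional to each rater's ICC. The fuzzy-logic and k-means steps that model perceptual uncertainty are not reproduced here, and the function names are hypothetical.

```python
def icc_weighted_score(ratings, iccs):
    """Combine one voice sample's VAS ratings (0-100), weighting each rater by ICC."""
    if len(ratings) != len(iccs) or not ratings:
        raise ValueError("need one ICC per rating")
    total_weight = sum(iccs)
    if total_weight <= 0:
        raise ValueError("ICC weights must sum to a positive value")
    return sum(r * w for r, w in zip(ratings, iccs)) / total_weight

def mean_absolute_error(predicted, target):
    """Average absolute difference between model predictions and reference scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)

# Hypothetical: two raters score 40 and 60; the more reliable rater counts more.
print(icc_weighted_score([40, 60], [0.9, 0.6]))  # 48.0
```

An MAE of 8.04 points, as reported for the GBI, would then mean that model predictions differed from these reference scores by about 8 points on the 0-100 scale, on average.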

Results

The developed acoustic model (Grade of Breathiness Index – GBI) demonstrated high predictive ability, explaining 80.49% of the variability in perceived GB. Four acoustic measures were identified as significant predictors: median CPPS, jitterddp, GNE3000Hz, and Hfno6000Hz. The model presented an MAE of 8.04 points (on a scale of 0 to 100) between GBI predictions and Fuzzy VAS scores.

Conclusions

The GBI demonstrates robust predictive ability and close agreement with perceptual judgments of breathiness, with an adjusted R² of 80.49% and an MAE of 8.04 points. The index is a reliable and objective tool for quantifying GB. It provides valuable insights into the underlying vocal physiology, aids longitudinal monitoring, and can serve as a teaching resource for training new clinicians in vocal assessment.

Leonardo Lopes is a speech-language pathologist and full professor at the Federal University of Paraíba (UFPB), where he is a permanent professor in the Postgraduate Program in Communication Disorders and the Decision Models and Health Program. He leads the Integrated Voice Studies Laboratory and coordinates clinical studies, acoustic analysis, and artificial intelligence projects in the field of human voice.

Effects of Speech Characteristics on Instrumental Acoustic and Electroglottographic Voice Analysis Metrics in Women with Structural Dysphonia Before and After Treatment

Meike Brockmann-Bauser
University of Education Weingarten, Germany
Additional authors: Naomi A. Iob, Lei He, Sten Ternström, Huanchen Cai

Objectives

Acoustic voice assessment metrics, including smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR), vary with voice loudness and fundamental frequency (fo) in pathologic voices. However, it is unclear whether similar effects are present in electroglottographic metrics and whether they are clinically relevant. This work investigates the effects of three elicitation levels, calibrated sound pressure level (SPL), fo, vowel, and treatment on acoustic CPPS and HNR and on electroglottographic hybrid open quotient (OQ), dEGG OQ, and peak dEGG.

Methods

In a retrospective study, simultaneous acoustic and electroglottographic recordings of 29 women (mean age 25 years, SD 8.9, range 18–53) with structural vocal fold pathologies were analyzed. Samples included sustained phonations of /ɑ/, /i/, and /u/ at three elicited loudness levels (soft, comfortable, loud) and unconstrained fo, recorded before and after treatment. The effects of elicitation effort level, calibrated SPL (dB), fo (Hz), and vowel, as well as treatment effects, on the dependent variables peak dEGG, dEGG OQ, hybrid OQ, HNR, and CPPS were analyzed with linear mixed models (LMMs).

Results

While peak dEGG, HNR, and CPPS were significantly influenced by elicitation effort level, calibrated SPL significantly affected only the acoustic metrics HNR and CPPS (all p < .01). fo had a significant effect on peak dEGG and CPPS (p < .0001). All metrics differed significantly across vowels (all p < .05). However, there was no significant treatment effect on any assessment metric (p > .05).

Conclusions

While electroglottographic metrics, especially OQ, were less affected by phonatory context (elicitation level, SPL, fo, and vowel), all investigated voice assessment parameters were of limited value for documenting treatment effects. To improve their clinical usefulness, future work investigating the underlying physiologic reasons should use suitable voice tasks and control SPL and fo during sampling.

Prof. Dr. Meike Brockmann-Bauser is Chair of the Department of Speech and Language Therapy at the University of Education Weingarten, Germany. Her research is based on over 20 years of clinical experience in patients with voice disorders and focuses on behavioral, linguistic, physiologic, and pathologic factors influencing diagnostic acoustic voice measurements.

Towards Low-cost Aerodynamic Measures of Voice: Exploring the Potential of the Vortex Whistle

Shaheen N. Awan
University of Central Florida
Additional authors: Jordan A. Awan, Jun Chen, Victoria S. McKenna, Sophia Gifford, Emma Burns, Brittany Fletcher, Amanda I. Gillespie, David A. Eddins

Objectives

Airflow during phonation is a valuable physiological biomarker of vocal function (e.g., abnormally high flow rates are indicative of glottal incompetence; abnormally low flow rates are indicative of hyperadduction). However, because pneumotachometer-based systems are costly and not widely available, phonatory airflow (PA) measurement is clinically underused. To address this limitation, we are studying the vortex whistle (VW), a 3D-printed device that produces an acoustic signal whose frequency is directly proportional to the inlet airflow and that can be manufactured at a fraction of the cost of current PA measurement systems. The purpose of this study was to compare measures of mean PA obtained with the VW and with a gold-standard pneumotach-based system.

Methods

Mean PA (also called mean flow rate, transglottal airflow, or phonatory flow rate; in mL/s) was measured from maximum sustained phonations of the vowel /u/ produced by 27 speakers with typical voices, using both the Phonatory Aerodynamic System (PAS) and a VW specially designed to produce a whistle frequency with strong SNR at the low flow rates observed in typical phonation. VW frequency was measured via spectrographic analysis and converted to flow using a predetermined regression equation.

Results

Mean PA measured with the VW did not differ significantly from that measured with the PAS (140.90 vs. 142.22 mL/s). A Theil-Sen regression showed a significant relationship (r = 0.61, r² = 0.37, p < 0.001), and the interquartile predictive error of 38.33 mL/s was well within the previously reported within-subject error of 50 mL/s for the PAS.
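The Theil-Sen estimator named above fits a line using the median of all pairwise slopes, which makes it robust to a few outlying trials. A minimal standard-library sketch follows. It uses the intercept convention median(y) - slope * median(x), which SciPy's `scipy.stats.theilslopes` offers as its "separate" intercept method. It does not reproduce the study's significance test, and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Return (slope, intercept) of the Theil-Sen line through (x, y).

    Slope: median of slopes over all point pairs with distinct x.
    Intercept: median(y) - slope * median(x).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    return slope, median(y) - slope * median(x)

# Toy data: one wild value (100) barely moves the fitted line.
print(theil_sen([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))  # (2.0, 0.0)
```

Applied to paired PAS and VW means, a slope near 1 and an intercept near 0 would indicate close agreement between the two devices.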

Conclusions

The VW has substantial potential to provide an important aerodynamic measure of voice (PA) affordably and accessibly in a wide variety of clinical settings. Details of the VW, study methods and results, and future directions will be discussed. Acknowledgments: This work is supported by NIH/NIDCD 1R01DC020799-01A1 “Vital capacity & airflow measurement for voice evaluation: A vortex whistle system”. (Awan SN, Primary MPI; Awan JA, MPI; Chen J, MPI; Gillespie A, MPI).

Dr. Shaheen N. Awan is currently Research Professor in the Dept. of Communication Sciences and Disorders at the University of Central Florida. Dr. Awan previously served as Research Professor at the University of South Florida and is also Professor Emeritus, Dept. of Communication Sciences and Disorders, at Bloomsburg University of PA. Dr. Awan’s clinical expertise is in the assessment and treatment of voice disorders, and his research has focused on instrumental analysis of typical vs. disordered speech and voice with a concentration on acoustic and aerodynamic analyses. Dr. Awan’s research team is funded by the National Institutes of Health (NIH/NIDCD) for a 5-year study of a low-cost device called a vortex whistle for aerodynamic analyses of voice. Dr. Awan previously served on the ASHA working group committee for recommended instrumental voice assessment protocols and has served as an Editor for the Journal of Speech-Language-Hearing Research.

Impact of Internet Connection Quality on Vortex Whistle Signal Processing Across Telehealth Platforms

Opal Taylor
Emory University
Additional authors: Amanda I. Gillespie, Shaheen N. Awan

Objectives

The Vortex Whistle (VW) is a low-cost device that produces a tone whose frequency varies directly with airflow rate. The purpose of this investigation was to test the accuracy and consistency of VW frequency across telehealth platforms and internet speed conditions. Results of this study will help clarify the usability and technical considerations of the VW in clinical telehealth settings.

Methods

A specially designed VW for measuring phonatory airflow was tested under seven internet speed conditions (0.1/0.5, 0.5/2.5, 1/5, 5/25, 10/50, 20/100, and 40/200 Mbps upload/download) on two of the most common telehealth platforms: Zoom and EPIC. Two computers joined a telehealth meeting, one acting as the “patient” computer and the other as the “clinician” computer. At the patient computer, the investigator produced fifteen VW frequencies at each internet speed. Each trial was recorded live in Audacity on the “patient” computer and, simultaneously, on the “clinician” computer, which captured the audio signal received through the telehealth interface using Audacity’s “loop back” function. VW frequencies were calculated in Praat using linear predictive coding (LPC). Pearson’s correlations measured the agreement between the frequencies in the “patient” and “clinician” recordings.
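The agreement analysis can be sketched as a Pearson correlation between paired frequencies from the two recordings. A correlation can stay high even when one platform shifts every frequency by a constant amount, so the sketch also reports the mean absolute difference in Hz. That second measure is an illustrative addition, not part of the reported method, and the function names and data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)

def mean_abs_difference(a, b):
    """Average |patient - clinician| frequency difference, in the input units (Hz)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical VW frequencies (Hz) for one speed condition.
patient = [410.0, 455.0, 500.0, 520.0, 610.0]
clinician = [420.0, 465.0, 510.0, 530.0, 620.0]  # constant +10 Hz offset
print(round(pearson_r(patient, clinician), 6))   # 1.0
print(mean_abs_difference(patient, clinician))   # 10.0
```

In this toy case r is perfect even though every transmitted frequency is 10 Hz high. Because VW frequency is converted to flow, such an offset would bias the flow estimate.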

Results

Platform performance depended on internet speed. Zoom could not operate at the lowest internet speed and showed poor agreement at 0.5/2.5 Mbps. EPIC operated at the lowest internet speed but did not reliably output a signal similar to the original. Both platforms showed excellent agreement at the five fastest internet speeds. Ideal formant settings differed between platforms, suggesting possible differences in how they transmit higher-frequency energy concentrations.

Conclusions

Preliminary results indicate strong potential for VW usage in telehealth settings. Future analyses will include additional platforms, other types of VW, and mobile vs desktop performance.

Opal Taylor is a post-doctoral fellow in the Department of Otolaryngology–Emory Voice Center at Emory University.

Source-Tract Interaction in Terms of Energy

Michael H. Krane
Penn State University
Additional authors: Mitchell J. Swann

This work describes the aeroelastic and aeroacoustic mechanisms of source-tract interaction in terms of energy exchanges between air motion in the trachea, larynx, and vocal tract. First, a theoretical description based on a control volume formulation is presented for each of these regions. The terms describing source-tract interaction are then identified, and their behavior is discussed. Then, using energy budgets computed with a reduced-order phonation model, the role of source-tract interaction in sustaining vocal fold vibration and in producing sound is quantified. This analysis is performed for both normal phonation and phonation through a high-resistance, high-inertance tube.
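A generic ingredient of such energy budgets is the net work done by the airflow on the vocal fold tissue over one cycle, W = ∮ F(t) v(t) dt, where F is the aerodynamic force on the tissue and v is its velocity. The sketch below is a textbook illustration under simplifying assumptions: a single lumped force and velocity, sampled uniformly over exactly one period. It is not the authors' control-volume formulation. It shows that only the part of the force in phase with tissue velocity transfers net energy per cycle. A force in phase with velocity feeds the oscillation, whereas a force in phase with displacement does no net work.

```python
import math

def net_work_per_cycle(force, velocity, period):
    """Approximate the cycle integral of F(t) * v(t) dt.

    Samples must be uniformly spaced over exactly one period, endpoint
    excluded; for periodic signals the rectangle rule is then very accurate.
    Positive values mean the flow delivers energy to the tissue.
    """
    if len(force) != len(velocity) or not force:
        raise ValueError("force and velocity need the same nonzero length")
    dt = period / len(force)
    return dt * sum(f * v for f, v in zip(force, velocity))

# Hypothetical unit-amplitude signals at 100 Hz, displacement x(t) = sin(wt).
n, period = 1000, 0.01
t = [k * period / n for k in range(n)]
w = 2 * math.pi / period
velocity = [math.cos(w * tk) for tk in t]           # proportional to dx/dt
force_in_phase = [math.cos(w * tk) for tk in t]     # in phase with velocity
force_quadrature = [math.sin(w * tk) for tk in t]   # in phase with displacement
print(net_work_per_cycle(force_in_phase, velocity, period))    # about period / 2
print(net_work_per_cycle(force_quadrature, velocity, period))  # about 0
```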

Dr. Michael H. Krane is a Professor in the Graduate Program in Bioengineering at Pennsylvania State University and a researcher at Penn State’s Applied Research Laboratory (ARL). He holds a Ph.D. in Aerospace Engineering and works at the interdisciplinary intersection of acoustics, bioengineering, and mechanical engineering. His research focuses on fluid mechanics, aeroacoustics, fluid–structure interactions, and biological fluid dynamics, with a particular emphasis on the physiological aerodynamics of phonation. Dr. Krane’s experimental and theoretical work investigates the unsteady flow phenomena that generate sound, including jet dynamics in scaled-up vocal fold models, source characteristics of human speech, and source–tract interaction during vocalization. His broader portfolio extends to bio-inspired acoustics, including passive trailing-edge noise attenuation informed by owl wing anatomy. He is an active collaborator within Penn State’s Biomedical Engineering interdisciplinary graduate program and the Applied Research Laboratory, where he contributes to projects bridging fundamental fluid dynamics with translational applications in voice science and aerodynamic design.

Open block · 2:45 – 3:15 PM (panel, extended Q&A)
Afternoon Break · 3:15 – 3:30 PM
Session 4 — Voice Disorders · Chair: Amanda Stark
Latency of Muscle to Voice Onset During Laryngeal Electromyography Distinguishes Between Laryngeal Dystonia and Essential Vocal Tremor

Jenny L. Pierce
University of Utah
Additional authors: Amanda Stark, Breanne Schiffer, Marshall Smith, Julie Barkmeier-Kraemer

Objectives

Latency in laryngeal electromyography (LEMG) is the time between muscle activation and voice onset, and it is known to be longer than normal in adductor-type laryngeal dystonia (AdLD). It is not known whether longer latency is also a feature of essential vocal tremor (EVT), a neurogenic voice disorder often misdiagnosed as AdLD. We compared latency in participants with AdLD and EVT to characterize their distinct peripheral neurophysiology and improve diagnostic testing.

Methods

Nineteen AdLD and 31 EVT participants underwent simultaneous hooked-wire LEMG of the bilateral thyroarytenoid (TA) and cricothyroid (CT) muscles. Acoustic recordings of voice-loaded sentences were captured simultaneously in LabChart software. Latency (ms) was measured manually as the interval between the final ascent of the muscle signal to its peak burst before voicing and voice onset in the acoustic waveform.

Results

Latency was longer in the AdLD group in three of the four muscles. For right TA, AdLD mean=1251 ms (SD=948), EVT mean=357 ms (SD=152), p=0.02, Cohen’s d=1.1. For left TA, AdLD=970 (681), EVT=416 (120), p=0.01, d=.95. Differences for right CT were not significant: AdLD=781 (642), EVT=649 (84), p=0.58, d=.28. For left CT, AdLD=1045 (717), EVT=533 (142), p=0.02, d=.79. Between muscle pairs, TAs and CTs behaved similarly in AdLD (p=0.53, d=0.19); in EVT, latency was significantly longer in CTs than TAs (p<.01, d=1.0).
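For readers comparing effect sizes, Cohen's d for two independent groups is commonly computed as the mean difference divided by the pooled standard deviation. The abstract does not state which variant was used or the per-muscle sample sizes, so this sketch is not meant to reproduce the reported values. The inputs in the example are hypothetical.

```python
import math

def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using the pooled standard deviation (each variance weighted by n - 1)."""
    if n_1 < 2 or n_2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)

# Hypothetical latencies (ms): group 1 averages 200 ms longer, both SDs 200 ms.
print(cohens_d(600.0, 200.0, 20, 400.0, 200.0, 30))  # 1.0
```

When group SDs differ as widely as they do here (for example, 948 vs. 152 ms for right TA), the choice of standardizer can change d noticeably.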

Conclusions

Latency was significantly longer in AdLD than in EVT in three of the four muscles. Latency did not differ significantly between TAs and CTs in AdLD, indicating that these paired muscles act together for voice initiation. In EVT, latency was significantly longer in CTs than in TAs, indicating that the CTs contribute to voice initiation earlier than the TAs in this disorder. These results indicate that latency could be an important measure in the differential diagnosis of AdLD versus EVT, and they give novel insights into the neurophysiology of EVT.

Dr. Jenny Pierce is the Associate Director of the Voice, Airway, Swallowing Translational (VAST) Research Lab and a research assistant professor in the Department of Otolaryngology—Head and Neck Surgery at the University of Utah. She is also a clinical speech-language pathologist at LDS Hospital Voice and Swallowing Center. Dr. Pierce received a doctoral degree in speech-language pathology from the University of Utah, with a minor in neuroscience. She completed two postdoctoral fellowships at the University of Utah. The first investigated a novel substance for treating vocal fold scar in an animal model. The second investigated (1) vascular connective tissues as a factor in the onset of idiopathic vocal fold paralysis and (2) evaluation methods in laryngeal dystonia and vocal tremor. Her primary area of research is neural assessment in neurogenic voice disorders.

Biomechanical Signatures of Voice Disruption Type in Adductor Laryngeal Dystonia

Jesse Hoffmeister
University of Minnesota
Additional authors: Maisie Simpson, Joseph Hayek, Stephanie Misono

Objectives

Patients with adductor laryngeal dystonia (AdLD) experience involuntary vocal fold hyperadduction during connected speech, resulting in a variety of voice disruption types (e.g., phonation breaks, rapid fundamental frequency (F0) shifts, and aperiodic phonation). Establishing physiologic differences between disruption types could yield objective markers of disease severity and refine biomechanical models of AdLD. Because upper esophageal sphincter pressure (UESP) is thought to be tied to vocal fold adduction, concurrent measurement of UESP could be used to investigate potential differences between voice disruption types. This study aimed to examine whether UESP characteristics could distinguish between disruption types in AdLD. We hypothesized that all disruption types would have significantly higher UESP than non-disrupted phonation, and that UESP would be highest for phonation breaks, intermediate for aperiodic segments, and lowest for pitch shifts. We further hypothesized that aperiodic voicing and F0 shifts would have a greater UESP range than phonation breaks.

Methods

Twenty-two adults with AdLD completed standardized speaking tasks while UESP was recorded. Phonation types (non-disrupted phonation, phonation breaks, F0 shifts, aperiodic segments) were manually identified and labeled. For each phonation type, maximum UESP and UESP range were calculated. Linear mixed-effects models assessed the effect of disruption type on UESP, controlling for age, sex, and resting UESP.
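The per-segment outcome measures described above (maximum UESP and UESP range for each labeled phonation type) can be sketched as follows. The segment format, labels, and pressure values (mmHg) are hypothetical; the linear mixed-effects model that consumes these values, with age, sex, and resting UESP as covariates, is not shown.

```python
from collections import defaultdict


def segment_features(samples):
    """Return (maximum UESP, UESP range) for one labeled segment."""
    peak = max(samples)
    return peak, peak - min(samples)


def features_by_type(segments):
    """Map each phonation-type label to a list of (max, range) tuples."""
    out = defaultdict(list)
    for label, samples in segments:
        out[label].append(segment_features(samples))
    return dict(out)


# Hypothetical labeled segments: (phonation type, UESP samples in mmHg).
segments = [
    ("non-disrupted", [40.0, 45.0, 42.0]),
    ("phonation_break", [80.0, 95.0, 90.0]),
    ("f0_shift", [50.0, 85.0, 60.0]),
    ("f0_shift", [55.0, 70.0, 65.0]),
]

features = features_by_type(segments)
print(features["f0_shift"])  # [(85.0, 35.0), (70.0, 15.0)]
```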

Results

Disruption type significantly predicted maximum UESP (F(3, 52.10) = 10.96, p < .001) and UESP range (F(2, 33.51) = 24.01, p < .0001). Phonation breaks occurred at the highest pressures, while aperiodic segments and pitch shifts showed the greatest pressure variability.

Conclusions

AdLD symptoms may reflect distinct biomechanical phenomena rather than a uniform hyperadduction, informing models of disease variability and treatment optimization.

Dr. Hoffmeister is an assistant professor in Otolaryngology at the University of Minnesota. His clinical practice and research focus on the assessment and treatment of voice, swallowing, and upper airway disorders. His lab uses a variety of invasive and non-invasive measures to assess upper airway biomechanics during voice production with the ultimate goal of improving diagnosis and treatment of neurogenic voice, swallowing, and upper airway disorders. His work is funded by the National Center for Advancing Translational Sciences (K12TR004373 and 1UM1TR004405-01A1), the Dystonia Coalition through the NIH’s Rare Diseases Clinical Network (NS065701, TR001456, and NS116025), and a Catalyst Award from the Dr. Ralph and Marian Falk Medical Trust.

Vocal Efficiency in Daily Life Tracks Therapy Change in Muscle Tension Dysphonia

Ahmed M. Yousef
Massachusetts General Hospital / Harvard Medical School
Additional authors: Emma C. Willis, Vahni Tagirisa, Robert E. Hillman, Daryush D. Mehta

Objectives

We introduced an ecological vocal efficiency measure (EVE = sound pressure level (SPL) / subglottal pressure (Ps)) and tested whether it captures therapy-related change in primary muscle tension dysphonia (MTD).

Methods

Seven female participants with MTD and seven age- and sex-matched vocally healthy controls were enrolled. In the lab, Ps was estimated from intraoral pressure during repeated /p/-vowel sequences at varied loudness and pitch. Anterior neck-surface accelerometer (ACC) magnitude was linearly calibrated to Ps for in-field application. In the field, all participants completed three days of ambulatory voice monitoring; patients completed an additional three days after successfully completing voice therapy. A smartphone-based system measured vocal SPL via a miniature microphone, and Ps was estimated from the ACC signal, yielding in-field EVE [SPL(dB)/Ps(dB)]. For each patient, a normative reference range for EVE was defined as the matched control's median EVE ± 1 SD. The proportion of phonation time outside the normative range was compared pre- versus post-therapy using the Wilcoxon signed-rank test.

Results & Conclusions

Participants contributed 72 monitoring days (mean 12.02 h/day; SD 3.39). Across controls, mean (SD) EVE was 4.62 (0.49) dB/dB. Pre-therapy, patients spent on average 67% of phonation time outside the EVE normative reference range, significantly more than controls (p = 0.02), whereas time spent outside the normative ranges of SPL (40%, p = 0.11) and Ps (44%, p = 0.38) did not differ significantly. Post-therapy, atypical EVE time decreased to 46% and no longer differed significantly from controls (p = 0.22), whereas time spent with atypical SPL (43%, p = 0.047) and Ps (39%, p = 0.47) remained largely unchanged from pre-therapy. SPL or Ps alone did not consistently capture therapy effects, but their ratio (EVE) did, providing initial evidence that EVE may be an ecological marker of real-world improvement in patients with MTD.

Ahmed Yousef, PhD, is a Postdoctoral Research Fellow at Massachusetts General Hospital Voice Center and Harvard Medical School. He holds a dual PhD in Communicative Sciences and Disorders and Mechanical Engineering. His research combines ambulatory voice monitoring, acoustic analysis, aerodynamic measures, and laryngeal imaging with machine learning to improve the assessment, treatment, and prevention of voice disorders.

Investigating Vocal Fold Edema in Young Adult Vapers: Acoustic, Aerodynamic, and Vocal Fold Vibratory Kinematic Analysis

Victoria S. McKenna
University of Central Florida
Additional authors: Emma J. Burns, Tiffany M. Torres, Rita Patel

Objectives

Electronic cigarettes, also known as “vapes,” are nicotine delivery systems that first reached the U.S. market in 2006. Despite their pervasive use, few studies have focused on voice-related symptoms [1,2], voice acoustics [3,4], or vibratory kinematics [2,5]. We are therefore conducting this study to expand knowledge of these measures in young adult vapers after relatively short-term use. We hypothesized that vapers would show evidence of increased vocal fold edema compared with non-vapers.

Methods

This study is ongoing. Young adult vapers and non-vapers (controls) first completed a standard clinical rigid stroboscopic evaluation to assess the anatomy and coloration of the vocal folds and surrounding tissues. High-quality acoustic recordings were sampled at 40 kHz during sustained vowels and standard readings to evaluate measures of vocal quality (e.g., cepstral peak prominence, harmonics-to-noise ratio) and laryngeal tension (i.e., relative fundamental frequency). The Phonatory Aerodynamic System captured estimates of subglottal pressure and phonation threshold pressure during /pa-pa/ productions. Finally, rigid high-speed videoendoscopy sampled at 4 kHz captured vocal fold vibration during sustained /i/, high-pitched /i/, and inspiratory /i/ tasks, with planned measures of glottal dynamics, mechanics, and symmetry. Mixed-effects models will investigate the effects of group and sex on each measure. Linear regressions will examine the relationship between vaping use characteristics (e.g., duration) and outcome measures in the vaping group.
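Relative fundamental frequency (RFF), the laryngeal tension measure named above, expresses each voicing cycle's f0 in semitones relative to a steady-state reference cycle next to a voiceless consonant. The sketch follows the commonly used convention (ten offset cycles referenced to offset cycle 1, ten onset cycles referenced to onset cycle 10); the abstract does not describe its RFF procedure, so this convention and the cycle-by-cycle f0 input are assumptions, and the example values are hypothetical.

```python
from math import log2


def rff_semitones(cycle_f0_hz, reference):
    """Relative fundamental frequency of each cycle, in semitones.

    reference="first": offset cycles ordered from the steady-state vowel
    toward the voiceless consonant (offset cycle 1 is the reference).
    reference="last": onset cycles ordered from the consonant into the
    steady-state vowel (onset cycle 10 is the reference)."""
    if reference not in ("first", "last"):
        raise ValueError("reference must be 'first' or 'last'")
    ref = cycle_f0_hz[0] if reference == "first" else cycle_f0_hz[-1]
    return [12.0 * log2(f / ref) for f in cycle_f0_hz]


# Hypothetical cycle-by-cycle f0 values (Hz); standard RFF uses 10 cycles per side.
offset = rff_semitones([200.0, 200.0, 190.0], reference="first")
onset = rff_semitones([210.0, 205.0, 200.0], reference="last")
print(offset, onset)
```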

Results

Thus far, we have enrolled 13 vapers (3 male, 10 female; age = 20.5 ± 1.8 years) and 20 controls. The average duration of vaping was 5.07 ± 2.19 years, at 49.75 ± 41.04 puffs per day, with 8/13 co-using marijuana. Video stroboscopic results revealed erythema or bilateral vocal fold edema in the mid-membranous vocal folds in 4/13 vapers. All other analyses are ongoing.

Conclusions

Emerging evidence suggests that young adults who have used vaping products for only a few years may exhibit changes to their voice.

Acknowledgments: Seed Funding Initiative grant from the University of Central Florida (V.S.M.).

Dr. McKenna is an assistant professor in the School of Communication Sciences and Disorders and faculty in the Communication Technologies Research Center at the University of Central Florida. Dr. McKenna’s research lab aims to increase access to clinical tools and improve the efficiency of diagnosis and treatment for voice and swallowing disorders.

Multivariate Analysis of Acoustic Parameters Associated with Vocal Tremor Severity

Sarah McDowell
University of Utah
Additional authors: J. Austin Collum, Kaitlyn Dwenger, Julie Barkmeier-Kraemer

Purpose

The assessment of essential vocal tremor (EVT) is multidimensional, often combining visual-perceptual ratings of nasoendoscopy, auditory-perceptual ratings of voice, and acoustic analysis of voice. The severity spectrum of EVT is operationally defined by the degree to which tremor is perceived during connected speech and/or sustained phonation. This study analyzes a battery of acoustic measures during sustained phonation and connected speech to assess group differences between controls and subjects with EVT across varying degrees of disorder severity.

Methods

Acoustic analysis was completed for four groups of five subjects each: controls, mild EVT, moderate EVT, and severe EVT. Severity groupings were determined by consensus auditory-perceptual ratings. Acoustic recordings were randomized and analyzed in Praat by blinded raters. Extracted acoustic and temporal measures included speech rate, voicing segment duration, total sentence duration, and the rate and extent of sound pressure level (SPL) and fundamental frequency (f0) modulation during sustained phonation.
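Modulation rate and extent can be estimated from an f0 (or SPL) contour in several ways. The sketch below shows one simple approach, not necessarily the Praat-based procedure used in this study: rate from upward zero crossings of the mean-removed contour, and extent as half the peak-to-peak excursion. The synthetic contour (200 Hz with a 5 Hz, ±10 Hz modulation sampled at 100 Hz) is hypothetical.

```python
from math import pi, sin


def modulation_rate_and_extent(contour, fs):
    """Estimate modulation rate (Hz) and extent (half peak-to-peak, contour units).

    Rate comes from upward zero crossings of the mean-removed contour,
    located by linear interpolation between samples."""
    m = sum(contour) / len(contour)
    x = [v - m for v in contour]
    crossings = []
    for i in range(len(x) - 1):
        if x[i] < 0.0 <= x[i + 1]:
            frac = -x[i] / (x[i + 1] - x[i])
            crossings.append((i + frac) / fs)
    if len(crossings) < 2:
        raise ValueError("need at least two modulation cycles")
    rate_hz = (len(crossings) - 1) / (crossings[-1] - crossings[0])
    extent = (max(contour) - min(contour)) / 2.0
    return rate_hz, extent


# Synthetic f0 contour: 200 Hz with a 5 Hz, +/-10 Hz modulation, sampled at 100 Hz.
fs = 100.0
f0_track = [200.0 + 10.0 * sin(2 * pi * 5.0 * n / fs + 0.3) for n in range(200)]
rate, extent = modulation_rate_and_extent(f0_track, fs)
print(f"rate = {rate:.2f} Hz, extent = {extent:.2f} Hz")
```

Because extent is taken from sampled extremes, it slightly underestimates the true amplitude when the contour is coarsely sampled.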

Results

Eight acoustic parameters, including rate and extent modulations of f0 and SPL, speech rate, voicing segment duration, and total sentence duration, were found to significantly differentiate groups across various speech stimuli. Pairwise comparisons revealed differentiation trends between controls and moderate EVT, controls and severe EVT, and mild EVT and severe EVT dependent upon the acoustic variable.

Conclusions

Our findings demonstrate the importance of acoustic analysis as a diagnostic tool for assessing EVT severity across sustained phonation, connected speech, and phoneme-specific stimuli. Acoustic characteristics associated with more severe EVT include decreased speech rate, shortened voicing duration, increased segmentation of voicing, and increased extent of f0 and SPL modulation.

Sarah McDowell is a speech-language pathologist at the University of Utah Voice Disorders Center and a Research Associate in the Department of Otolaryngology–Head and Neck Surgery. She completed her clinical fellowship at the University of Utah Voice Disorders Center, where she specialized in voice, upper airway, and swallowing disorders.

Sarah earned her Master of Science in speech-language pathology from the University of Texas at Dallas and holds a Bachelor of Music in vocal performance from the University of North Texas. Her research interests include muscle tension dysphonia, performing voice, and neurologic voice disorders; she has presented nationally in these areas.

In addition, Sarah is a graduate of the Summer Vocology Institute (2021) and currently serves as a JEDI leader for the organization PAVES (Pathway for Advanced Voice Education and Service).

Acoustic markers of chronicity in nonphonotraumatic vocal hyperfunction

Maude Desjardins
Université Laval
Additional authors: Timothy Pommée, Ariane Martel, Theodora Nestorova, Hugo Massé-Alarie, Ingrid Verduyckt, Françoise Chagnon

Introduction

Despite the high prevalence of nonphonotraumatic complaints, including vocal fatigue, effort, and changes in vocal quality without structural laryngeal changes, mechanisms distinguishing occasional from chronic symptoms remain unclear. The objective of this study was to determine whether these profiles show distinct acoustic features, to help clarify how symptom duration affects vocal quality and motor behavior and inform prognosis assessment.

Methods

In this cross-sectional study, participants aged 18–60 years diagnosed with nonphonotraumatic hyperfunction were categorized as chronic or non-chronic using NIH-based standardized musculoskeletal chronicity criteria. Speech and voice tasks recorded in a sound-treated booth were analyzed in Praat to extract the Acoustic Voice Quality Index, smoothed cepstral peak prominence (CPPS), and spectral balance measures (slope, tilt, H1-H2, alpha ratio, L/H ratio) to capture voice quality in connected speech; perturbation and noise-related measures (HNR, GNE, jitter, shimmer, APQ11) to capture vocal stability in a vowel; and fundamental frequency (f₀) measures (mean, SD, min., max., range) in conversational speech to reflect pitch behavior. Linear regressions (including age and sex as covariates) examined the effect of chronicity status on each acoustic measure.
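The conversational-speech f₀ measures listed above (mean, SD, min., max., range) reduce to summary statistics over voiced frames of a pitch track. A minimal sketch, assuming a frame-wise track in Hz where unvoiced frames are marked as `None` or 0 (a hypothetical input format, not Praat's own output):

```python
from statistics import mean, stdev


def f0_summary(f0_track_hz):
    """Summarize a frame-wise f0 track; None or 0 marks unvoiced frames."""
    voiced = [f for f in f0_track_hz if f]  # drops None and 0.0
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    lo, hi = min(voiced), max(voiced)
    return {"mean": mean(voiced), "sd": stdev(voiced),
            "min": lo, "max": hi, "range": hi - lo}


# Hypothetical track from conversational speech (Hz).
print(f0_summary([None, 100.0, 0.0, 200.0, 150.0]))
```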

Results

Forty-three participants (mean age 37.7 ± 11 years; 28 F/15 M; 19 chronic/24 non-chronic) were included, with symptom duration ranging from 2 to 542 months (median = 81 months). Only f₀ measures derived from conversational speech were associated with chronicity: controlling for age and sex, chronic participants exhibited significantly lower mean f₀ (β = -11.50 Hz, p = .025, 95% CI [-21.45, -1.54]) and lower max f₀ (β = -44.79 Hz, p = .013, 95% CI [-79.66, -9.93]) than non-chronic participants. No significant group differences were observed for the other measures.

Conclusions

These preliminary findings suggest that symptom chronicity is not distinguished by traditional acoustic markers of voice quality. Instead, chronicity-related differences in nonphonotraumatic hyperfunction might arise in functional aspects of voice use (pitch regulation), reflecting vocal motor behavior rather than dysphonia severity.

Maude Desjardins is a speech-language pathologist, Assistant Professor at the School of Rehabilitation Sciences of Université Laval, and researcher at the Center for Interdisciplinary Research in Rehabilitation and Social Inclusion (CIRRIS) and the CHU de Québec – Université Laval Research Center. She completed her Ph.D. in Health and Rehabilitation Sciences at the Medical University of South Carolina in 2019 and a postdoctoral fellowship in the Voice and Motor Learning Lab at the University of Delaware in 2021. She is the Director of the Respiration, Speech & Voice Production Lab (RSVP), where her research focuses on identifying psychosocial and physiological factors contributing to the persistence of voice symptoms across the lifespan, with particular attention to disorders associated with vocal hyperfunction, fatigue, and aging. She is especially interested in how these factors shape interactions between the upper and lower respiratory systems during speech, with the goal of better understanding the motor adaptations that lead to chronic voice symptoms.

Voice Handicap in Individuals with Refractory Chronic Cough and Anxiety

Miranda L. Wright
University of Florida
Additional authors: Kaylea Hollingsworth, Alicia Vose, Julie Barkmeier-Kraemer

Introduction

Dysphonia co-occurs in approximately 40% of individuals with refractory chronic cough (RCC). Cough is a phonotraumatic event that may contribute to vocal fold pathology, while shared laryngeal mechanisms (i.e., hypersensitivity) may underlie both cough and voice symptoms. Both conditions are associated with reduced quality of life and increased anxiety. This study aimed to examine the relationship between voice handicap and RCC and the potential contribution of anxiety.

Methods

Participants were categorized into four groups: RCC-only, RCC+anxiety, anxiety-only, and healthy controls. Voice handicap (Voice Handicap Index; VHI), cough-related quality of life (Leicester Cough Questionnaire; LCQ), and anxiety (Generalized Anxiety Disorder 7; GAD-7) were assessed. Multiple linear regression examined associations between LCQ, anxiety, RCC status, and their interactions on VHI. Continuous predictors were mean-centered and robust standard errors were used.
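Mean-centering continuous predictors before forming interaction terms makes each main effect interpretable at the sample mean and reduces collinearity between main effects and their products. The sketch below builds such rows. It assumes anxiety enters as the continuous GAD-7 total and that the interactions are LCQ × RCC status and LCQ × anxiety; both are assumptions, as are the data. Fitting with robust standard errors (for example, an OLS fit with a heteroskedasticity-consistent covariance such as HC3 in statsmodels) is not shown.

```python
from statistics import mean


def center(values):
    """Mean-center a list of continuous predictor values."""
    m = mean(values)
    return [v - m for v in values]


def design_rows(lcq, gad7, rcc):
    """Build regression rows with centered predictors and interaction terms.

    Which interactions enter the model is an assumption of this sketch:
    LCQ x RCC status and LCQ x anxiety (GAD-7)."""
    lcq_c, gad7_c = center(lcq), center(gad7)
    rows = []
    for l, g, r in zip(lcq_c, gad7_c, rcc):
        rows.append({
            "intercept": 1.0,
            "lcq_c": l,
            "gad7_c": g,
            "rcc": float(r),  # 1 = RCC, 0 = no RCC
            "lcq_x_rcc": l * r,
            "lcq_x_gad7": l * g,
        })
    return rows


# Hypothetical participants: LCQ total, GAD-7 total, RCC status.
rows = design_rows([10.0, 20.0, 15.0], [4.0, 8.0, 6.0], [1, 0, 1])
print(rows[0])
```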

Results

Forty participants (55% female; mean age = 52 ± 15.8 years) were included. The overall model significantly predicted VHI (R² = .51, p < .001). More severe cough-related impairment (lower LCQ scores) was associated with higher VHI scores (β = -12.19), although this association was not statistically significant. Anxiety was not associated with VHI (β = -0.15). RCC status and interaction terms were not significant, indicating that the relationship between LCQ and VHI did not differ by RCC status or anxiety.

Conclusions

Voice handicap in RCC appears to be more closely related to the functional impact of cough than to anxiety or the RCC diagnosis itself. These findings suggest that cough burden, rather than anxiety, may be the primary driver of voice-related impairment in individuals with RCC. This highlights the potential benefit of targeting voice outcomes in individuals with greater cough-related quality-of-life impairment, even in the absence of perceptual dysphonia.

Miranda L. Wright, PhD, CCC-SLP, is a T32 postdoctoral fellow at the University of Florida Breathing Research and Therapeutics (BREATHE) Center, where she is mentored by Dr. Alicia Vose, PhD, CCC-SLP. She earned her PhD in Communication Sciences and Disorders from the University of Utah under the mentorship of Dr. Julie Barkmeier-Kraemer, PhD, CCC-SLP, where her work focused on the intersection of cough, voice, and psychophysiological processes.

Dr. Wright has extensive clinical experience in the evaluation and management of voice, swallowing, and upper airway disorders across diverse clinical settings. She completed advanced fellowship training at the University of Wisconsin–Madison, further specializing in complex airway and voice conditions.

Her research program centers on refractory chronic cough, with a particular emphasis on treatment efficacy, symptom perception, and quality-of-life outcomes. She is especially interested in the role of psychological and physiological factors in symptom experience and recovery, with the goal of informing more targeted, patient-centered interventions.

Open block · 5:15 – 5:45 PM (closing remarks, networking)
Thursday · October 8
Opening Keynote
Jan G. Švec, Ph.D.

Palacký University, Olomouc, Czech Republic

Dr. Jan G. Švec is a Czech physicist conducting basic and clinical research on the production of human voice. He holds an MSc in fine mechanics and optics from Palacký University in Olomouc and a double PhD in biophysics and medical sciences from Palacký University and the University of Groningen in the Netherlands. He has held postdoctoral appointments at the Denver Center for the Performing Arts and the University of Groningen. He is currently a Professor at Palacký University in Olomouc and serves as an associate research scientist at the clinical Voice and Hearing Centre in Prague. His work bridges the physics of phonation with clinical voice assessment, with a focus on high-speed imaging, acoustic analysis, and the measurement methodologies that translate laboratory findings into clinical practice.

Session 5 — Computational Modeling · Chair: Jan Švec
Effect of Formant Tuning on the Voice Source in a Three-Dimensional Computational Model of Phonation

Zhaoyan Zhang
University of California Los Angeles

Objectives

Source-filter interaction generally has a small effect on vocal fold vibration and the voice source in the frequency range typical of speech. However, this effect may become noticeable during formant tuning, when a vocal tract resonance comes close in frequency to either the first or second harmonic of the voice source spectrum, as often occurs in classical or musical theater singing. The goal of this study is to quantify the effect of source-filter interaction on the voice source under formant-tuning conditions in which the first vocal tract resonance is close to one of the lower-order harmonics of the voice source.

Methods

Computational simulations were performed in a three-dimensional continuum model of voice production, with parametric changes in both the vocal fold and vocal tract configurations. Conditions of strong formant tuning were identified. Acoustic, aerodynamic, and vibratory measures of the voice source at these conditions were then compared to those at the same vocal fold conditions but with a reduced degree of formant tuning.
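Identifying conditions of strong formant tuning requires some measure of how close the first vocal tract resonance (F1) lies to the first or second harmonic. The abstract does not state its criterion, so the sketch below uses a hypothetical one: the semitone distance from F1 to the nearer of H1 (f0) and H2 (2·f0), with a 1-semitone threshold. It only classifies simulated configurations after the fact and says nothing about the continuum model itself.

```python
from math import log2


def nearest_harmonic_distance(f1_hz, f0_hz, harmonics=(1, 2)):
    """Return (harmonic number, |distance| in semitones) of the harmonic
    closest to the first vocal tract resonance F1."""
    dist = {k: abs(12.0 * log2(f1_hz / (k * f0_hz))) for k in harmonics}
    k_best = min(dist, key=dist.get)
    return k_best, dist[k_best]


def is_strong_tuning(f1_hz, f0_hz, threshold_st=1.0):
    """Hypothetical criterion: F1 within threshold_st semitones of H1 or H2."""
    return nearest_harmonic_distance(f1_hz, f0_hz)[1] <= threshold_st


print(nearest_harmonic_distance(440.0, 220.0))  # (2, 0.0)
print(is_strong_tuning(700.0, 220.0))           # False
```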

Results

The results showed that the effect of formant tuning on the voice source strongly depended on the specific vocal fold configuration. Formant tuning suppressed irregular vocal fold vibration under some conditions but induced new instabilities under others. Implications of these results for vocal health and voice training are discussed.

Dr. Zhaoyan Zhang is a professor in Head and Neck Surgery at the University of California Los Angeles.

Subharmonic Voice Signal Assessment with Quasi-Periodic Signal Models

Takeshi Ikuma
Louisiana State University Health Sciences Center
Additional authors: Andrew J. McWhorter, Melda Kunduk

Objectives

This study proposes a new voice analysis approach using a time-varying harmonic signal model and spectral decomposition to separate the contributions of the subharmonic and nonharmonic components.

Methods

Fundamental frequency and individual harmonic amplitudes are each modeled as linear functions of time to account for the slow variation of voice signals within a short analysis period. The acoustic signal segment in each short analysis window is fitted with two harmonic models, each assuming a different subharmonic mode (period doubling or period tripling). These models are then used to decompose the signal into three components: harmonic, subharmonic (modulation), and nonharmonic (e.g., chaotic vibration, non-subharmonic modulation, biphonation, and breathiness). Next, for each decomposition, band-limited power estimates of each component are computed over the selected spectral regions containing that component's strongest harmonics. Ratios of these power estimates may help determine the likelihood that subharmonics are present, their severity, and their period. In this study, we investigated subharmonic-to-nonharmonic ratios (S/Ns) and subharmonic-to-harmonic ratios (S/Hs) as discriminating measures.
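The model structure described above (linearly varying f0 and partial amplitudes, with period doubling placing subharmonic partials at half-integer multiples of f0) and the power ratios can be illustrated by synthesis. This is a sketch under simplifying assumptions: the three components are known by construction rather than estimated by fitting the two models, and full-band mean-square power stands in for the band-limited power over selected spectral regions. All parameter values are hypothetical.

```python
import math
import random


def harmonic_component(t, f0, df0, amps, slopes, multiples):
    """Sum of partials with linearly varying f0 and amplitudes.

    Instantaneous f0 is f0 + df0 * t, so a partial at `multiple` times f0
    has phase 2*pi*multiple*(f0*t + 0.5*df0*t**2); amplitude is a + s*t."""
    out = []
    for ti in t:
        base = 2.0 * math.pi * (f0 * ti + 0.5 * df0 * ti * ti)
        out.append(sum((a + s * ti) * math.cos(m * base)
                       for a, s, m in zip(amps, slopes, multiples)))
    return out


def power(x):
    """Mean-square power of a signal segment."""
    return sum(v * v for v in x) / len(x)


def ratio_db(p_num, p_den):
    """Power ratio in decibels."""
    return 10.0 * math.log10(p_num / p_den)


fs = 16000.0
t = [n / fs for n in range(400)]  # one 25-ms analysis window
f0, df0 = 200.0, 40.0             # hypothetical f0 (Hz) and drift (Hz/s)

harmonic = harmonic_component(t, f0, df0, [1.0, 0.5, 0.3], [0.0, -2.0, 0.0], [1, 2, 3])
# Period doubling places subharmonic partials at 0.5, 1.5, 2.5 times f0.
subharmonic = harmonic_component(t, f0, df0, [0.1, 0.1, 0.05], [0.0, 0.0, 0.0], [0.5, 1.5, 2.5])
rng = random.Random(0)
nonharmonic = [rng.gauss(0.0, 0.02) for _ in t]

s_h = ratio_db(power(subharmonic), power(harmonic))
s_n = ratio_db(power(subharmonic), power(nonharmonic))
print(f"S/H = {s_h:.1f} dB, S/N = {s_n:.1f} dB")
```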

Results

The proposed approach was tested on recordings of sustained normal-pitch vowels from the Saarbrücken Voice Database (6125 recordings) using a 25-ms analysis interval. The bivariate distribution of S/N and S/H demonstrates the presence of subharmonics. In addition, a subset of samples was manually assessed using both acoustic and electroglottogram signals; these assessments will be presented to support the findings.

Conclusions

Analysis of irregular voicing is a critically underassessed component of clinical acoustic voice analysis. The proposed approach is an important building block for advancing such analysis, and the findings motivate future studies of these effects across the differing types of irregular vocal fold vibration.

Takeshi (Kesh) Ikuma has been a resident researcher in the Department of Otolaryngology–Head and Neck Surgery at Louisiana State University Health Sciences Center since 2020, after serving as a postdoctoral fellow in the preceding years. He received his PhD in electrical engineering from Virginia Tech, specializing in signal processing. His main research interest is clinical voice signal analysis, especially understanding how disordered vocal fold vibratory behaviors relate to various voice signals and how to quantify such behaviors.

Vibrato Machine Learning AI Models as Diagnostic Biomarkers for Vocal Health, Dysphonia, TMD, and Tremor

Theodora Nestorova
Viterbo University
Additional authors: Jiarui Xie, Yudan Ding, Stephen McAdams, Yaoyao Fiona Zhao, Luc Mongeau

Methods

Multi-signal data were collected from 35 singers previously diagnosed with primary muscle tension dysphonia (pMTD) or temporomandibular disorder (TMD) through a collaborative voice care team at the McGill University Health Centre. Acoustic data originated from two prior vibrato studies recorded in identical environments to ensure reliability. Fifty expert judges—singing teachers and clinical voice specialists—rated vibrato samples for health, biomechanical efficiency, expressiveness, and regularity on a five-point Likert scale. Perceptual ratings were compared with acoustic measures, including Acoustic Voice Quality Index (AVQI), Cepstral Peak Prominence Smoothed (CPPS), and vibrato variability time-profiles (Nestorova, 2025). Physiological measures—motion capture (MOCAP) and surface electromyography (sEMG)—quantified jaw kinematics and muscle activation in the TMD subset. A convolutional neural network (CNN) baseline was optimized using Optuna-based hyperparameter tuning, varying convolutional layers (1–3), kernel size (2–4), hidden units (100–200), learning rate (10⁻⁵–10⁻¹), and dropout (0.01–0.1).
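The hyperparameter search space described above can be written down explicitly. The sketch below is a stand-in that uses plain random search from the Python standard library instead of Optuna's sampler; the parameter names, the log-uniform draw for the learning rate, and the toy objective are assumptions, and the CNN training step itself is omitted.

```python
import math
import random

# Ranges taken from the Methods; the key names are illustrative, not the authors'.
SEARCH_SPACE = {
    "conv_layers": (1, 3),          # integer, inclusive
    "kernel_size": (2, 4),          # integer, inclusive
    "hidden_units": (100, 200),     # integer, inclusive
    "learning_rate": (1e-5, 1e-1),  # float, sampled log-uniformly (assumption)
    "dropout": (0.01, 0.1),         # float, sampled uniformly
}


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration by random search (not Optuna's TPE sampler)."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    return {
        "conv_layers": rng.randint(*SEARCH_SPACE["conv_layers"]),
        "kernel_size": rng.randint(*SEARCH_SPACE["kernel_size"]),
        "hidden_units": rng.randint(*SEARCH_SPACE["hidden_units"]),
        "learning_rate": 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }


def random_search(objective, n_trials: int, seed: int = 0):
    """Return the best (score, config); `objective` is a hypothetical callable
    that would train the CNN and return a validation score (higher is better)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best


# Toy objective standing in for CNN training: prefers dropout near 0.05.
best_score, best_cfg = random_search(lambda cfg: -abs(cfg["dropout"] - 0.05), n_trials=20)
print(round(best_score, 4), best_cfg)
```

Optuna's default sampler adapts its proposals to earlier trials, so this random-search version only illustrates the search space, not the tuning strategy.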

Results

Greater vibrato variability correlated significantly with increased muscle tension, particularly among singers with painful TMD. Confusion matrix analysis across training (N=600), validation (N=75), and test (N=76) datasets revealed distinct diagnostic separability: Healthy Controls achieved 83–90% classification accuracy, pMTD 67–69%, and painful TMD 23–30%. The CNN reached 100% accuracy on synthetic and 97–98% on real audio data, effectively differentiating healthy vibrato from dysfunction-associated irregularities, though performance varied by condition.
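Per-class figures like those above are typically read off a confusion matrix as the fraction of each true class that is predicted correctly (per-class recall). The sketch below shows that computation alongside overall accuracy; the matrix counts are invented for illustration and are not the study's data.

```python
from typing import Dict, List


def per_class_accuracy(labels: List[str], cm: List[List[int]]) -> Dict[str, float]:
    """Per-class accuracy (recall): diagonal count divided by the row total.

    Rows are true classes, columns are predicted classes.
    """
    result = {}
    for i, name in enumerate(labels):
        row_total = sum(cm[i])
        result[name] = cm[i][i] / row_total if row_total else float("nan")
    return result


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that fall on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical 3-class test-set matrix (rows: true class, columns: predicted).
labels = ["Healthy", "pMTD", "painful TMD"]
cm = [
    [22, 2, 1],
    [5, 17, 3],
    [8, 10, 8],
]
print(per_class_accuracy(labels, cm))
print(round(overall_accuracy(cm), 3))
```

Overall accuracy is a sample-weighted average of the per-class values, so it always lies between the weakest and strongest class.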

Conclusions

Vibrato-based AI analysis shows strong diagnostic promise. The relationship between vibrato stability, muscle tension, and neuromotor control highlights vibrato as a sensitive marker for pMTD and TMD, with emerging potential as a vocal tremor biomarker. While model precision was high, broader datasets and standardized protocols are needed for clinical and pedagogical translation. Refinement of feature extraction may establish vibrato as a key biomarker linking vocal artistry with neuromuscular health.

Bulgarian-British-American voice and upper airway specialist, scientist, pedagogue, and performer Theodora Ivanova Nestorova serves as Assistant Professor of Speech-Language Pathology in the Department of Communication Disorders and Sciences at Viterbo University. Her interdisciplinary work bridges voice science, vocal pedagogy, and clinical practice, with research published in the Journal of Voice, Journal of the Acoustical Society of America, Voice and Speech Review, and book chapters with the National Center for Voice and Speech. She also serves as Associate Co-Editor of the Mindful Voice column in the Journal of Singing and as Treasurer on the Board of Directors of the Pan American Vocology Association (PAVA).

Nestorova has presented at numerous international conferences, including the American Speech-Language-Hearing Association (ASHA) Convention and The Fall Voice Conference, and has received multiple best paper, poster, and presentation awards at Pan American Vocology Association (PAVA) and National Association of Teachers of Singing (NATS) symposia. Additional honors include The Voice Foundation’s Young Investigator’s Award, the NATS Voice Pedagogy Award, the NATS Emerging Leaders Award, and the McGill University Excellence in Teaching Award.

Nestorova holds a PhD in Interdisciplinary Studies from McGill University, an MBA in Arts Entrepreneurship from the Global Leaders Institute for Arts Innovation, and an MM in Vocal Pedagogy and Music-in-Education from the New England Conservatory. She is a Fulbright Research/Study Grant Scholar (University of Music and Performing Arts Vienna) and holds a BM from Oberlin College and Conservatory. She is currently pursuing clinical certification through an MS-SLP at Viterbo University. Theodora is certified as an Orofacial Myofunctional Therapist (AOMT-C) and is a PAVA-Recognized Vocologist.

Jesús A. Parra

Impact of Neuromuscular Variability on Lumped-Element Vocal Fold Models

Universidad Técnica Federico Santa María
Additional authors: Josué Martínez, Matías Zañartu

Objectives

Low-dimensional voice production models typically rely on deterministic muscle activations, neglecting inherent neuromuscular variability. This study aims to implement a physiologically grounded stochastic activation scheme within a Triangular Body-Cover Model (TBCM) to investigate how neuromuscular noise impacts acoustic stability, comparing its performance against a standard Body-Cover Model (BCM).

Methods

Motor unit recruitment and inter-spike interval variability were modeled for the thyroarytenoid and cricothyroid muscles using the size principle and signal-dependent noise. The generated stochastic neural drives were coupled to both the BCM (using static algebraic rules) and the TBCM (using 1D active Kelvin mechanics). Simulated activation dynamics were validated against in vivo intramuscular electromyography (iEMG) recordings from a healthy subject producing sustained vowels.
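A minimal sketch of the two ingredients named above, under assumptions of our own: recruitment thresholds spaced exponentially across a motor unit pool (one common way to encode the size principle), a linear rate-coding rule capped at a peak rate, and Gaussian inter-spike intervals whose standard deviation scales with the mean interval (a simple form of signal-dependent noise). All constants are illustrative, not the authors' thyroarytenoid or cricothyroid parameters, and the coupling to BCM/TBCM mechanics is omitted.

```python
import random
from typing import List, Tuple

# Illustrative pool parameters (assumptions, not the authors' values).
N_UNITS = 100          # motor units in the pool
RECRUIT_RANGE = 30.0   # ratio of the highest to the lowest recruitment threshold
MIN_RATE = 8.0         # Hz, firing rate at recruitment
PEAK_RATE = 35.0       # Hz, saturation rate approached near tetanization
RATE_GAIN = 40.0       # Hz per unit of drive above threshold
ISI_CV = 0.2           # ISI standard deviation as a fraction of the mean ISI


def thresholds(n: int = N_UNITS, rr: float = RECRUIT_RANGE) -> List[float]:
    """Exponentially spaced thresholds ending at 1.0: many low-threshold
    (small) units and few high-threshold (large) ones, per the size principle."""
    return [rr ** ((i + 1) / n) / rr for i in range(n)]


def firing_rate(drive: float, threshold: float) -> float:
    """Rate coding: silent below threshold, then linear up to PEAK_RATE."""
    if drive < threshold:
        return 0.0
    return min(PEAK_RATE, MIN_RATE + RATE_GAIN * (drive - threshold))


def spike_train(rate: float, duration: float, rng: random.Random) -> List[float]:
    """Spike times (s) with Gaussian ISIs whose SD scales with the mean ISI."""
    if rate <= 0.0:
        return []
    mean_isi = 1.0 / rate
    t = rng.uniform(0.0, mean_isi)  # random phase of the first spike
    spikes = []
    while t < duration:
        spikes.append(t)
        # Floor at 0.1 ms so an extreme draw can never produce a negative interval.
        t += max(1e-4, rng.gauss(mean_isi, ISI_CV * mean_isi))
    return spikes


def pool_response(drive: float, duration: float = 1.0, seed: int = 0) -> Tuple[int, List[List[float]]]:
    """Recruited-unit count and spike trains for a constant neural drive in [0, 1]."""
    rng = random.Random(seed)
    rates = [firing_rate(drive, th) for th in thresholds()]
    trains = [spike_train(r, duration, rng) for r in rates if r > 0.0]
    return len(trains), trains


for drive in (0.1, 0.4, 0.8):
    n_active, trains = pool_response(drive)
    print(drive, n_active, sum(len(s) for s in trains))
```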

Results

Simulations replicated physiological signal-dependent noise, showing a progressive reduction in activation variability as the neural drive increased toward tetanization, in close agreement with the in vivo iEMG data. Stability maps further revealed that the TBCM and BCM exhibit fundamentally different regions of acoustic instability (jitter), indicating that active viscoelastic mechanics filter neuromuscular noise differently than traditional static geometric rules.

Figure: (Left) Simulated muscle activation and synthetic motor unit discharge patterns at various firing rates. (Right) In vivo muscle activation and decomposed motor unit discharge patterns for the vowel /a/ across pitch conditions.
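Jitter, the instability measure mapped above, is commonly computed as local jitter: the mean absolute difference between consecutive glottal cycle periods divided by the mean period. The sketch below assumes the cycle periods have already been extracted from the simulated signal; the abstract does not specify which jitter variant the authors used, and the example periods are made up.

```python
from typing import Sequence


def local_jitter_percent(periods: Sequence[float]) -> float:
    """Local jitter (%) = mean |T[i] - T[i+1]| / mean(T) * 100."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# Hypothetical periods (s) for a ~200 Hz voice with small cycle-to-cycle perturbations.
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]
print(round(local_jitter_percent(periods), 2))
```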

Conclusions

Integrating a stochastic neurological component into biomechanical models provides a biologically plausible representation of laryngeal dynamics. Explicitly coupling neural control with active tissue mechanics is essential to capture physiological acoustic perturbations, establishing a robust computational framework with clinical potential for simulating, assessing, and diagnosing both normal phonation and neurological vocal pathologies.

Jesús A. Parra is a postdoctoral researcher at Universidad Técnica Federico Santa María (UTFSM). His research specializes in voice science and bioengineering, with a primary focus on the biomechanical modeling of vocal fold dynamics.

His work involves the development and implementation of advanced vocal fold models (such as the BCM and TBCM) to characterize phonation and voice production. Currently, his research focuses on creating estimation tools for normal and pathological vocal function to improve the ambulatory assessment of hyperfunctional voice disorders. By integrating signal processing and biomechanical principles, his work aims to bridge the gap between theoretical modeling and clinical monitoring of laryngeal physiology.

Morning Break · 9:30 – 9:45 AM
Session 6 — Imaging · Chair: Zhaoyan Zhang
Bernhard Jakubaß

Impact of Phonatory Context on Automated Measures of Glottal Attack Time, Glottal Offset Time, and Vocal Fold Phase Asymmetry Using Flexible High-Speed Videoendoscopy

Michigan State University
Additional authors: Maryam Naghibolhosseini, Hamide Ghaemi, Hamzeh Ghasemzadeh, Stephanie R.C. Zacharias, Dimitar D. Deliyski

Objectives

Current knowledge of abnormal vocal fold (VF) vibratory behavior, including pre-, post-, and peri-phonatory adjustments, is based mostly on sustained vocalizations. Because sustained phonation has known limitations and human communication occurs primarily through connected speech, glottal onset, offset, and asymmetry measured in connected speech may contain more valuable quantitative information. As an exploratory investigation of measurements from connected speech, this study examines the potential influence of phonetic context on VF vibratory behavior using uniform vowel-consonant-vowel (VCV) utterances. Specifically, glottal attack time (GAT), glottal offset time (GOT), variation of the left-right relative phase asymmetry (vA%), and their interactions with adductory behaviors are systematically examined across six different VCV utterances.

Methods

Laryngeal high-speed videoendoscopy (HSV) recordings at 5,000 fps are analyzed for 40 vocally healthy participants (20 men, 20 women) producing six VCV utterances (/ipi/, /ifi/, /isi/, /izi/, /utu/, and /afa/) chosen to provide some phonetic variety. For each video recording, GAT, GOT, and vA% are computed automatically and also measured manually by a trained expert, whose measurements serve as the ground truth for validation. The automated analysis consists of three steps: (1) temporal segmentation to identify the vocalized segments of the recording, (2) spatial segmentation of the vocal fold edges using a deep learning method, and (3) computation of the GAT, GOT, and vA% values during the phonatory segments. Differences in GAT, GOT, and vA% between the vowels of the VCV utterances are then analyzed statistically.
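As a rough illustration of step (3) for the asymmetry measure, the sketch below computes a per-cycle left-right relative phase asymmetry as the lag between the left and right vocal fold displacement peaks, expressed as a percentage of the cycle period, and summarizes its variation as the standard deviation across cycles. This definition of vA% is an assumption for illustration; the authors' exact formula may differ, the peak times are hypothetical, and GAT and GOT are not shown.

```python
import statistics
from typing import List, Sequence


def phase_asymmetry_percent(
    t_left_peaks: Sequence[float],
    t_right_peaks: Sequence[float],
    periods: Sequence[float],
) -> List[float]:
    """Per-cycle relative phase asymmetry (%): (t_left - t_right) / T * 100.

    Inputs are matched per cycle: times (s) of maximum lateral excursion of the
    left and right folds, and the cycle period T (s).
    """
    if not (len(t_left_peaks) == len(t_right_peaks) == len(periods)):
        raise ValueError("inputs must have one entry per cycle")
    return [100.0 * (tl - tr) / T for tl, tr, T in zip(t_left_peaks, t_right_peaks, periods)]


def asymmetry_variation(per_cycle: Sequence[float]) -> float:
    """Spread of the per-cycle asymmetry (sample standard deviation, in %)."""
    return statistics.stdev(per_cycle)


# Hypothetical peak times (s) for four cycles at ~200 Hz.
left = [0.00210, 0.00712, 0.01208, 0.01715]
right = [0.00200, 0.00700, 0.01200, 0.01700]
T = [0.005, 0.005, 0.005, 0.005]
pa = phase_asymmetry_percent(left, right, T)
print([round(x, 1) for x in pa], round(asymmetry_variation(pa), 2))
```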

Results and conclusions

The presentation discusses the impact of phonatory context in the VCV utterances on GAT, GOT, and vA%, as well as the accuracy of the fully automated measures relative to the manually measured ground truth. For each VCV utterance, the general visibility of the VFs and the degree of supraglottic obstruction are assessed to determine the utterance's suitability for automated vocal function evaluation. The results are also compared with the literature.

Bernhard earned his Bachelor’s, Master’s, and Ph.D. degrees in medical engineering at the Friedrich-Alexander-University Erlangen-Nuremberg in Germany. At the Division of Phoniatrics and Pediatric Audiology of the University Hospital Erlangen, he contributed to several projects that used computational fluid dynamics (CFD) simulations, ex vivo larynx experiments, and high-speed video (HSV) analysis to gain new insights into the phonatory process.

Since 2024, Bernhard has been a Postdoctoral Scholar with the Voice and Speech Lab in the Department of Communicative Sciences & Disorders (CSD) at Michigan State University (MSU), where he uses machine learning to enhance the analysis of laryngeal HSV recordings.

His goals are to create tools and knowledge for researchers and practitioners that further increase the understanding of the phonatory process and that eventually support the diagnosis and treatment of patients with voice disorders.

Michael Döllinger

Automated Hoarseness Severity Grade Estimation Using High-Speed Videoendoscopy-Synchronized Acoustic Signals

University Hospital Erlangen
Additional authors: Param Patel, Jonas Donhauser, Anne Schützenberger, Melda Kunduk

Objectives

Hoarseness arising from different voice disorders may exhibit different acoustic signatures, complicating the development of standardized quantitative assessment. One commonly used clinical assessment is quantitative analysis of the acoustic voice signal during phonation. During laryngeal high-speed videoendoscopy (HSV), the acoustic signal is recorded synchronously. However, rigid endoscopy limits sustained phonation, resulting in HSV-synchronized acoustic recordings that are shorter and may differ acoustically from natural voice production. Hence, the goals of this study are to (1) determine an optimal analysis time interval that is as short as possible and (2) evaluate the potential of the HSV-synchronized acoustic signal for machine learning (ML)-based hoarseness severity estimation.

Methods

Two databases containing normal voices as well as functional and organic voice disorders were constructed to systematically examine the robustness of severity estimation under constrained HSV examination and natural recording conditions. Database D₁ comprises 824 HSV-synchronized acoustic recordings of the sustained vowel /i/, while Database D₂ includes 804 sustained vowel /a/ recordings from speech therapy sessions. Recording lengths of 250 ms, 500 ms, and 1000 ms were analyzed. RBH ratings inferred from continuous speech served as ground truth. Subjects were categorized into two hoarseness levels (H < 2 vs. H ≥ 2), and the posterior probability was interpreted as a continuous, interval-scaled severity score between 0 and 1. A comprehensive acoustic feature set, including perturbation, spectral, cepstral, noise-related, and wavelet-based descriptors, was extracted and reduced via ensemble feature selection. ML models of varying complexity (logistic regression, SVM, XGBoost, TabNet) were evaluated across all window durations.
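The idea of reading a binary classifier's posterior probability as a continuous severity score can be sketched with a small logistic regression trained by batch gradient descent. The two features, their values, and the training settings below are hypothetical stand-ins for the study's selected acoustic features; the sketch only shows how a binary H < 2 vs. H ≥ 2 model yields a score between 0 and 1.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def severity_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Posterior P(H >= 2 | x), read as a continuous severity score in [0, 1]."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized features (e.g., one perturbation and one cepstral measure).
X = [[-1.2, 1.0], [-0.8, 0.7], [-0.5, 0.9], [0.6, -0.4], [1.1, -0.9], [1.5, -1.2]]
y = [0, 0, 0, 1, 1, 1]  # 0: H < 2, 1: H >= 2
w, b = fit_logistic(X, y)
for x in ([-1.0, 0.8], [0.0, 0.0], [1.3, -1.0]):
    print(x, round(severity_score(x, w, b), 3))
```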

Results

Logistic regression with 10 features at a 500 ms analysis window performed best, achieving rank correlations of 0.593 (D₁) and 0.720 (D₂) between ML-predicted severity scores and perceptual ratings.
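The rank correlations above are presumably Spearman's rho, although the abstract does not name the coefficient. A standard-library sketch: convert both sequences to ranks, averaging ranks over ties (common with ordinal perceptual ratings), then take the Pearson correlation of the ranks. The example scores and ratings are made up.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model severity scores vs. perceptual H ratings (0-3, with ties).
scores = [0.12, 0.35, 0.30, 0.71, 0.66, 0.90]
ratings = [0, 1, 1, 2, 3, 3]
print(round(spearman_rho(scores, ratings), 3))
```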

Conclusions

Acoustic data recorded synchronously with HSV show high potential for reliable severity estimation.

Michael Döllinger studied mathematics at Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Germany. He received the Diploma (M.Sc.) degree in February 2000 and the Ph.D. degree in computer science from FAU in November 2002. From 2003 to 2005, he was a Postdoctoral Fellow with the University of California, Los Angeles. He then returned to Erlangen, Germany, and received his habilitation in Medical Pattern Recognition, a subarea of Artificial Intelligence, at FAU in 2006. In 2008, he became Professor for Computational Medicine and Head of Research. From 2008 to 2013, he was Scientific Head of the DFG-funded Research Group FOR894 ‘Fundamentals on Flow Dynamics in Voice Production’. Since 2008, he has been an Adjunct Professor with Louisiana State University, Baton Rouge. He is co-chair of the Division of Phoniatrics and Pediatric Audiology. His scientific contributions include more than 200 peer-reviewed journal papers and 400 conference contributions.

Azure Wilson

Automatic Three-Dimensional Laryngeal Segmentation from MRI Using a Deep Convolutional Neural Network

Morgan State University
Additional authors: Lei Zhang, Guofeng He, Elizabeth Hary, Lea Sayce, Zheng Li

Objectives

Three-dimensional laryngeal reconstruction underlies biomechanical modeling, morphometric analysis, and simulation-based investigation of voice production. Manual segmentation remains the dominant workflow but introduces annotation burden and observer-dependent geometric variability. This study evaluates a deep convolutional learning framework for automated multi-structure laryngeal segmentation from MRI and examines both geometric performance and observer reliability.

Methods

High-resolution ex vivo MRI datasets from 18 rabbit larynges were manually segmented to delineate the thyroid cartilage, cricoid cartilage, arytenoid cartilages, and glottal airway. These annotations were used to train a three-dimensional convolutional neural network implemented using the nnU-Net architecture, which adapts preprocessing, network topology, and training parameters to dataset characteristics. Five-fold cross-validation was performed at the subject level. Segmentation accuracy was quantified using Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), and average surface distance (ASD). A blinded qualitative observer study evaluated anatomical fidelity using a 4-point ordinal scale. Interobserver and intraobserver agreement were quantified using intraclass correlation coefficients (ICC).
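Two of the geometric metrics named above can be sketched on binary masks represented as sets of voxel coordinates: the Dice similarity coefficient, 2|A∩B|/(|A|+|B|), and the symmetric Hausdorff distance, the largest distance from any voxel in one set to the nearest voxel in the other. This brute-force version is illustrative only; it uses all voxels rather than extracted surfaces and measures in voxel units rather than millimeters, both simplifications relative to typical segmentation toolkits.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Max over p in A of the distance from p to its nearest voxel in B."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Symmetric Hausdorff distance (O(|A| * |B|) brute force)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy masks: a 3x3x3 cube and the same cube shifted by one voxel along x.
cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
shifted = {(x + 1, y, z) for (x, y, z) in cube}
print(round(dice(cube, shifted), 3), hausdorff(cube, shifted))
```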

Results

Mean DSC values exceeded 0.80 across all anatomical classes. Cartilaginous structures demonstrated lower HD and ASD than the airway, reflecting greater boundary stability in regions of higher tissue contrast. Interobserver agreement for automated segmentations was good (ICC(2,1) = 0.34). Automated reconstructions exhibited smoother and more continuous three-dimensional surfaces relative to corresponding manual delineations.
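ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) can be computed from two-way ANOVA mean squares. The sketch below is a standard-library version assuming complete data (every rater scores every target); the example is the six-target, four-judge ratings matrix from Shrout and Fleiss (1979), for which ICC(2,1) is about 0.29, and is unrelated to the segmentation study.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to target i by rater j (complete data).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six targets rated by four judges (Shrout & Fleiss, 1979).
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(data), 2))
```

For interobserver reliability of segmentations, targets would be the reconstructions and raters the observers scoring them on the 4-point scale.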

Conclusions

A self-configuring 3D convolutional neural network can reliably automate multi-structure laryngeal segmentation from MRI while improving interobserver consistency relative to manual annotation. By reducing geometric irregularities and annotation variability, this framework supports reproducible three-dimensional reconstructions suitable for downstream biomechanical and fluid-structure modeling applications in voice physiology research.

Azure Wilson, PhD, is a postdoctoral research fellow in Industrial and Systems Engineering at Morgan State University. Her work bridges communication science, laryngology, and bioengineering through image-based modeling and quantitative analysis of laryngeal structure and function. With training in speech-language pathology and a doctorate in Communication Science and Disorders from the University of Pittsburgh, she approaches voice physiology from an interdisciplinary, translational perspective.

Dr. Wilson’s research integrates high-resolution medical imaging with computational modeling frameworks to study vocal fold biomechanics and support reproducible three-dimensional anatomical reconstruction. She collaborates closely with engineering teams to analyze model outputs, evaluate geometric fidelity, and translate computational findings into clinically and physiologically meaningful interpretations. Her recent work has focused on automated multi-structure laryngeal segmentation using deep learning architectures, examining both geometric accuracy and observer reliability.

More broadly, her research program centers on improving the reproducibility and scalability of image-informed modeling pipelines for voice research. By combining anatomical reconstruction, quantitative evaluation, and structured qualitative assessment, her work aims to strengthen the methodological foundation linking imaging data to biomechanical and translational applications.

Harikrishnan Unnikrishnan

Optimal Sampling and Noise Mitigation for Vocal Fold Kinematics in High-Speed Videoendoscopy

Orchard Robotics
Additional authors: Rita Patel, Kevin Donohue

Objectives

While the glottal area waveform (GAW) is the most widely reported signal derived from high-speed digital imaging (HSDI), it conflates lateral fold excursion with longitudinal glottal opening, precluding direct estimation of fold velocity, acceleration, and left-right asymmetry, which are kinematic features of established clinical relevance. In contrast, displacement waveforms preserve these quantities but are more susceptible to spatial quantization and temporal undersampling when the frame rate is insufficient relative to the fundamental frequency. This study investigates the optimal frame rate and spatial resolution, which are critical for reducing these errors, along with mitigation strategies.

Methods

We derive closed-form bounds for the feasible frame-rate interval:

$$f_{s,\min} = \frac{N f_0 (1+s)}{o \cdot \min(s, 1)}, \qquad f_{s,\max} = \frac{K f_0 (1+s)}{o \cdot \max(s, 1)}$$

where $f_0$, $o$, $s$, and $K$ are the fundamental frequency, open quotient, speed quotient, and peak glottal excursion in pixels, respectively, and $N$ is the minimum sample count in the opening and closing phases. Above $f_{s,\max}$, the velocity signal-to-noise ratio paradoxically degrades as inter-frame displacement falls below one pixel. A valid interval exists if $K \ge N \cdot \max(s, s^{-1})$. A Kalman smoother with sinc interpolation is evaluated against FFT denoising and spline interpolation on two datasets: 55 subjects at 4000 fps (interpolation evaluation) and 12 subjects synthetically downsampled from 16000 fps to 2000–8000 fps and benchmarked against the 16000 fps reference (smoother evaluation).
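The bounds above translate directly into a small calculator. This sketch implements the stated formulas as written; the example values (f₀ = 200 Hz, o = 0.6, s = 1.5, N = 5, K = 20 pixels) are illustrative and not from the study, and this is not the authors' online calculator.

```python
from typing import Optional, Tuple


def frame_rate_interval(
    f0: float, oq: float, sq: float, n_min: int, k_pixels: float
) -> Optional[Tuple[float, float]]:
    """Feasible frame-rate interval (fs_min, fs_max) in frames per second.

    f0: fundamental frequency (Hz); oq: open quotient o; sq: speed quotient s;
    n_min: minimum sample count N in the opening and closing phases;
    k_pixels: peak glottal excursion K in pixels.
    Returns None when no valid interval exists, i.e. K < N * max(s, 1/s).
    """
    if k_pixels < n_min * max(sq, 1.0 / sq):
        return None
    fs_min = n_min * f0 * (1 + sq) / (oq * min(sq, 1.0))
    fs_max = k_pixels * f0 * (1 + sq) / (oq * max(sq, 1.0))
    return fs_min, fs_max


interval = frame_rate_interval(f0=200.0, oq=0.6, sq=1.5, n_min=5, k_pixels=20.0)
if interval is not None:
    lo, hi = interval
    print(f"{lo:.0f} to {hi:.0f} fps")
```

Because the feasibility test is algebraically the same as requiring fs_min ≤ fs_max, the two bounds coincide exactly when K = N · max(s, 1/s).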

Results

Kalman-sinc processing reduced the mean absolute percentage error (MAPE) of velocity and speed quotient by 60–80% and ~66%, respectively; open quotient improvement was rate-dependent: at 4000 fps, the clinical standard, raw extraction was already reliable. Peak displacement MAPE remained low (8–14%), indicating robustness to the frame-rate–resolution trade-off.
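To make the smoothing stage concrete, here is a minimal Rauch–Tung–Striebel (RTS) Kalman smoother for a displacement trace under a constant-velocity (white-acceleration) state model. The state model, the noise settings q and r, and the synthetic sinusoidal test signal are assumptions for illustration; the sinc-interpolation stage and the authors' actual model and tuning are not reproduced.

```python
import math
import random
from typing import List, Sequence

Mat = List[List[float]]


def mat_mul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a: Mat, b: Mat) -> Mat:
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def mat_sub(a: Mat, b: Mat) -> Mat:
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def inv2(a: Mat) -> Mat:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def rts_smooth(z: Sequence[float], dt: float, q: float, r: float) -> List[float]:
    """Smoothed positions for noisy position samples z.

    State x = [position, velocity]; constant-velocity model driven by white
    acceleration noise of spectral density q; measurement noise variance r.
    """
    F = [[1.0, dt], [0.0, 1.0]]
    Ft = transpose(F)
    Q = [[q * dt**3 / 3, q * dt**2 / 2], [q * dt**2 / 2, q * dt]]
    x: Mat = [[z[0]], [0.0]]          # start at the first sample, zero velocity
    P: Mat = [[r, 0.0], [0.0, 1e3]]   # broad prior on the unknown velocity
    x_pred, P_pred, x_filt, P_filt = [], [], [], []

    # Forward Kalman filter.
    for k, zk in enumerate(z):
        if k > 0:
            x = mat_mul(F, x)
            P = mat_add(mat_mul(mat_mul(F, P), Ft), Q)
        x_pred.append(x)
        P_pred.append(P)
        s = P[0][0] + r                   # innovation variance (H = [1, 0])
        gain = [P[0][0] / s, P[1][0] / s]
        innov = zk - x[0][0]
        x = [[x[0][0] + gain[0] * innov], [x[1][0] + gain[1] * innov]]
        P = [[P[i][j] - gain[i] * P[0][j] for j in range(2)] for i in range(2)]
        x_filt.append(x)
        P_filt.append(P)

    # Backward Rauch-Tung-Striebel pass (smoothed means only).
    xs = x_filt[-1]
    out = [xs[0][0]]
    for k in range(len(z) - 2, -1, -1):
        C = mat_mul(mat_mul(P_filt[k], Ft), inv2(P_pred[k + 1]))
        xs = mat_add(x_filt[k], mat_mul(C, mat_sub(xs, x_pred[k + 1])))
        out.append(xs[0][0])
    return out[::-1]


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


# Synthetic check: a sinusoid at 20 frames per cycle (time in frames, dt = 1)
# plus Gaussian noise with SD 0.2; q and r are hand-picked for this example.
rng = random.Random(0)
truth = [math.sin(2 * math.pi * k / 20) for k in range(200)]
noisy = [t + rng.gauss(0.0, 0.2) for t in truth]
smoothed = rts_smooth(noisy, dt=1.0, q=0.01, r=0.04)
print(round(rmse(noisy, truth), 3), round(rmse(smoothed, truth), 3))
```

The ratio r/q sets the trade-off between noise suppression and attenuation of fast displacement changes, which is why it would need tuning to frame rate and fundamental frequency in practice.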

Conclusions

For kinematic measurements, frame rate and spatial resolution must scale together. Operating near $f_{s,\min}$ and reporting $K$ alongside clinical HSDI measurements supports reliable kinematic estimation. Kalman-sinc processing is recommended for kinematic analysis. An online frame-rate calculator is available (doi:10.5281/zenodo.19120795).

Dr. Harikrishnan Unnikrishnan is a computer vision engineer and researcher with over 15 years of experience building ML-powered visual systems across robotics, healthcare, and enterprise applications. He holds a Ph.D. in Electrical Engineering (Computer Vision) from the University of Kentucky, where his dissertation focused on the analysis of vocal fold kinematics using high-speed video.

His research in voice and laryngeal imaging includes the development of glottal area segmentation algorithms, kinematic biomarker extraction from high-speed videoendoscopy, and structured-light methods for vocal fold depth estimation. His work has been published in the Journal of Voice, the Journal of Speech, Language, and Hearing Research, and PLOS ONE.

In industry, he has held senior engineering and leadership roles at KeyMe, Butlr Technologies, Heystack.tech, and Orchard Robotics, where he has architected computer vision pipelines from research to production and deployed systems at scale across thousands of devices. His technical expertise spans deep learning, object detection, semantic segmentation, stereo vision, camera calibration, and edge computing.

Open block · 10:45 – 11:30 AM (panel, extended Q&A)
Lunch · 11:30 AM – 12:30 PM
Post-Lunch Keynote
Brad Story, Ph.D.

University of Arizona

Dr. Brad Story is a Professor of Speech, Language, and Hearing Sciences at the University of Arizona. He earned his Bachelor’s in Applied Physics from the University of Northern Iowa and spent his early career in industry as an acoustics engineer, developing computer models and instrumentation systems for designing and measuring the performance of mufflers and other acoustic filters. That work led him to pursue a doctoral degree in Speech and Hearing Science at the University of Iowa. Following his PhD, he held research scientist appointments at the University of Iowa and at the Wilbur James Gould Voice Center in Denver, Colorado, before joining the University of Arizona. His research focuses on the acoustics and aerodynamics of speech and voice production, computational modeling of the vocal tract, and the connections between physical filter design and the human vocal instrument.

Session 7 — Acoustic & Biomechanic Analyses · Chair: Brad Story
Isao T. Tokuda

Vocal-Ventricular Fold Co-Oscillations in Macaque Vocalizations

Ritsumeikan University
Additional authors: Takeshi Nishimura

To explore the functional role of the ventricular folds in sound production in macaques (Macaca mulatta), excised larynx and in vivo experiments were carried out. In the excised larynx experiments, 20 of 63 recorded sounds showed vocal-ventricular fold co-oscillations. Transitions from normal vocal fold oscillations to vocal-ventricular fold co-oscillations, as well as chaotic irregular oscillations, were also observed. In the in vivo experiments, vocal-ventricular fold co-oscillations were also observed in 2 macaque individuals. In both the excised larynx and in vivo experiments, vocal-ventricular fold co-oscillations significantly lowered the fundamental frequency.

Next, we studied how the ventricular folds interact with the vocal membranes. The vocal membrane is an accessory extension of the vocal fold, observed in a wide range of species including bats and primates but not in humans. A recent in vivo study found that, in vocalizations of non-human primates such as chimpanzees and macaques, the vocal membranes always vibrate, while the vocal folds vibrate only sometimes and never alone. Using a physical model of a macaque larynx, we observed that the ventricular folds can oscillate synchronously with either the vocal folds or the vocal membranes, depending on the adduction level of the vocal membranes. The locking frequency ratio was 1:1 between the vocal folds and the ventricular folds and 2:1 between the vocal membranes and the ventricular folds. Again, co-oscillation with the ventricular folds significantly lowered the fundamental frequency and increased vocal efficiency. A mathematical model further showed that the 2:1 frequency ratio between the vocal membranes and the ventricular folds arises from the ratio of their natural frequencies.

It is argued from a physiological standpoint that macaques may use ventricular fold oscillations more frequently than humans. The advantages and disadvantages of using the ventricular folds as an additional vocal repertoire are discussed.
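A frequency-locking ratio such as the 1:1 and 2:1 ratios reported above can be identified from two oscillators' dominant frequencies by finding the nearest simple fraction. This sketch assumes the dominant frequencies have already been estimated (for example, from a spectrum); the frequencies and the denominator limit of 4 are hypothetical choices.

```python
from fractions import Fraction


def locking_ratio(f_a: float, f_b: float, max_den: int = 4) -> Fraction:
    """Nearest simple ratio f_a : f_b with denominator at most max_den."""
    return Fraction(f_a / f_b).limit_denominator(max_den)


# Hypothetical dominant frequencies (Hz): vocal membranes vs. ventricular folds.
r = locking_ratio(612.0, 305.0)
print(f"{r.numerator}:{r.denominator}")
```

A small denominator limit keeps the result to low-order ratios; a real analysis would also check that the frequencies stay locked over time rather than coincide momentarily.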

Isao T. Tokuda is a Professor of Mechanical Engineering at Ritsumeikan University, Japan. In 2000, he completed his Ph.D. in Mathematical Engineering at the University of Tokyo, Japan. From 2003 to 2004, he was an Alexander von Humboldt Research Fellow at Humboldt University of Berlin, Germany. As a specialist in nonlinear dynamics, he has worked in research fields ranging from human voice and animal vocalization to theoretical chronobiology, computational neuroscience, and robotics.

Sooah Ellen Park

Physiological Measurement of Performance Anxiety Reduction Through the Four Vocal Strategies

University of Texas at Tyler

Objectives

The rapidly rising rate of anxiety among college students has created an urgent need for evidence-based vocal interventions that voice teachers can implement confidently within their scope of practice. These students present with challenges such as difficulty focusing, heightened sensitivity to mistakes, and excessive masseter muscle tone. Although Stephen Porges’s Polyvagal Theory has been introduced to the clinical field, its specific benefits have not yet been quantified in a form that can be applied in voice studio teaching. This study investigates whether specific vocal warm-up techniques (humming, nostril breathing, gurgling during phonation, and pulsating) can measurably regulate vagal tone and reduce anxiety when students sing in voice lessons and performances.

Methods

The masseter muscle is a highly sensitive indicator of psychophysiological tension. Using surface electromyography (sEMG) to monitor masseter stiffness and a Polar H10 chest-strap ECG to measure heart rate variability (HRV), we measured participants’ physiological stress responses before and after the four vocal interventions. Secondary outcomes included self-reported anxiety (STAI-S) and blind expert evaluations of vocal performance.

Results

Primary outcome (sEMG): normalized masseter RMS across phases (Baseline, Intervention, Recovery, Performance), with change scores (Intervention − Baseline; Performance − Baseline). Secondary: left–right masseter asymmetry = |L − R| / mean(L, R) × 100; sternocleidomastoid (SCM) RMS during performance. Exploratory: correlations with STAI-S and expert vocal tension ratings. Findings have direct implications for voice pedagogy, supporting the integration of neuroscience-informed strategies into applied voice instruction.
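The sEMG outcome measures listed above reduce to a few small computations. A minimal sketch in plain Python, assuming each phase has already been summarized as one normalized RMS value; the abstract does not say what the RMS is normalized to, so `normalized_rms` simply takes the reference as a parameter. The function names and all numbers are hypothetical, not study data.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one sEMG segment."""
    if not samples:
        raise ValueError("empty segment")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalized_rms(samples, reference_rms):
    """RMS divided by a reference RMS (the reference recording is an assumption)."""
    return rms(samples) / reference_rms


def change_score(phase_value, baseline_value):
    """Phase minus Baseline, e.g. Intervention - Baseline."""
    return phase_value - baseline_value


def asymmetry_index(left, right):
    """Left-right asymmetry as |L - R| / mean(L, R) * 100."""
    mean_lr = (left + right) / 2
    if mean_lr == 0:
        raise ValueError("mean of left and right is zero")
    return abs(left - right) / mean_lr * 100


# Hypothetical normalized masseter RMS values per phase (not study data).
phases = {"Baseline": 0.30, "Intervention": 0.24, "Recovery": 0.25, "Performance": 0.27}
print(round(change_score(phases["Intervention"], phases["Baseline"]), 2))  # -0.06
print(round(change_score(phases["Performance"], phases["Baseline"]), 2))  # -0.03
print(round(asymmetry_index(0.33, 0.27), 1))  # 20.0
```

A negative change score here would indicate lower masseter activity than at baseline; dividing by the left-right mean keeps the asymmetry index comparable across participants with different overall signal levels.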

Conclusions

Pedagogically, this study offers the first systematic, multi-measure physiological evidence for the anxiety-regulating effects of common vocal techniques, equipping voice teachers with empirically grounded tools for supporting students with performance and learning anxiety. Scientifically, it extends polyvagal theory and HRV biofeedback methodology to the singing voice. Clinically, the findings may inform therapeutic voice work for populations managing anxiety, somatic tension disorders, or neurodevelopmental conditions where vocal and autonomic nervous system (ANS) regulation intersect. Acknowledgments: The author declares no competing interests. No financial or non-financial relationships exist that could have influenced the design, conduct, or reporting of this research.

Sooah Ellen Park, D.M.A., is Associate Professor of Music at The University of Texas at Tyler, where she combines her expertise in vocal performance with theatrical direction. Dr. Park holds a Doctor of Musical Arts in Voice with Opera Emphasis and a Master of Music in Opera Performance from The University of Texas at Austin, a Bachelor of Music in Voice Performance from the Eastman School of Music, and a Master of Business Administration from The University of Texas at Tyler.

An accomplished recording artist, Dr. Park has released two albums: Cecile Chaminade Songs with Albany Records and Dream Songs with CD Baby. Her recordings of Charles Cadman’s Four American Indian Songs (‘From Wigwam and Tepee’) are available on Apple Music, Amazon Music, Spotify, and over 120 digital streaming platforms. In 2023, she performed a solo recital at Carnegie Hall, marking a significant milestone in her performing career.

Dr. Park is an active scholar and presenter at international and national conferences, including the College Music Society (CMS), National Association of Teachers of Singing (NATS), Pan American Vocology Association (PAVA), Texas Music Educators Association (TMEA), and National Opera Association (NOA). As a theatrical director at UT Tyler, she has helmed fully staged productions including The Hunchback of Notre Dame, Into the Woods, Once On This Island, The Magic Flute, Hansel and Gretel, The Merry Widow, and The Gondoliers, among many others.

Ingo Titze

Toward Estimation of Inertagrams from External Audio and Video Signals

Ingo Titze
National Center for Voice and Speech
Additional authors: Anil Palaparthi, Brad Story

Objective

Inertagrams are the key graphic illustrations for assessing the effect of airways on the full spectrum of source frequencies. A direct calculation is possible if the vocal tract area function is known. This area function is generally obtained with MRI or CT imaging. However, such imaging procedures are still relatively expensive, time consuming, and unavailable for routine assessment in clinics and studios.

Procedure

Two non-invasive measurements, a microphone signal and a calibrated image of the mouth area, are proposed as potential inputs. In this preliminary study, a neural network is trained to predict the area function from an inventory of known area functions with their corresponding resonance frequencies and mouth areas. To test the neural network predictions, mouth areas and formant frequencies outside the training set are used as inputs to predict area functions outside the training set. From these predicted area functions, the vocal tract input impedances are calculated, and the inertagrams are obtained from those impedances.
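Once an area function is available, the input-impedance step can be sketched with the standard lossless concatenated-tube (transmission-line) model. This is a minimal sketch under stated assumptions, not the authors' pipeline: wall vibration, viscous and thermal losses, and lip radiation are ignored (the lips are treated as an ideal open end), the constants `RHO` and `C` are assumed values, the function names are hypothetical, and the neural-network prediction step is not shown. As we read the term, an inertagram displays this input reactance across frequency, with positive reactance being inertive and negative reactance compliant.

```python
import math

RHO = 1.14   # air density in kg/m^3 (assumed value for warm, humid air)
C = 350.0    # speed of sound in m/s (assumed value)


def input_impedance(areas_m2, section_len_m, freq_hz, z_lips=0j):
    """Input impedance seen from the glottis into a lossless tube-chain model.

    areas_m2 lists cross-sectional areas from glottis (first) to lips (last).
    z_lips terminates the chain; 0 approximates an ideal open mouth.
    """
    k = 2.0 * math.pi * freq_hz / C          # wavenumber
    t = math.tan(k * section_len_m)
    z = complex(z_lips)
    # Walk from the lips back to the glottis: each section's input impedance
    # becomes the load of the section behind it (lossless transmission line).
    for area in reversed(areas_m2):
        zc = RHO * C / area                  # characteristic impedance
        z = zc * (z + 1j * zc * t) / (zc + 1j * z * t)
    return z


def reactance_curve(areas_m2, section_len_m, freqs_hz):
    """(frequency, input reactance) pairs; X > 0 is inertive, X < 0 compliant."""
    return [(f, input_impedance(areas_m2, section_len_m, f).imag) for f in freqs_hz]


# Hypothetical 17 cm uniform tract (3 cm^2), split into 1 cm sections.
uniform = [3e-4] * 17
for f, x in reactance_curve(uniform, 0.01, [200, 400, 700]):
    print(f"{f:4d} Hz  X = {x:12.4g}  {'inertive' if x > 0 else 'compliant'}")
```

For a uniform tube the chain collapses to the textbook result Z_in = j·Zc·tan(kL), so the reactance is inertive below the first quarter-wave resonance (about 515 Hz for 17 cm) and compliant just above it; a non-uniform area function shifts these crossings.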

Results

At this stage, the procedure is purely computational. Errors between the known and predicted area functions are discussed. A primary limitation with human subjects will be the extraction of formant frequencies at high fundamental frequencies.

Dr. Ingo Titze, educated as a physicist (Ph.D.) and engineer (M.S.E.E.), has applied his scientific knowledge to a lifelong love of clinical voice and vocal music. His research interests include biomechanics of human tissues, acoustic phonetics, speech science, voice disorders, professional voice, music acoustics, and the computer simulation of voice. He is the father of vocology, a specialty in speech-language pathology. He defined the word as ‘the science and practice of voice habilitation.’

Fabian Thomas

Effects of Vocal Intensity, Fundamental Frequency, Gender and Age on CPPS, HNR, Jitter and Shimmer in Vowel Phonation and Text of Adults Without and With Mild Dysphonia

Fabian Thomas
University of Education Weingarten, Germany
Additional authors: Meike Brockmann-Bauser

Objectives

Acoustic assessment metrics including CPPS, HNR, jitter and shimmer vary with voice loudness and fundamental frequency (fo) in vowel phonation. This work investigates the effects of calibrated intensity (SPL), fo, vowel (SV) and text (TR), and gender on CPPS, HNR, jitter and shimmer in adults without and with mild dysphonia.

Methods

SV and TR recordings of 927 speakers (699 women, 228 men; mean age 20.5 ± 3.5 years; range 17–45) without (GRBAS G0, n = 328) and with mild dysphonia (G1, n = 599) were retrospectively analyzed in Praat. The effects of calibrated SPL (dB), fo (Hz), gender, and age on the dependent variables (CPPS, HNR, jitter, shimmer) in SV and TR were investigated with Linear Mixed Models (LMM). Multiple regression models with standardized coefficients (β) and squared semi-partial correlations (sr²) were applied to determine effect sizes.
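The effect-size quantities named above can be illustrated for an ordinary fixed-effects regression. A minimal sketch, assuming NumPy and synthetic data (the coefficients below are invented, not the study's): sr² for a predictor is computed as the drop in R² when that predictor is removed from the model, and standardized β as slopes after z-scoring all variables. It does not reproduce the random-effects structure of the LMMs, and the helper names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def standardized_betas(X, y):
    """Slopes after z-scoring every predictor and the outcome (beta weights)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef


def squared_semipartial(X, y, j):
    """sr^2 of predictor j: the drop in R^2 when column j is removed from the model."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)


# Synthetic, illustrative data only: CPPS driven mostly by SPL, weakly by fo.
rng = np.random.default_rng(0)
n = 500
spl = rng.normal(75.0, 5.0, n)          # dB
fo = rng.normal(200.0, 40.0, n)         # Hz
cpps = 0.3 * spl + 0.005 * fo + rng.normal(0.0, 1.0, n)
X = np.column_stack([spl, fo])

print("beta:", standardized_betas(X, cpps).round(2))
print("sr2 SPL:", round(squared_semipartial(X, cpps, 0), 3))
print("sr2 fo :", round(squared_semipartial(X, cpps, 1), 3))
```

Because sr² is the variance uniquely attributable to one predictor, the sr² values of correlated predictors need not sum to the model R²; that is why they are reported alongside R² rather than instead of it.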

Results

The applied LMMs explained the majority of the observed variance (R² = .81–.94). SPL was the dominant predictor of CPPS in SV (sr²SPL ≈ .20) and TR (sr²SPL ≈ .22), and of HNR in SV (sr²SPL ≈ .19; all p < .001). In SV, higher SPL predicted lower jitter and shimmer (sr²SPL ≈ .06–.10; p < .001), whereas in TR there was only a small effect on jitter (sr²SPL ≈ .02; p < .001). Moreover, SPL×gender interactions showed a stronger SPL effect in males (CPPS: β_female ≈ 0.96 vs. β_male ≈ 2.51; Δβ ≈ 1.55, p < .001; ΔR² ≈ .02). In TR, fo was the primary predictor of HNR and jitter (sr²fo ≈ .07–.09). The SPL×fo interaction had significant effects on jitter in SV (ΔR² = .011, p < .001) but negligible effects on CPPS and HNR. There was no significant age effect on any of the parameters (p ≥ .05).

Conclusions

SPL exerts strong positive effects on CPPS and HNR in both SV and TR, moderated by gender, regardless of fo and age. This highlights the importance of documenting gender-specific SPL norms when interpreting cepstral and harmonic voice measures. Acknowledgments: We would like to thank Kerstin Hillegeist for supporting the development and implementation of the Weingarten Voice Screening.

Thomas Fabian, M.A., is a PhD candidate and academic staff member in the departments of German Studies and Speech and Language Therapy at the University of Education Weingarten, Germany. His research addresses a clinically urgent question: can systematic voice screening at the start of teacher training reliably identify at-risk voices early enough to enable timely intervention? He examines the validity and reliability of acoustic and perceptual voice assessment methods, investigating how key acoustic measures relate to perceptual ratings across a broad clinical spectrum — from healthy voices to severe dysphonia. Drawing on a longitudinal screening corpus that now exceeds 1,800 recordings and grows each semester, his work underscores the importance of standardized, gender-specific norms for valid acoustic voice assessment.

Thomas holds a B.A. in Speech and Language Therapy (HAWK Hildesheim, 2008) and an M.A. in Speech Science (Philipps University Marburg, 2012). Following an early appointment at Weingarten, he spent three years teaching linguistics and oral communication at Ocean University of China (2014–2017). Since 2017, he has led courses in computer-based voice diagnostics, phonetics, and rhetorical communication at Weingarten, where he co-developed the institutional voice screening program. He has published on the reliability of perceptual and acoustic voice assessment (Sprache · Stimme · Gehör, 2021), and currently serves as guest editor of a special issue on voice disorder prevention in teachers in the same journal.

Katherine L. Marks

Acoustics and Aerodynamics: Clinical Perspectives and Practice Patterns

Katherine L. Marks
Emory University
Additional authors: Meghana Darla, Michael Madoule, Kathleen F. Nagle

Objectives

Acoustic and aerodynamic measures are widely used in voice research to quantify vocal function, yet their clinical implementation remains inconsistent. Although the American Speech-Language-Hearing Association released recommendations for instrumental voice assessment in 2018, there is limited evidence that describes current clinical practice. This study aimed to characterize current practice patterns and speech-language pathologist (SLP) perspectives on the perceived benefits and challenges of acoustic and aerodynamic assessment.

Methods

A REDCap survey was distributed via Fall Voice Doc Matter and ASHA SIG 3 listservs to U.S.-based SLPs whose caseloads comprised at least 30% voice patients. Items addressed prior training, current acoustic and aerodynamic use, perceived clinical value, barriers to implementation, and open-ended feedback. Descriptive statistics and subgroup comparisons were conducted across current users, sometimes-users, and non-users.

Results

Among 111 respondents, 70% currently collect voice recordings, 17% do so sometimes, and 13% do not. Acoustic analysis was less common: 62% analyze recordings, 14% analyze sometimes, and 23% do not. Commonly endorsed beliefs included that acoustic analysis offers objective data that complement subjective measures (88%), plays an important role in evaluation (66%), and reveals features not perceived by ear (57%). However, 36% felt that acoustic analysis does not influence clinical decision-making. For aerodynamic measures, 46% of respondents currently use them, 19% use them sometimes, and 35% do not. Respondents noted that aerodynamic assessment complements other evaluation components (86%), aids in characterizing vocal function (71%), and provides insight not obtained through auditory-perceptual judgment (62%). Reported barriers included logistical constraints (31%), lack of equipment (28%), time burden (22%), and need for additional training (19%).

Conclusions

Respondents valued acoustic and aerodynamic measures for patient education, progress tracking, and specific clinical cases, but use was limited by time demands and uncertainty regarding clinical application. Findings highlight persisting gaps that restrict broader clinical adoption of instrumental voice assessment. Acknowledgments: Early results of the acoustic section of the survey were included as part of a broader presentation at the 2025 ASHA Convention.

Katherine L. Marks, PhD, CCC-SLP, is an Assistant Professor of Otolaryngology at Emory University School of Medicine. She is a speech-language pathologist at the Emory Voice Center. Dr. Marks’ program of clinical research focuses on investigating and measuring vocal behaviors associated with voice disorders and implementing these measures in clinical practice. She has a specific interest in laryngeal dystonia and tremor, working to develop and validate measures that may aid differential diagnosis.

Christopher Apfelbach

Birds of a Feather: Detecting and Characterizing Archetypes of Vocal Demand Responses During Vocal Loading Using Cluster Analysis

Christopher Apfelbach
University of Minnesota
Additional authors: Mark Berardi, Eric Hunter

Objectives

During phonation, voice users engage specific biomechanical and aerodynamic adaptations termed vocal demand responses to meet their vocal demands. The subsets of adaptations engaged are, presumably, unique to each voice user. In this exploratory study, we examine analytic methods for describing group-level vocal performance during a laryngeal diadochokinetic (LDDK) vocal loading task (VLT) without flattening heterogeneity across subjects. These methods may (1) suggest archetypal categories of vocal demand responses and (2) evaluate the functional consequences of specific archetypal responses on vocal health and quality of life.

Methods

As described in our companion abstract, 30 participants completed two 30-minute, intervallic LDDK-based VLTs against 0 and 5 cm H2O back pressure, respectively. Airflow was captured continuously to calculate pulse-by-pulse LDDK rate ($v_{\text{inst}}$), AC airflow ($Q_{AC}$), and DC airflow ($Q_{DC}$); Borg CR100 ratings of perceived exertion ($RPE$) were collected once per minute. Within-subjects feature matrices were then analyzed using k-means and HDBSCAN clustering techniques whose hyperparameters had been optimized in iterative analyses.
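The clustering step can be sketched with a from-scratch k-means and silhouette score in NumPy. This is a minimal sketch, not the authors' pipeline: the feature matrix is synthetic, its columns merely stand in for interval-level Q_AC, v_inst and RPE values, z-scoring the features is an assumption, the function names are hypothetical, and HDBSCAN and dendrograms are not shown. Choosing k by the highest mean silhouette mirrors how silhouette scores were used to compare cluster configurations.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++ seeding: later centroids are drawn far from earlier ones.
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
            p = d2 / d2.sum() if d2.sum() > 0 else None
            centroids.append(X[rng.choice(len(X), p=p)])
        centroids = np.array(centroids, dtype=float)
        for _ in range(n_iter):
            labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
            # An empty cluster keeps its previous centroid.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        inertia = float(np.sum((X - centroids[labels]) ** 2))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


def silhouette(X, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)   # self-distance is 0, so exclude it from the count
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            s[i] = (b - a) / max(a, b)
    return float(s.mean())


def zscore(F):
    """Standardize each feature column so no single unit dominates distances."""
    sd = F.std(axis=0)
    return (F - F.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


# Hypothetical feature matrix: one row per interval, three columns standing in
# for Q_AC, v_inst and RPE. Two synthetic groups, not study data.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.2, (30, 3)), rng.normal(10.0, 0.2, (30, 3))])
Z = zscore(features)
scores = {k: silhouette(Z, kmeans(Z, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Several random restarts with k-means++ seeding reduce the chance of a poor local optimum, which matters when candidate configurations (here k = 2 to 5) are compared on a single summary score.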

Results

Both k-means and HDBSCAN clustering produced similar results, suggesting adequate convergent validity. Silhouette scores and dendrogram visualization most strongly supported two- or five-cluster configurations. In the five-cluster configuration, Clusters 2 and 4 were characterized, respectively, by low and high AC airflow, LDDK rate, and RPE that remained stable over time; Cluster 1 by exceptionally high RPE and early task termination; Cluster 3 by declining DC airflow and LDDK rate; and Cluster 5 by sharp discontinuities in performance during the early intervals of the VLT.

Conclusions

Using machine learning-based clustering techniques, we demonstrate that it is possible to construct archetypes of vocal demand responses during an LDDK-based VLT. We propose that these techniques should be adopted into the voice research canon, as they may illuminate why certain archetypes predispose speakers to developing voice disorders while others do not.

Dr. Christopher Apfelbach, PhD, CCC-SLP, is an assistant professor in the Department of Speech-Language-Hearing Sciences at the University of Minnesota. His work investigates how speakers respond to strenuous vocal exercise, how vocal performance and pacing strategies shift in the face of fatigue or effort, and how vocal and whole-body exercise modalities may prepare voice users to meet their daily vocal demands.

Astrid Isabel Frykman

Investigating the Latent Space’s Connection to the Biomechanics of Voice Production: Voice Map Analysis Using Variational Autoencoders

Astrid Isabel Frykman
KTH Royal Institute of Technology
Additional authors: Bob L. T. Sturm, Sten Ternström

Objectives

This study investigates how variational autoencoder (VAE) models trained on electroglottographic and acoustic voice maps can reveal latent structure related to the biomechanics of voice production. The main aim is to separate the latent dimensions and interpret them as biomechanical axes that reflect different aspects of voice production.

Methods

Forty-three voice maps from the KinderEGG dataset were used to train several VAE variants: a baseline model trained on normalized point-wise data; a conditional VAE that uses the voice-field coordinates of the maps, fundamental frequency (f_0) and sound pressure level (SPL), as conditioning variables; a patch-based conditional model that incorporates local structural context; and an attention-based model. After training, reconstruction quality was evaluated by comparing reconstructed maps with the original data. The latent spaces were examined with dimensionality-reduction techniques, including t-SNE, to assess clustering, separation, and interpretability across voice types (e.g., men, women, and children). Correlation and latent-traversal analyses were used to examine whether individual latent dimensions could be associated with distinct biomechanical factors.
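For readers less familiar with VAEs, the training objective behind such models can be sketched in a few NumPy functions: the reparameterization trick, the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, and a conditional input formed by appending (f_0, SPL) to each voice-map cell. This is an illustrative sketch only: the encoder and decoder networks are omitted, the squared-error reconstruction term and the `beta` weight are assumptions (beta > 1, as in the beta-VAE, is one common way to encourage disentanglement), all names are hypothetical, and none of it is taken from the authors' models.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


def negative_elbo(x, x_recon, mu, log_var, beta=1.0):
    """Per-example loss: squared-error reconstruction plus beta-weighted KL term."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + beta * kl_to_standard_normal(mu, log_var)


def with_condition(inputs, f0, spl):
    """Conditional-VAE style input: append the (f_0, SPL) cell coordinates to each row."""
    return np.concatenate([inputs, np.stack([f0, spl], axis=-1)], axis=-1)


# Hypothetical batch: 4 voice-map cells, each with 3 EGG/acoustic metrics.
rng = np.random.default_rng(0)
cells = rng.normal(size=(4, 3))
enc_in = with_condition(cells, f0=np.array([110.0, 147.0, 196.0, 262.0]),
                        spl=np.array([70.0, 72.0, 75.0, 80.0]))
mu, log_var = np.zeros((4, 2)), np.zeros((4, 2))   # stand-ins for encoder outputs
z = reparameterize(mu, log_var, rng)
print(enc_in.shape, z.shape, negative_elbo(cells, cells, mu, log_var))
```

With a perfect reconstruction and a posterior equal to the prior, the loss is zero; in practice the KL term pulls each latent dimension toward N(0, 1), which is what makes latent traversals of individual dimensions meaningful to inspect.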

Results

Preliminary analysis showed that the VAE framework learned a compact latent representation that preserved structure in the voice maps and revealed separation among different voice categories. Conditioning on f_0 and SPL improved latent organization relative to the baseline model, while structural-context and attention-based variants are expected to improve reconstruction fidelity and latent disentanglement. The latent dimensions showed potential to reflect biomechanical elements of voice production.

Conclusions

Variational autoencoders provide a promising framework for investigating the relationship between voice-map structure and the biomechanics of voice production. Future work will focus on improving separation of latent dimensions and assigning them clearer biomechanical interpretations.

Astrid Isabel Frykman holds an MS in Machine Learning from KTH, following a BSc in Engineering Physics at KTH. For the past six months, she has worked under the supervision of Prof. Sten Ternström and Assoc. Prof. Bob L. T. Sturm at TMH, researching voice biomechanics using variational autoencoders.

Yuwen Sun

Front-Vowel F2 Compression at High Pitch as an Acoustic Marker of Vocal-Tract Adjustment in Trained Sopranos

Yuwen Sun
Tongji University

Objectives

High-pitch singing requires coordinated laryngeal and vocal-tract adjustments that can systematically alter vowel acoustics. We aimed to quantify pitch-dependent vowel modification using controlled recordings and to identify acoustic markers that are sensitive to high-pitch demands.

Methods

Twenty-seven trained Mandarin-speaking female sopranos produced sustained vowels (/a, e, i, o, u/) under controlled conditions in sung D4 and sustained A4 tasks (plus a speech-like baseline). From the mid portion of each take, we extracted F0, within-take F0 variability (SD(F0)), and vowel formants (F1, F2). We fit a linear mixed-effects model on sung productions with F2 as the dependent variable and fixed effects of vowel, pitch (D4 vs A4), and their interaction, with a random intercept for subject.

Results

Vowel-space plots showed systematic reconfiguration between D4 and A4. The mixed-effects model revealed robust vowel-dependent differences in F2 at D4 and a modest pitch-related F2 increase for /a/ at A4 (β=133.9, p=0.017). Critically, significant negative vowel×pitch interactions for the front vowels /e/ (β=-196.3, p=0.013) and /i/ (β=-412.9, p<0.001) indicated relative F2 reduction at A4 compared with D4, consistent with increased vowel modification under high-pitch conditions.
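Under treatment coding with /a/ as the reference vowel and D4 as the reference pitch (an assumption; the abstract does not state the coding scheme), the reported coefficients combine additively: the model-implied D4-to-A4 change in F2 for a vowel is the pitch coefficient plus that vowel's interaction term. A minimal sketch of that arithmetic, using only the coefficients given above; the names are hypothetical, and under sum or deviation coding the numbers would differ.

```python
# Fixed effects (Hz) as reported in the abstract.
PITCH_A4 = 133.9                                    # A4 vs D4 for the reference vowel /a/
VOWEL_X_A4 = {"a": 0.0, "e": -196.3, "i": -412.9}   # /a/ assumed to be the reference level


def a4_minus_d4_shift(vowel: str) -> float:
    """Model-implied F2 change (Hz) from D4 to A4 under treatment coding."""
    return PITCH_A4 + VOWEL_X_A4[vowel]   # /o/ and /u/ terms are not reported


for v in ("a", "e", "i"):
    print(f"/{v}/: {a4_minus_d4_shift(v):+.1f} Hz")
# /a/: +133.9 Hz
# /e/: -62.4 Hz
# /i/: -279.0 Hz
```

Read this way, the negative interactions do more than offset the /a/ increase: F2 of /e/ and especially /i/ falls from D4 to A4, which is the front-vowel compression described above.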

Conclusions

Front-vowel F2 compression emerges as a sensitive acoustic marker of high-pitch vocal-tract adjustment in trained sopranos. This controlled baseline can support future work linking acoustic changes to physiological measurements and modeling of source–filter interactions under high-pitch demands.

Yuwen Sun is a vocalist and interdisciplinary researcher working at the intersection of vocal performance, voice science, linguistics, music psychology, and artificial intelligence. She holds a PhD in Music from the University of Minnesota, Twin Cities, with a minor in Speech-Language-Hearing Sciences, an MM in Vocal Performance from the New England Conservatory, and a BA in Music Education from East China Normal University.

She is currently a Postdoctoral Researcher at the College of Design, Tongji University, where her research focuses on AI-based vocal health mechanisms and interactive therapeutic design. Her previous research experience includes laryngeal anatomy and vocal fatigue studies at the University of Minnesota, as well as neuroimaging and cognitive science research at East China Normal University. Her scholarly work has been published in Journal of Voice, Behavioral Sciences, and Philosophia, among others, and she serves as an ad hoc reviewer for several international journals including Musicae Scientiae and PLOS ONE.

Alongside her academic work, Sun maintains an active international performing career. She has appeared in major operatic productions and concerts at venues such as the Vienna Musikverein (Golden Hall), the Minnesota Opera, and leading stages in Berlin and Prague, and has presented lecture-recitals that integrate anatomy, acoustics, phonetics, and multilingual vocal practice. Her work is driven by a commitment to bridging artistic practice, scientific inquiry, and technological innovation in music and voice studies.

Open block · 3:00 – 3:30 PM (panel, extended Q&A)
Afternoon Break · 3:30 – 3:45 PM
Session 8 — Vocal Fatigue & Demand, Mindfulness · Chair: Christopher Apfelbach
Carlos Calvache

Operationalizing Vocal Demand Response (VDR): Construction and Validation of the VDR-Index

Carlos Calvache
Iberoamericana / Vocology Center
Additional authors: Lady Catherine Cantor-Cutiva, Eric J. Hunter

Objectives

To develop and validate an objective Vocal Demand Response Index (VDR-Index) that operationalizes structured demand response behavior under standardized communicative challenge.

Methods

The VDR-Index was constructed and cross-validated in an English-speaking population through a methodological model-development approach, leveraging the PDVQ open-access database. VDR was defined as the way voice is produced to respond to a perceived Vocal Demand (VD). The elicitation protocol was standardized, and VDR was sampled using two phonatory tasks per participant: (1) a sustained vowel and (2) connected speech/phrase. Both tasks were consolidated into a single subject-level observation to capture VDR as a phenomenon emerging across complementary tasks. The primary reference outcome was CAPE-V Strain, given its conceptual and physiological linkage to hyperfunctional patterns and its consistent mapping to VDR-related markers, including lower CPPS and HNR, higher jitter and NHR, and spectral-balance changes. GRBAS Grade (0–3) was used as an ordinal reference to evaluate score monotonicity with severity. A minimal sufficient predictor set was extracted, including CPPS, HNR, NHR, jitter, F0SD, alpha ratio/spectral tilt, and an aerodynamic cost proxy (Pressure Lung). Predictors were robust-standardized to improve comparability and resistance to outliers and artifacts. The primary model was LASSO, with Elastic Net as a comparator and Random Forest as a non-linear exploratory approach; stratified cross-validation was applied.

Results

The continuous score increased monotonically with GRBAS Grade, supporting the interpretation of VDR as a graded response construct rather than a binary classification outcome. LASSO achieved AUC=0.809 (sensitivity=85.9%; specificity=67.1%). Elastic Net was comparable (AUC=0.811; sensitivity=79.9%; specificity=73.3%). Random Forest showed lower discrimination (AUC=0.727).
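Two of the building blocks above, robust standardization and threshold-based discrimination metrics, can be sketched in NumPy. This is a minimal sketch under stated assumptions: the abstract does not say which robust scaling was used, so median/IQR scaling is shown as one common choice; AUC is computed as the probability that a positive case outscores a negative one; the toy scores and binary labels are invented; and the LASSO model that would produce the index scores is not shown. All function names are hypothetical.

```python
import numpy as np


def robust_standardize(X):
    """Column-wise (x - median) / IQR; one common robust scaling (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = np.where(q3 - q1 > 0, q3 - q1, 1.0)   # constant columns are only centered
    return (X - med) / iqr


def auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pred = s >= threshold
    return float(np.sum(pred & y) / np.sum(y)), float(np.sum(~pred & ~y) / np.sum(~y))


# Toy index scores and hypothetical binary reference labels (1 = positive); not study data.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))                            # 0.75
print(sensitivity_specificity(toy_scores, toy_labels, 0.35))  # (1.0, 0.5)
```

AUC summarizes ranking across all thresholds, whereas each sensitivity/specificity pair belongs to one chosen cutoff; this is why two models can have nearly equal AUCs (as LASSO and Elastic Net do above) yet trade sensitivity against specificity differently.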

Conclusions

The VDR-Index provides an objective quantification of structured demand-response behavior under standardized vocal demand challenges and demonstrates performance consistent with graded physiological response and longitudinal monitoring. As a response marker (not ecological demand), it offers a scalable component for adaptive capacity profiling when interpreted alongside contextual and temporal covariates. Field applications should additionally record contextual vocal-demand covariates (e.g., noise, acoustics, phonation time) to contextualize observed responses. These findings support VDR as a measurable dimension of functional vocal response that may inform future modeling of reserve and depletion dynamics across time.

Carlos Calvache is a Colombian Speech-Language Pathologist (SLP) and Vocologist. He holds a degree in Vocology from the University of Chile, an M.Sc. in Communication-Education from Universidad Distrital Francisco José de Caldas, and a Ph.D. in Applied Sciences Engineering from Universidad Militar Nueva Granada. With over 16 years of clinical experience, Dr. Calvache specializes in vocal rehabilitation and professional voice training.

He is the Founder and Director of Vocology Center, the first institution dedicated to vocology in Colombia and Latin America. Currently, he serves as a Professor and Principal Researcher for the Speech-Language Pathology program at Corporación Universitaria Iberoamericana. Throughout his career, Dr. Calvache has taught at both undergraduate and postgraduate levels across Latin America. As an active researcher, he has authored numerous peer-reviewed articles and book chapters focused on voice science and communication. He currently leads the Vocology Research group, spearheading cross-institutional research lines in collaboration with various Latin American universities.

Jack Jiang

Minimal Perturbation Method for Acoustic Data Selection, from Principle to Practice with Supportive Evidence

Jack Jiang
University of Wisconsin–Madison

Many acoustic analysis methods exist to evaluate pathological voices, yet their clinical utility has been limited by inconsistent methods for selecting representative voice samples. The Minimal Perturbation Method provides a physiologically grounded framework for standardizing acoustic assessment by targeting what a larynx is capable of, rather than characterizing one random data sample. Voice perturbation is governed by biomechanics and neuromuscular control. Humans can choose to alter biomechanics through neuromuscular changes and thereby degrade phonation quality; however, their ability to improve phonation intentionally is limited by laryngeal structure. Setting neurological control aside, we found from laryngeal modeling studies that the range of phonation is bounded by the phonation threshold pressure (PTP) and the phonation instability pressure (PIP). Above PIP, vibrations become irregular and chaotic. Pathologic conditions such as vocal fold lesions or paralysis narrow this stable phonatory range by increasing tissue stiffness, mass, and asymmetry, thereby reducing PIP and increasing PTP. As pathology progresses, the two thresholds can converge, eliminating the stable phonatory range and explaining why severely impaired larynges cannot sustain periodic vibration. A pathologic larynx, regardless of the type of pathology, can phonate worse, but its thresholds and range cannot improve. This is the foundation of the Minimal Perturbation Method: the most stable, least perturbed segment of a voice sample reflects the larynx's true functional capacity, independent of voluntary modulation. Because phonatory capacity is bounded by mechanical structure, we target the least perturbed segment of a speech sample as the most accurate reflection of laryngeal function. While many segmentation methods exist, applying a moving-window technique to identify this minimally perturbed segment yields an objective, reproducible assessment of phonatory capacity.

This approach reframes voice assessment around function, with direct implications for diagnosis, treatment monitoring, and clinical acoustic analysis protocols for laryngeal lesions. A limitation of this method is that it focuses on mechanical structure rather than neurological control.
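The moving-window search described above can be sketched as follows. The abstract does not specify the perturbation measure, window length, or step size, so this is a minimal sketch under stated assumptions: cycle-to-cycle periods (in seconds) have already been extracted, local jitter stands in as the perturbation measure, and the window slides one cycle at a time. The function names `local_jitter` and `least_perturbed_window` are hypothetical, not the authors' implementation.

```python
from typing import Sequence, Tuple


def local_jitter(periods: Sequence[float]) -> float:
    """Local jitter: mean absolute difference between consecutive periods,
    divided by the mean period (dimensionless; multiply by 100 for percent)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


def least_perturbed_window(periods: Sequence[float], window: int) -> Tuple[int, float]:
    """Slide a window of `window` consecutive cycles across the sample (step = 1 cycle)
    and return (start_index, jitter) for the window with the lowest local jitter.
    Ties are resolved in favor of the earliest window."""
    if window < 2 or window > len(periods):
        raise ValueError("window must be between 2 and len(periods)")
    best_start, best_jitter = 0, float("inf")
    for start in range(len(periods) - window + 1):
        j = local_jitter(periods[start:start + window])
        if j < best_jitter:
            best_start, best_jitter = start, j
    return best_start, best_jitter
```

Shimmer, HNR, or CPPs could be swapped in for `local_jitter` without changing the search loop; the point of the design is that the reported value is the minimum over windows, not the value for an arbitrary segment.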

Dr. Jack Jiang is Director of International Collaborative Research and Translational Research in the Department of Otolaryngology–Head and Neck Surgery at the University of Wisconsin–Madison, where he also directs the Otolaryngic Biomedical Engineering Research Center and the Laryngeal Physiology Lab. He earned his MD from Shanghai Medical University and completed otolaryngology residency at the Affiliated EENT Hospital before earning a PhD in Speech Pathology and Audiology at the University of Iowa under Professor Ingo Titze, followed by an otolaryngology research fellowship under Dr. Brian F. McCabe. His research focuses on objective assessment of pathological laryngeal function, vocal fold biomechanics, and the development of medical instrumentation and software for clinical voice evaluation. Current projects examine the chaotic dynamics of healthy and dysphonic vocal fold vibration, finite element modeling of asymmetry, polyps, and dehydration, and noninvasive evaluation methods for professional voice users and children. He has authored more than 330 peer-reviewed manuscripts, serves on the editorial boards of The Laryngoscope, Annals of Otology, Rhinology & Laryngology, and Journal of Voice, and is a 2001 recipient of the Presidential Early Career Award for Scientists and Engineers.

Reuben Walker

Modeling Subjective Vocal Attributes in Conservatory Singers Using Explainable Machine Learning

Reuben Walker
Charité – Universitätsmedizin Berlin
Additional authors: Mario Fleischer, Marie Bieber, Hartmut Zabel, Dirk Mürbe

Reuben Walker (1, 2), Mario Fleischer (2), Marie Bieber (1, 2), Hartmut Zabel (1), Dirk Mürbe (1, 2)
(1) Voice Research Laboratory, University of Music Carl Maria von Weber Dresden, Germany
(2) Department of Audiology and Phoniatrics, University Medicine Charité, Berlin, Germany

Objective

Timbre is a central construct in classical voice pedagogy and is often associated with resonance strategies and spectral energy distribution, yet it remains difficult to quantify acoustically. This study investigates whether pedagogical timbre labels can be predicted from acoustic features in a longitudinal dataset of classical singers.

Method

Between 2002 and 2019, 228 students (138 F, 90 M) at the Hochschule für Musik Carl Maria von Weber Dresden were evaluated on subjective metrics during an audition at the beginning of their studies and subsequently recorded seven standardized exercises annually over four years. Mel spectrograms and MFCCs were calculated for the exercise “Manca sollecita” from the initial year of study and temporally aligned prior to analysis. Supervised machine learning models (convolutional neural network, multi-layer perceptron, and bottom-up broadcast neural network) were trained to predict pedagogical labels for timbre, resonance, nasality, and solo/choral. Feature analysis was performed using gradient-weighted class activation mapping (Grad-CAM) to identify the time–frequency regions most influential for classification.

Results

Timbre labels were predicted with accuracies of up to 72% in binary classification (bright vs. medium/dark), significantly higher than models trained on randomly shuffled labels (p = 3×10⁻⁵). The frequency regions most strongly associated with timbre classification lay between 2.5 and 3.5 kHz. Vowel-dependent temporal patterns further differentiated timbre classes: segments containing bright vowels (/i: e:/) contributed strongly to medium/dark classifications, whereas dark vowels (/o: u:/) were influential in bright classifications.

Conclusion

These findings provide quantitative evidence that traditionally qualitative pedagogical assessments of timbre are anchored in measurable acoustic patterns.
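The comparison against models trained on randomly shuffled labels can be illustrated with a permutation-style null. The abstract does not state which test produced p = 3×10⁻⁵, so this is only one common way to set up such a comparison, not the authors' procedure. `score_fn` is a hypothetical stand-in for training and evaluating a classifier (such as the CNN) on a given labeling; the toy `threshold_accuracy` rule exists only to keep the sketch runnable.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signature: score_fn(features, labels) -> accuracy in [0, 1].
# In a real study this would train and evaluate a classifier on the given labels.
ScoreFn = Callable[[Sequence[float], Sequence[int]], float]


def shuffled_label_null(features: Sequence[float], labels: Sequence[int],
                        score_fn: ScoreFn, n_perm: int = 1000,
                        seed: int = 0) -> List[float]:
    """Score the model n_perm times, each time on a fresh shuffle of the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy so the caller's labels are not mutated
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real feature-label association
        null_scores.append(score_fn(features, shuffled))
    return null_scores


def permutation_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Empirical one-sided p-value with the +1 correction, so p is never exactly 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def threshold_accuracy(features: Sequence[float], labels: Sequence[int],
                       threshold: float = 0.5) -> float:
    """Toy stand-in classifier: predict label 1 ("bright") when feature > threshold."""
    preds = [1 if f > threshold else 0 for f in features]
    return sum(pred == y for pred, y in zip(preds, labels)) / len(labels)
```

With n permutations the smallest attainable empirical p-value is 1/(n + 1), so reaching 3×10⁻⁵ this way would take more than 33,000 shuffled runs; a value that small may instead come from a parametric test over repeated training runs.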

Dr. Reuben Scott Walker is currently based in Dresden, Germany, where he serves as a research associate at the Hochschule für Musik Carl Maria von Weber Studio for Voice Research while simultaneously pursuing a doctoral degree (Dr. rer. med.) in Audiology and Phoniatrics at the Charité – Universitätsmedizin Berlin. His research includes longitudinal spectral development in conservatory singers (Journal of the Acoustical Society of America), longitudinal vibrato development (SMAC, Stockholm, Sweden), voice onset/offset and superficial hydration (Journal of Voice), and aerosol emissions from pre-adolescent singers and their relevance to Covid-19 health protocols (Journal of the Royal Society Interface).

Dr. Walker has also been deeply involved in vocal pedagogy and education. From 2021 to 2022, he served as Lecturer of Voice at Friedrich-Alexander-Universität Erlangen. Between 2014 and 2016, he was an Associate Instructor and instructor of record for Undergraduate Vocal Pedagogy at the Jacobs School of Music, where he developed a curriculum emphasizing vocal physiology and acoustics, practical teaching experience, repertoire selection, and private studio finance. In addition to notable solo engagements across Europe and the United States, he is a salaried member of the opera chorus at the Landesbühnen Sachsen.

Karin Titze Cox

Optimizing Diameter, Length, and Water Immersion in Flow Resistant Tube Vocalization

Karin Titze Cox
NCVS / University of Utah / ENT Specialists
Additional authors: Ingo R. Titze, Lynn Maxfield

Objectives

In voice training and therapy, flow-resistant tube (FRT) and water-resistant tube (WRT) vocalization are two of many semi-occluded vocal tract (SOVT) methods used to improve the efficiency of sound production in speaking and singing. These exercises have a long history of success; however, the variety of methods, rationales, and new devices calls for a closer look at how to evaluate the optimal diameters, lengths, and water immersion depths of straws. These variables determine the flow, pressure, and resistance on which many devices and methods rely. Whenever a flow-resistant tube (or straw) is used, the question of optimal length and diameter arises. Furthermore, if the distal end of the tube is inserted into water, there is an additional question about the appropriate insertion depth. Our objective was to quantify the range of airflow resistances and oral pressures attainable with variations in the length, diameter, and water immersion depth of tubes used in practice.

Methods

Pressure-flow equations, determined previously for variable tube geometries, were used to calculate oral pressure ranges. Human subjects were then recruited to produce oral pressures with these geometries, and the pressures were measured with commercial manometers. Several nomograms of airflow resistance and oral pressure were plotted as functions of tube length, diameter, and water insertion depth. These nomograms identify the variables that change oral pressure most dramatically.

Results

With tube diameters of 2.5–3.0 mm and lengths of 39–40 cm, individuals can produce oral pressures of 10–40 cm H₂O. Inserting the distal end into water adds a pressure (in cm H₂O) equal to the insertion depth in centimeters, and wider-diameter straws may require both length and water insertion to reach these higher oral pressures.
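How length, diameter, and immersion depth combine can be sketched numerically. The authors' previously determined pressure-flow equations are not given here, so this sketch substitutes the laminar Hagen–Poiseuille resistance, R = 8μL/(πr⁴), plus the hydrostatic term stated above (d cm of immersion adds d cm H₂O). The air properties and the 100 mL/s airflow in the usage loop are assumed values. Because narrow straws at phonatory flows typically operate above the laminar Reynolds-number range (the `reynolds_number` check makes this visible), pressure drop there grows faster than linearly with flow, and this laminar estimate should be read as a lower bound rather than a prediction of measured pressures.

```python
import math

# Approximate properties of warm, humid exhaled air (assumed values).
AIR_VISCOSITY_PA_S = 1.85e-5
AIR_DENSITY_KG_M3 = 1.14
PA_PER_CM_H2O = 98.0665  # 1 cm H2O in pascals


def poiseuille_resistance(length_m: float, diameter_m: float) -> float:
    """Laminar (Hagen-Poiseuille) flow resistance of a straight tube, in Pa*s/m^3.
    Scales linearly with length and with 1/diameter^4."""
    radius = diameter_m / 2.0
    return 8.0 * AIR_VISCOSITY_PA_S * length_m / (math.pi * radius ** 4)


def reynolds_number(flow_m3_s: float, diameter_m: float) -> float:
    """Reynolds number in the tube; above roughly 2300 the laminar formula is unreliable."""
    area = math.pi * (diameter_m / 2.0) ** 2
    velocity = flow_m3_s / area
    return AIR_DENSITY_KG_M3 * velocity * diameter_m / AIR_VISCOSITY_PA_S


def oral_pressure_cm_h2o(flow_m3_s: float, length_m: float, diameter_m: float,
                         immersion_cm: float = 0.0) -> float:
    """Laminar tube pressure drop plus hydrostatic load from water immersion."""
    tube_pa = poiseuille_resistance(length_m, diameter_m) * flow_m3_s
    return tube_pa / PA_PER_CM_H2O + immersion_cm


if __name__ == "__main__":
    flow = 100e-6  # assumed airflow of 100 mL/s, expressed in m^3/s
    for d_mm in (2.5, 3.0, 5.0):
        for length_cm in (10, 40):
            p = oral_pressure_cm_h2o(flow, length_cm / 100, d_mm / 1000)
            re = reynolds_number(flow, d_mm / 1000)
            print(f"{d_mm} mm x {length_cm} cm: {p:.1f} cm H2O (Re ~ {re:.0f})")
```

The 1/d⁴ term is why diameter dominates: halving the diameter multiplies the laminar resistance by 16, whereas doubling the length only doubles it.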

Conclusions

As individuals manipulate and compare diameters, lengths, and water immersion, they can determine the resistance and pressures needed for their specific efficiency goals. Varying lengths, diameters and water immersion allows for optimized strategies to improve glottal configuration, phonation threshold pressure, and impedance matching between the glottis and the vocal tract for enhanced source-filter interaction and maximum power transfer.

Karin Titze Cox is a certified Speech Language Pathologist (SLP-CCC) specializing in vocology, the science and practice of voice habilitation. She received her BA degree from Brigham Young University and her MA from the University of Iowa. She spent her early career in research and practicing in university hospital clinics. Over the last few decades, she has enjoyed private practice and serving as voice clinic director for several clinics within ENT Specialists in Salt Lake City, Utah. Karin served as a board member of the Pan American Vocology Association for three years and on the National Center for Voice and Speech executive board, and she currently serves on its advisory board while engaging in teaching, research, and outreach opportunities. She finds joy in service to her patients, family, community, church, and friends. She also finds joy in singing and performing on occasion.

Eva van Leer

Effect of Practice on Task-Evoked Pupillary Response in Volitional Voice Quality Modification

Eva van Leer
Georgia State University
Additional authors: Lucas DeBail Ribas

Objectives

This study examines the effect of practice on the Task-Evoked Pupillary Response, a physiological index of mental effort, during volitional production of novel voice qualities compared with habitual voice quality. Because practice is known to reduce mental effort in motor learning, the pupillary response was hypothesized to decrease with practice of the novel voice qualities, which are still being acquired in the cognitive stage of motor learning, but not with practice (i.e., repetition) of each participant's habitual voice, which has already been acquired and is in the autonomous (habituated) stage of motor learning.

Methods

Eleven participants without voice disorders or a history of speaking-voice training produced three novel voice qualities (breathy, fry, and twang) for three trials each, alternating with habitual voice production on a counting-aloud task. Pupil diameter was recorded with eye-tracking glasses (SensoMotoric Instruments, Teltow, Germany) as participants alternated between counting aloud in their habitual voice, resting, and counting aloud in a target voice for three trials. This yielded three pupillometry recordings in total: one each in which breathy, fry, or twang voice was alternated with the habitual voice three times. The three resulting CSV files were preprocessed to remove blinks and then processed to obtain the mean pupil size for each trial of each voice quality. Paired-samples t-tests were conducted to determine whether Trial 3 values differed significantly from Trial 1 values for each voice quality. Difference scores (Trial 1 vs. Trial 3) for each novel voice quality were also compared directly with the difference scores for the corresponding habitual voice trials using paired t-tests.
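The per-trial averaging and the two paired comparisons described above can be sketched with the standard library. This is a minimal sketch, not the authors' pipeline: it assumes blinks appear in the exported CSV as zero or missing (NaN) pupil diameters, which varies by eye tracker and export settings, and real pipelines usually also pad and interpolate around blinks. The function names are hypothetical. The block returns the t statistic, degrees of freedom, and Cohen's d_z; a p-value additionally needs the t distribution (for example via `scipy.stats.ttest_rel`), omitted here to keep the block dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def trial_mean_pupil(samples: Sequence[float]) -> float:
    """Mean pupil diameter for one trial, dropping blink samples.
    Assumption: blinks are exported as 0 or NaN diameters
    (NaN > 0 is False, so NaN samples are dropped too)."""
    valid = [s for s in samples if s > 0]
    if not valid:
        raise ValueError("trial contains no valid samples")
    return statistics.fmean(valid)


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Paired-samples t statistic on the differences a - b.
    Returns (t, degrees of freedom, Cohen's d_z = mean diff / SD of diffs)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    d = [x - y for x, y in zip(a, b)]
    mean_d = statistics.fmean(d)
    sd_d = statistics.stdev(d)  # sample SD (n - 1 denominator)
    n = len(d)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d / sd_d


def practice_effect_vs_habitual(novel_t1: Sequence[float], novel_t3: Sequence[float],
                                habit_t1: Sequence[float], habit_t3: Sequence[float]
                                ) -> Tuple[float, int, float]:
    """Compare Trial 1 - Trial 3 pupil reductions for a novel voice quality with the
    habitual-voice reductions from the same recording (one value per participant)."""
    novel_drop = [x - y for x, y in zip(novel_t1, novel_t3)]
    habit_drop = [x - y for x, y in zip(habit_t1, habit_t3)]
    return paired_t(novel_drop, habit_drop)
```

`paired_t(trial1_means, trial3_means)` corresponds to the within-quality Trial 1 vs. Trial 3 test, and `practice_effect_vs_habitual` to the difference-score comparison; both assume participants are listed in the same order in every input.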

Results

For each intentionally altered voice quality of breathy, fry and twang, pupil diameter significantly reduced from Trial 1 to Trial 3, with corresponding moderate and large effect sizes. However, pupil size for habitual voice production remained stable from Trial 1 to Trial 3, with no significant difference over trials. Pupil size reductions were larger for all novel voice qualities than for corresponding habitual voice qualities. This difference was significant for both twang and fry (p<.05) but not for breathy, which showed the same trend but did not reach significance (p=.074).

Conclusions

Pupillometry can be used to detect practice effects in motor learning under a strictly controlled, simple practice schedule.

Dr. Eva van Leer is an associate professor at Georgia State University and an applied clinical investigator. She examines factors that predict patient adherence to voice therapy and develops tools to improve it. Her work has been funded by NIDCD, the ASHFoundation and the GA-CTSA. She is currently examining adherence to Conversation Training Therapy as part of an R01-funded team led by Dr. Amanda Gillespie at Emory University Voice Center. In addition to her MS and PhD in Communication Sciences and Disorders, she holds an MFA in Theatre Voice Training and enjoys the motor skill of inline skating.

Eric J. Hunter

Parameterizing Adaptive Vocal Capacity: Baseline Reserve, Depletion Rate, and Restoration Kinetics Across Laboratory and Ecological Evidence

Eric J. Hunter
University of Iowa
Additional authors: Ingo R. Titze, Matthew Schloneger, Lady Catherine Cantor-Cutiva, Adrián Castillo-Allendes, Mark Berardi

Purpose

Most voice assessments, whether conducted in clinical, research, or occupational health contexts, capture a single performance at one point in time; the assessor cannot determine whether the observed voice reflects the individual’s stable functional ceiling, a temporarily depleted state, or a compensatory adaptation masking the underlying problem. This interpretive ambiguity, the representativeness problem, limits the utility of even technically rigorous single evaluations. To address this, this work characterizes acoustic and self-perceived change across vocal demand and time as measurable depletion indicators and proposes a three-parameter framework for individual vocal capacity profiling.

Methods

Evidence is synthesized across six independent datasets spanning two populations (teachers, singers), three methodological traditions (vocal loading tasks, repeated-measures designs, ecological ambulatory monitoring), and a publication span of 2006 to 2025. Measures include acoustic parameters (CPPs, fundamental frequency, pitch strength, speech level) and self-perceived indices (vocal effort ratings, inability to produce soft voice).

Results

Acoustic and self-perceived change across demand and time is consistently observed across designs, detectable using standard instrumentation and self-report tools, and varies systematically across individuals in both laboratory and ecological settings. Acoustic and self-perceived channels are partially independent and together provide a more complete depletion profile than either alone. Group-level analyses obscure this signal, whereas individual-level modeling reveals structured and reproducible patterns. Temporal variability reflects dynamic reserve state rather than measurement noise.
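The title names three parameters (baseline reserve, depletion rate, restoration kinetics) but the abstract gives no equations, so the following is one possible parameterization, assumed purely for illustration and not the authors' model: reserve falls linearly from a baseline R0 at rate k_d per hour of vocal demand and recovers toward R0 exponentially with rate constant k_r during rest. Fitting an ordinary least-squares slope to one person's repeated samples of a depletion indicator (for example CPPs or vocal effort ratings across a workday) gives an individual-level depletion-rate estimate, in keeping with the individual-level modeling the results favor. All names and functional forms are hypothetical.

```python
import math
from typing import Sequence


def reserve_after_demand(r0: float, k_d: float, hours: float) -> float:
    """Hypothetical linear depletion: reserve drops by k_d per hour of demand, floored at 0."""
    return max(0.0, r0 - k_d * hours)


def reserve_after_rest(r0: float, r_start: float, k_r: float, hours: float) -> float:
    """Hypothetical exponential restoration toward baseline r0 with rate constant k_r (1/h).
    The remaining deficit (r0 - r_start) shrinks by a factor exp(-k_r * hours)."""
    return r0 - (r0 - r_start) * math.exp(-k_r * hours)


def fit_depletion_rate(times_h: Sequence[float], indicator: Sequence[float]) -> float:
    """Ordinary least-squares slope of an indicator vs. time for ONE person.
    A negative slope for CPPs (or a positive one for effort ratings) would suggest depletion."""
    n = len(times_h)
    if n < 2 or n != len(indicator):
        raise ValueError("need >= 2 paired observations")
    mean_t = sum(times_h) / n
    mean_y = sum(indicator) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("sampling times must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, indicator))
    return sxy / sxx
```

Fitting per person rather than pooling reflects the observation that group-level analyses obscure the depletion signal; a restoration rate could be estimated analogously from samples taken during rest.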

Conclusions

These findings support interpreting variability as structured signal reflecting individual capacity dynamics rather than measurement noise. Previous methods capture meaningful but complementary aspects of this signal; no single approach fully resolves the representativeness problem. An integrated approach combining functional challenge tasks with strategically repeated sampling, positioned between single snapshots and continuous monitoring, provides a practical path toward individual vocal capacity profiling, transforming familiar measures into interpretable indicators of vocal reserve with implications for risk stratification, scheduling, recovery, and interpretation of within-person variability.

Acknowledgments: This research was in part supported by the NIDCD of the National Institutes of Health under Grant No. R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Eric J. Hunter, PhD, is Chair of the Department of Communication Sciences and Disorders at the University of Iowa and a Fellow of the Acoustical Society of America. His research focuses on occupational voice use, with particular emphasis on vocal health, fatigue, recovery, and adaptive capacity in teachers and other high-demand voice users. Trained in physics and acoustics at Brigham Young University and in speech and hearing science at the University of Iowa, Dr. Hunter integrates biomechanical modeling, speech acoustics, signal processing, and field-based measurement to study how voices function in real-world environments. Over his career, he has authored or coauthored more than 115 peer-reviewed publications spanning laboratory modeling, large-scale longitudinal voice monitoring, and translational studies connecting physiology, acoustics, and behavior. He previously worked for more than a decade with the National Center for Voice and Speech and later held faculty and leadership roles at Michigan State University, concluding his time there as Associate Dean for Research. Across research and leadership, he emphasizes interdisciplinary collaboration, mentorship, and the translation of scientific insight into practical solutions that improve vocal health and communication outcomes.

Aude Cardona

The Impact of Mindfulness Meditation on Voice Production and Learning

Aude Cardona
University of Delaware
Additional authors: Luis Carlo Bulnes Fuentes, Shaheen Awan, Jordan Awan, Giuseppe Pagnoni, Katherine Verdolini Abbott

Objectives

The kinesiology literature has established that internal attention to body biomechanics degrades learning compared with an external focus on movement outcomes, but it has not examined the metacognitive dimension of attention in learning. This study investigated whether the metacognitive monitoring stance (how a performer relates to internal signals from the phonatory system, or phonoception, during action) affects voice motor learning. We compared a receptive, non-evaluative approach with an analytical, evaluative one.

Methods

Twenty-two singers exhibiting salient register “breaks” at the primo passaggio completed a four-week randomized controlled trial. Participants were assigned to Mindfulness Meditation (MM), cultivating receptive meta-awareness of phonoceptive experience, or Basic Voice Science (BVS) training, fostering an analytical, evaluative monitoring stance toward voice-production sensations. The task involved smoothing the register transition on an ascending /i/ glide from A3 to A4. Learning was assessed via pre- and post-training perceptual and acoustic measures (maximum F₀ derivative). Self-reports indexed metacognitive monitoring traits. Functional magnetic resonance imaging (fMRI) assessed neurophysiological substrates.
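The acoustic outcome above, the maximum F₀ derivative on the ascending /i/ glide, rewards smooth transitions because an abrupt register break produces a momentary spike in the rate of F₀ change. A minimal sketch, assuming an F₀ track already sampled at a fixed frame interval with unvoiced frames removed; converting to semitones (so the measure does not scale with absolute pitch) and taking the absolute value of a simple first difference are assumptions for illustration, not details given in the abstract. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def hz_to_semitones(f0_hz: Sequence[float], ref_hz: float = 220.0) -> List[float]:
    """Convert F0 in Hz to semitones relative to ref_hz (220 Hz = A3)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def max_f0_derivative(f0_hz: Sequence[float], frame_s: float) -> float:
    """Largest absolute frame-to-frame F0 change, in semitones per second.
    A smooth A3-A4 glide gives a low, nearly constant rate; a register
    break shows up as a single large jump between adjacent frames."""
    st = hz_to_semitones(f0_hz)
    if len(st) < 2:
        raise ValueError("need at least two F0 frames")
    return max(abs(b - a) / frame_s for a, b in zip(st, st[1:]))
```

For example, a perfectly even one-octave glide from A3 to A4 over 1 s sampled every 10 ms changes by 12 semitones per second throughout, so any frame-to-frame rate well above that marks a discontinuity. In practice the F₀ track would usually be smoothed first so that tracker jitter does not dominate the maximum.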

Results

From pre- to post-training, only the MM group improved in register-transition smoothness on both perceptual and acoustic measures. MM singers also reported increased awareness and reduced self-judgment. fMRI revealed distinct changes in neural activation between groups. MM showed widespread changes across sensorimotor, audio-vocal, striatal, and cerebellar regions previously linked to voice production in experienced singers. Additional changes were observed in frontopolar, cingulate, and default mode network regions, which have previously been associated with metacognitive monitoring and self-referential processing, among other functions. BVS showed more spatially restricted changes in prefrontal and parietal regions, with decreased cerebellar activation, consistent with cognitive-strategic control and without comparable sensorimotor changes.

Conclusions

The results suggest that the metacognitive monitoring stance is a trainable variable that benefits vocal motor learning. By reducing reactive interference with phonoceptive sensation, the receptive stance may improve the precision of sensory feedback to the motor system, supporting skill acquisition and refinement.

Acknowledgments: This work was supported by the Francisco Varela Award from the European Mind and Life Institute. The authors gratefully acknowledge Alyssa Wronski, Adelyn Lichtenstein, Abigail DeWese, Emily Cohen, Keira Dougherty, Mia Trageser, and Coleman Walsh, research assistants who contributed to data collection and processing, and Maryam Vaziri-Pashkam and Keith Schneider for their expert guidance on fMRI methodology.

Aude Cardona holds a PhD in Communication Sciences and Disorders from the University of Delaware. Her research sits at the intersection of voice science, cognitive neuroscience, phenomenology, and contemplative practices. A mezzo-soprano, Aude graduated from the Manhattan School of Music and is a certified Iyengar Yoga teacher. A dedicated mindfulness meditation practitioner, she holds a certificate in Mindfulness Meditation and Psychotherapy from the Institute for Meditation and Psychotherapy.

Open block · 5:30 – 6:00 PM (closing remarks, networking)