The Impact of Visual and Audiovisual Inputs on Voice Perception and Production: Exploring the Role of Immersive Virtual Reality in Clinical Practice

Sensory feedback is crucial for regulating pitch, loudness, and other aspects of voice and speech perception and production. (1) Auditory input is traditionally considered the primary signal in speech communication. Among the many findings supporting this conclusion are studies showing that deaf infants exhibit delayed articulatory precision in babbling and phonological deficits during puberty. (2,3) Other studies have found that superior auditory skills enhance pitch accuracy in voice production, underscoring the relationship between auditory input and voice production. (4) Interestingly, other evidence indicates that blind children may also experience phonological deficits. (5) Additionally, incongruent audiovisual input affects speech perception, as the McGurk effect demonstrates. (6) Despite these findings, the influence of visual input on voice perception and production has been largely overlooked.

Addressing this gap is essential to understanding the role of sensory modalities in voice perception and production. Our lab has demonstrated that, under certain conditions, visual input can significantly affect both voice perception and production. (7)

Motor control and learning theories, particularly the specificity of learning principle, (8) suggest that for effective motor learning and generalization, sensory experiences during practice should closely replicate those in later “test” conditions. In voice training, the “test” conditions of greatest interest are real-life situations. Traditional clinical settings typically lack the variety of sensory experiences needed to meet this requirement. Advanced technology such as immersive virtual reality (IVR) can address this limitation by creating training scenarios that simulate real-life environments, presumably enhancing later transfer. IVR enables users to experience auditory and visual inputs in a replicated real-world environment, providing a multisensory experience.

Based on these considerations, we investigated the role of visual and audiovisual inputs in voice. Our study demonstrated that visual input significantly affects voice perception and production in vocally healthy adults. Furthermore, when presented together, the auditory and visual modalities interact and influence each other, raising questions about the often-assumed dominance of auditory input in voice production. (7) Although further studies are needed to fully understand the multisensory context of voice perception and production, our research highlights the critical role of multisensory integration in voice motor regulation.

In our study, we tested the feasibility of using IVR to simulate a multisensory context. Our findings indicated that IVR can enhance multisensory experiences in traditional clinical and pedagogical settings, thereby enabling the application of the specificity of learning principle in the clinic and voice studio.

We continue to conduct studies to further elucidate the role of visual and audiovisual inputs in voice perception and production, and to enhance the practicality of IVR in traditional clinical and pedagogical settings. The future of clinical practice lies at least partially in technology, and its integration is essential to improving patient outcomes.

References

  1. Selleck, M. A., & Sataloff, R. T. (2014). The impact of the auditory system on phonation: A review. Journal of Voice, 28(6), 688-693.
  2. Oller, D. K., Eilers, R. E., Bull, D. H., & Carney, A. E. (1985). Prespeech vocalizations of a deaf infant: A comparison with normal metaphonological development. Journal of Speech, Language, and Hearing Research, 28(1), 47-63.
  3. Stoel-Gammon, C., & Otomo, K. (1986). Babbling development of hearing-impaired and normally hearing subjects. Journal of Speech and Hearing Disorders, 51(1), 33-41.
  4. Watts, C., Moore, R., & McCaghren, K. (2005). The relationship between vocal pitch-matching skills and pitch discrimination skills in untrained accurate and inaccurate singers. Journal of Voice, 19(4), 534-543.
  5. Mills, A. (1987). Review of B. Landau & L. R. Gleitman, Language and experience: Evidence from the blind child (Cambridge, MA: Harvard University Press, 1985). Journal of Child Language, 14(2), 397-402.
  6. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746-748.
  7. Daşdöğen, Ü., Awan, S. N., Bottalico, P., Iglesias, A., Getchell, N., & Abbott, K. V. (2023). The influence of multisensory input on voice perception and production using immersive virtual reality. Journal of Voice.
  8. Proteau, L., Marteniuk, R. G., & Lévesque, L. (1992). A sensorimotor basis for motor learning: Evidence indicating specificity of practice. The Quarterly Journal of Experimental Psychology, 44(3), 557-575.

How to Cite

Daşdöğen, Ü., & Verdolini, K. A. (2024). The Impact of Visual and Audiovisual Inputs on Voice Perception and Production: Exploring the Role of Immersive Virtual Reality in Clinical Practice. NCVS Insights, 2(3), pp. 1-2. DOI: https://doi.org/10.62736/ncvs150214