We can make many different sounds with our voices, and what we communicate depends not only on what is said, but also on the context in which it is said and on who is saying it. This large variability makes the voice a rich channel for communication, but it also presents us with challenges when we try to assess the status of a voice using quantitative measurements, rather than by listening. When voices run into trouble, even more variability can be expected. Voice production is usually described as three processes in sequence: respiration (breathing); phonation (the vibration of the vocal folds); and articulation (changing the shape of the vocal tract, which modifies the sound into vowels and consonants). For brevity, let’s look only at some aspects of phonation that can be expected to be clinically relevant.
Types of Phonation
The healthy vocal folds are a curious contraption that can vibrate in a number of ways, depending on how the folds are positioned and tensioned, and how their oscillation is driven by the lung pressure. Phoneticians, who study speech sounds, often use the term ‘phonation types’ [1], and so will we, but here we’ll keep them a bit simpler. One may discern four different vibratory regimes, or types of phonation, that can be exercised in speech without the speaker being regarded as having a voice pathology or deviant voice. The first three [2] are modal voice (a.k.a. chest voice, or M1), falsetto (head voice, or M2), and creaky voice (vocal fry, or M0, of which there seem to be several subtypes [3]). While M1 is predominant in normal speech, episodes of M0 and/or M2 are common: M0 especially at voicing offset from M1, and M2 at elevated phonation frequency (fo). In untrained voices, the M1-M2 transition is often abrupt and obvious, and can even be embarrassing when it is unintended; it is the ‘voice break’, which classical singers train very hard to conceal. The M1-M0 transition is usually also rapid, but it tends to sound like an octave shift that does not perturb the perceived pitch contour as much as the M1-M2 transition does. Indeed, speech technologists are now looking at how to synthesise M0 appropriately, for improved naturalness of computer speech. (There is also an M3, but only in high-pitched singing.)
Breathy Voice
But wait: that’s only three regimes? Findings from electroglottography (EGG) [4] suggest that soft (breathy) voice, in which the vocal folds are vibrating without actually touching, deserves to be thought of as a regime of its own (as the phoneticians already do), to which an ‘M’ category remains to be assigned; so let’s temporarily use “MX” here. If you listen to the EGG signal in very soft voicing, the onset of vocal fold (VF) contacting is usually quite abrupt, similar to a register change, but it has little or no effect on the sound we hear. It is only at a slightly higher vocal effort that the voice timbre starts to change. In normal speakers, MX appears mostly at the offset of phonation, or in intimate or otherwise subdued voice; and more so in some individuals than in others. In vocal performance, MX is rarely used, but clinicians and voice users are quite familiar with ‘breathy voice’.
When a voice is impaired, its owner loses some control of these habitual phonation types, and may even resort to novel coping strategies, such as trying to phonate with body parts other than the vocal folds. The clinician, therefore, would like to document the patient’s ranges in both conventional and unconventional types of phonation. For instance, if a patient after treatment can phonate in M1 over a larger range than before, that is probably a desired outcome. So how might that be shown, through measurements of physical entities?
Mapping of Metrics
After decades of research, there are now more voice metrics in the literature than one can shake a stick at. The search for a clinicians’ “holy grail” metric, however, is probably futile, because pathologies are very diverse, and no single metric is likely to function as a universal marker. Most metrics vary not just with vocal status, but considerably more with sound pressure level (SPL), fo and phonation type. This can be true even across the limited range of habitual speech, of which the softer half tends to include the contacting threshold. So in order to detect any effects of an intervention, we need to account also for those underlying variations. The good news is that, although individual voices are as different as faces, accumulated experience with voice maps [5] shows that such variations tend to be quite systematic within a given voice. Voice maps are similar to voice range profiles (VRPs), but focus on how various metrics change over an elicited range, rather than on determining the informant’s extremes in SPL and fo. For clarity of example, the figure below shows a ‘clean’ voice map of a highly trained baritone doing soft-loud-soft exercises on the vowel /a/ while staying in normal modal voice. The map took 5½ minutes to record, and shows data averaged over some 46,000 phonatory cycles. Six of the voice map’s layers are shown here.
Figure 1: A highly trained baritone did soft-loud-soft exercises, staying in modal voice. EGG metrics are in the top row, acoustic metrics in the bottom row. The dashed line in each of panels (a) to (f) marks where VF contacting sets in. The blue oval in (a) suggests the range of habitual speech. Metric definitions can be found in [4]. Data are adapted from [7].
Note how the peak dEGG in (a) aligns with changes in the contact quotient (b) and the entropy (disorder) of the EGG pulse shapes (c). Panels (c) and (f) both show perturbation metrics, and it is clear how sensitive these are to changes in SPL [6]. Maps can also be made of running speech and/or pathological voices; such maps are a bit noisier, but still quite interpretable.
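To make the idea of a voice map layer concrete, here is a minimal sketch of how per-cycle measurements could be averaged into cells of the SPL-fo plane. It is not the FonaDyn implementation: the cell size of one semitone by one decibel, the function names and the input format are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def midi_from_hz(fo_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(fo_hz / 440.0)


def build_voice_map(cycles):
    """Average one metric per (semitone, dB) cell.

    cycles: iterable of (fo_hz, spl_db, metric_value), one tuple per phonatory cycle.
    Returns {(semitone, db): (mean_value, cycle_count)}.
    The 1 semitone x 1 dB grid is an assumed cell size, not taken from the article.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fo_hz, spl_db, value in cycles:
        cell = (round(midi_from_hz(fo_hz)), round(spl_db))
        sums[cell] += value
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}


# Made-up cycles for illustration only: (fo in Hz, SPL in dB, some metric value)
example_cycles = [(110.0, 70.2, 0.4), (110.5, 69.8, 0.6), (220.0, 80.0, 0.5)]
example_map = build_voice_map(example_cycles)
```

Each metric would get its own layer of this kind, and keeping the cycle count per cell makes it possible to ignore cells that contain too few cycles to be trusted.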
Mapping of Phonation Types
The charts in Figure 1 are interesting, but what do they tell us about phonation types? Well, predictably, this singer stayed away from M0 and M2. Instead, we can create an MX-M1 scale by listening to the sound at various points in the map, and choosing, say, five representative examples of phonation: no contact, transition, loose contact, firm contact and hard contact. The first is MX, the transition is in between, and the last three are grades of M1. The corresponding sets of values of the six metrics were then used to classify the singer’s productions, as shown in Figure 2.
Figure 2: (a) Polar plot of the six metrics used here for type classification, defining the clustering centroids. The radial axes are standardised 0-100% for each metric. (b) Map of the same production as in Figure 1, with ‘phonation type’ categories based on the weights in (a). (c) Ditto, but the singer sang with as breathy a voice as possible throughout. Note how the softer phonation types migrate upwards. (d) Example of a difference map, here of the Spectrum Balance SB, where red means a decrease in the latter production. The difference is defined only for the region that exists in both Normal and Breathy. Note how above the threshold of contact (dotted line), SB decreased as would be expected for breathy voice due to weaker high harmonics, but it increased below, presumably due to more aspirate noise.
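One plausible reading of this classification step is nearest-centroid assignment: each map cell gets the label of the reference example whose standardised metric values lie closest to its own. The sketch below assumes Euclidean distance and metrics already standardised to the range 0 to 1; the function name, the centroid values and the use of only two metrics are hypothetical and serve only to keep the example short.

```python
import math


def classify_cell(cell_metrics, centroids):
    """Return the label of the centroid nearest to cell_metrics.

    cell_metrics: list of standardised metric values (0..1) for one map cell.
    centroids: {label: list of standardised metric values}, same length and order.
    Euclidean distance is an assumption; the article does not name the distance measure.
    """
    return min(centroids, key=lambda label: math.dist(cell_metrics, centroids[label]))


# Hypothetical centroids with two metrics each (the article uses six metrics)
example_centroids = {
    "no contact": [0.10, 0.10],
    "transition": [0.35, 0.30],
    "firm contact": [0.80, 0.70],
}
example_label = classify_cell([0.20, 0.15], example_centroids)
```

Standardising each metric first matters because the metrics have different units and ranges; without it, the metric with the largest numerical span would dominate the distance.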
By combining several metrics, we can come much closer to characterising types of phonation that can be clinically relevant. This combining can be done automatically, by statistical clustering; or by ear, picking representative samples of different phonation types from the map. The software used here can learn from only a few seconds of voicing to discriminate M1 from M2, for example. M0 seems trickier, and will probably require other metrics. With all metrics we have tried, clustering into only two categories will generally resolve two zones that are on either side of the vocal fold contact transition region [4].
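As an illustration of the automatic route, the following sketch clusters map cells into two groups with plain k-means (Lloyd's algorithm). K-means is used here only as one common choice of statistical clustering; the deterministic initialisation, the function name and the sample points are assumptions, and the input is assumed to contain at least k points of standardised metric values.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on a list of equal-length numeric vectors.

    Initial centroids are taken from evenly spaced input points, a simplification
    chosen to keep the sketch deterministic. Requires len(points) >= k.
    Returns (labels, centroids).
    """
    step = max(1, len(points) // k)
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up standardised metric pairs for six map cells
example_points = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10],
                  [0.80, 0.90], [0.85, 0.80], [0.90, 0.95]]
example_labels, example_centroids = kmeans(example_points, k=2)
```

With k set to 2, the two resulting groups would correspond to the two zones on either side of the contact transition described above, provided the metrics separate them as clearly as in this toy data.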
Effects of interventions can be visualised by making voice maps pre and post, and then constructing a new map that shows the differences. As an example, our highly trained baritone repeated the soft-loud-soft exercise in as breathy a voice as he could, thus emulating a less well functioning voice. Figure 2d shows how the spectrum balance SB of the microphone signal changed, in dB on a red-green colour scale. It is still early days, but work is in progress on making such difference maps across various clinical and pedagogical interventions.
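The construction of a difference map can be sketched in a few lines, assuming each map layer is stored as a dictionary from (semitone, dB) cells to a metric value. That storage format and the function name are assumptions for illustration; the sample values are made up.

```python
def difference_map(before, after):
    """Cell-wise difference (after minus before) for one metric.

    Only cells present in both maps are kept, since the difference is
    undefined where either production has no data.
    """
    shared_cells = before.keys() & after.keys()
    return {cell: after[cell] - before[cell] for cell in shared_cells}


# Made-up spectrum balance values in dB, keyed by (semitone, dB SPL)
normal = {(45, 70): -20.0, (45, 71): -18.0}
breathy = {(45, 70): -23.0, (46, 70): -19.0}
example_diff = difference_map(normal, breathy)
```

A negative value means the metric decreased in the later production, which would be drawn in red on the colour scale described for Figure 2d.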
Conclusion
Mapping phonation types over the voice field may be more clinically relevant than trying to find normal/pathological cutoff values for particular metrics, as the types relate more closely to the functioning of the voice. In some sense, such categorisation is what machine-learning approaches already do, but the problem is that they, as yet, do not explain how. Of course, experienced voice clinicians have their own mental maps of voice types that provide the grounding for perceptual voice assessment.
There are many other applications of voice mapping, so stay tuned. Would you like to try it for yourself? The latest FonaDyn software is available as a free download from the author’s profile page at https://www.kth.se/profile/stern. Its documentation contains all the details that could not fit in this short article. Finally, apologies to any readers with alternative colour vision. Admittedly, the colour scales used here are not ideal; improved colour mapping schemes are on the drawing board.
References
[1] Gordon, M.; Ladefoged, P. Phonation Types: A Cross-Linguistic Overview. J. Phon. 2001, 29, 383–406, doi:10.1006/jpho.2001.0147.
[2] Roubeau, B.; Henrich, N.; Castellengo, M. Laryngeal Vibratory Mechanisms: The Notion of Vocal Register Revisited. J. Voice 2009, 23, 425–438, doi:10.1016/j.jvoice.2007.10.014.
[3] Proctor, K.; Scherer, R.C.; Perrine, B.L. Vocal Fry Patterns While Reading. J. Voice 2022, doi:10.1016/j.jvoice.2022.01.013.
[4] Cai, H.; Ternström, S. Mapping Phonation Types by Clustering of Multiple Metrics. Appl. Sci. 2022, 12, 12092, doi:10.3390/app122312092.
[5] Ternström, S.; Pabon, P. Voice Maps as a Tool for Understanding and Dealing with Variability in the Voice. Appl. Sci. 2022, 12, 11353, doi:10.3390/app122211353.
[6] Brockmann-Bauser, M.; Bohlender, J.E.; Mehta, D.D. Acoustic Perturbation Measures Improve with Increasing Vocal Intensity in Individuals With and Without Voice Disorders. J. Voice 2018, 32, 162–168, doi:10.1016/j.jvoice.2017.04.008.
[7] Jacobsson, L. Listeners’ Perception of Voice Quality: Perceptual and Acoustic Analyses. Internal student project report, Göteborgs Universitet, Dept of Neuroscience and Physiology, Division of Logopedics, 2023. (unpublished)
How to Cite
Ternström, S. (2024), Vocal Function and Range. NCVS Insights, Vol 2(2), pp. 1-2. DOI: https://doi.org/10.62736/ncvs183832