NCVS Insights – Science that Resonates
Ambulatory Voice Assessment: Challenges, Opportunities, and Future Directions
September 23, 2025
Volume 3, Issue 9 – September 2025
By Victor M. Espinoza, Ph.D.
Continuous recording of voice on a daily basis has emerged as a new paradigm in voice assessment over the past decades (Svec, Titze, & Popolo, 2005; Mehta et al., 2012; Bottalico & Nudelman, 2023). For example, the ubiquity of smartphone technology has opened new possibilities to record, store, and process raw voice data, enabling researchers to investigate voice features in ambulatory scenarios with app-based software (Penha et al., 2025). This integration of consumer technology paves the way for alternative recording methods.
Additionally, portable audio recorders with built-in or external microphones offer viable alternatives for capturing speech and analyzing voice data in everyday settings (Bottalico & Nudelman, 2023). Such tools expand the options available to researchers seeking robust data collection outside controlled environments.
Beyond consumer-level solutions, specialized voice devices designed for objective voice assessment are gaining increasing attention in clinical research. These devices, often incorporating neck surface accelerometers (Zañartu et al., 2013; Cortés et al., 2018) or contact microphones, serve as alternatives to traditional airborne sound recording. Whether they rely on airborne sound waves or neck surface vibrations, researchers and clinicians aim to gather complementary information about voice behavior outside clinical settings. This information could reveal previously unrecognized voice patterns in real-world contexts. These innovative devices prompt a re-examination of traditional clinical evaluation methods.
Clinicians typically evaluate voice problems using clinical or laboratory-based methods, including endoscopic imaging, acoustic voice recordings, auditory-perceptual evaluation, and comprehensive voice histories. As specialized voice devices gain prominence, interest in their clinical potential continues to grow. However, technological advances often require a re-evaluation of clinical practices, potentially leading to shifts toward new evidence-based methodologies. This evolution mirrors trends observed in other medical fields.
In those fields, technological progress tends to improve diagnostic accuracy and reduce uncertainties in treatment. A useful comparison can be drawn with ambulatory blood pressure monitoring (ABPM) (Kain, Hinman, & Sokolow, 1964) and the Holter monitor (Holter, 1961). These devices underwent decades of development and validation before becoming standard medical tools. Similarly, ambulatory voice assessment is still in its infancy, and despite promising initial results, substantial work is required to establish meaningful experimental protocols and robust data analysis methodologies. These challenges highlight the need to address fundamental questions about the direction of ambulatory voice research.
Once reliable technology and data acquisition methods are established for specialized voice devices, a fundamental question arises in voice research: Should ambulatory voice monitoring aim to replicate laboratory conditions (e.g., data collected via controlled vocal gestures like /pae/), or should it embrace a more ecological perspective (e.g., conversational speech), capturing natural voice use throughout daily activities? Alternatively, could a hybrid approach be considered? The ecological perspective remains largely unexplored, presenting unique challenges that warrant further research. Addressing these questions requires careful consideration of the technical and ethical factors involved.
Developers of voice assessment applications and wearable devices must consider various factors that influence voice recordings in ambulatory settings. Assuming ethical considerations, data privacy, and potential conflicts of interest have been carefully addressed—drawing lessons from similar approaches—some technical challenges remain unresolved. These challenges extend to the management of future datasets in ambulatory voice research.
While no public database of ambulatory voice recordings is available to date (2025), several issues are expected to arise once such data become available, particularly regarding storage and management. Existing publicly available voice datasets often suffer from inconsistent recording conditions and formats, poor microphone or sensor quality, unmatched frequency responses, and high environmental noise levels, and many are unrelated to specific voice disorders. Additionally, metadata may be incomplete or inaccurate, complicating efforts to extract meaningful insights from real-world data. Despite these issues, such datasets tend to grow over time and, with careful curation, can serve as valuable resources for voice research (Mozilla, 2025; Walden, 2020; Saarbrücken Voice Dataset, n.d.). The complexity of these datasets underscores the need for advanced analytical approaches.
Ambulatory voice monitoring generates vast amounts of raw data, potentially spanning days of continuous recordings. Analyzing these data is an ongoing endeavor for voice researchers, requiring rigorous validation through a hybrid approach that integrates automated statistical methods, machine learning algorithms, advanced signal processing techniques, and clinical expertise from medical doctors and speech-language pathologists (Mehta et al., 2015). This clinical expertise is essential for ensuring the interpretability of results, as complex data must be translated into actionable clinical insights. Such insights could bridge the gap between traditional and ambulatory voice assessment methods.
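To give a rough sense of scale, the sketch below estimates the size of uncompressed single-sensor recordings. The function name and every parameter value (16 kHz sampling, 16-bit samples, one channel, 12 hours of wear per day for one week) are illustrative assumptions, not the specifications of any device discussed here.

```python
def raw_storage_bytes(sample_rate_hz: int, bits_per_sample: int,
                      channels: int, hours: float) -> int:
    """Estimate the size in bytes of uncompressed PCM data (hypothetical helper)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return int(bytes_per_second * hours * 3600)


# Assumed recording setup: 16 kHz, 16-bit, mono, 12 hours per day.
per_day = raw_storage_bytes(16_000, 16, 1, 12)
per_week = 7 * per_day
print(f"{per_day / 1e9:.2f} GB per day, {per_week / 1e9:.2f} GB per week")
```

Under these assumptions a single participant produces on the order of gigabytes per week before any features are extracted, which is one reason storage and curation become practical concerns for multi-participant studies.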
The advent of ambulatory voice monitoring also presents new opportunities to understand vocal behavior in ecological settings. Given these developments, existing clinical evaluation methods, such as the CAPE-V and GRBAS rating scales, could be related more directly to the interpretation of data from ambulatory methods. Comparing these auditory-perceptual ratings with voice information collected throughout the day could allow continuous monitoring in daily life to complement traditional clinical voice assessments. This potential synergy highlights practical applications of ambulatory monitoring.
Although ambulatory voice assessment remains in its early stages, it holds significant potential to advance the understanding of vocal behavior. For example, detecting mild phonotrauma in daily life could serve as a preventive health strategy, alerting clinicians to potential vocal fold lesions that may require surgical intervention (Van Stan et al., 2023). Vocal dose (Svec et al., 2003) offers another way to evaluate daily voice behavior, such as vocal fatigue in teachers (Atará-Piraquive, Bottalico, & Cantor-Cutiva, 2025). These applications raise broader questions about the relationship between daily voice use and vocal health.
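As an illustration of how vocal dose can be computed, the sketch below derives two commonly used dose measures from frame-level data: the time dose (total voiced time) and the cycle dose (total number of vocal fold oscillation cycles, obtained by accumulating fundamental frequency over voiced time). It assumes that a voicing decision and a fundamental frequency estimate are already available for each analysis frame. The function name, the frame duration, and the input values are hypothetical, and amplitude-dependent doses such as the distance dose are left out.

```python
from typing import Sequence, Tuple


def vocal_doses(f0_hz: Sequence[float], voiced: Sequence[bool],
                frame_s: float) -> Tuple[float, float, float]:
    """Return (time dose in s, cycle dose in cycles, phonation time in %).

    f0_hz   -- fundamental frequency estimate per frame (ignored if unvoiced)
    voiced  -- voicing decision per frame
    frame_s -- duration of one analysis frame in seconds (assumed constant)
    """
    if len(f0_hz) != len(voiced):
        raise ValueError("f0_hz and voiced must have the same length")

    time_dose = 0.0
    cycle_dose = 0.0
    for f0, is_voiced in zip(f0_hz, voiced):
        if is_voiced:
            time_dose += frame_s          # accumulate voiced time
            cycle_dose += f0 * frame_s    # cycles = frequency x duration

    total_s = len(voiced) * frame_s
    phonation_pct = 100.0 * time_dose / total_s if total_s > 0 else 0.0
    return time_dose, cycle_dose, phonation_pct


# Toy input: four half-second frames, three of them voiced.
doses = vocal_doses([200.0, 0.0, 100.0, 100.0],
                    [True, False, True, True], frame_s=0.5)
print(doses)  # (1.5, 200.0, 75.0)
```

In an ambulatory recording the same accumulation would run over hours of frames, so the quality of the voicing and fundamental frequency estimates largely determines how trustworthy the resulting doses are.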
Ambulatory voice monitoring could reveal causal relationships between daily voice use and pathologies (e.g., nodules or dysphonia) beyond what clinic-based snapshots provide. This raises several questions: Are these relationships known? Under which environmental or social conditions do they emerge? Are they solely associated with occupational activities? Furthermore, if these relationships are identified, how can current vocal therapy be adapted to this scenario? These questions represent initial areas for exploration in ambulatory settings.
Addressing these questions and challenges is essential for realizing the full potential of ambulatory voice monitoring in both research and clinical applications.
REFERENCES
Atará-Piraquive, A. P., Bottalico, P., & Cantor-Cutiva, L. D. (2025). Impact of a workplace vocal health promotion program on vocal doses in college professors: A Colombian exploratory study. Revista de Investigación e Innovación en Ciencias de la Salud, 7(2), 1–16. https://riics.info/index.php/RCMC/article/view/4483
Bottalico, P., & Nudelman, C. J. (2023). Do-it-yourself voice dosimeter device: A tutorial and performance results. Journal of Speech, Language, and Hearing Research, 66(7), 2149–2163. https://doi.org/10.1044/2023-JSLHR-23-00060
Cortés, J. P., Espinoza, V. M., Ghassemi, M., Mehta, D. D., Van Stan, J. H., Hillman, R. E., Guttag, J. V., & Zañartu, M. (2018). Ambulatory assessment of phonotraumatic vocal hyperfunction using glottal airflow measures estimated from neck-surface acceleration. PLOS ONE, 13(12), 1–22. https://doi.org/10.1371/journal.pone.0209017
Holter, N. J. (1961). New method for heart studies. Science, 134(3486), 1214–1220. https://doi.org/10.1126/science.134.3486.1214
Kain, H. K., Hinman, A. T., & Sokolow, M. (1964). Arterial blood pressure measurements with a portable recorder in hypertensive patients. Circulation, 30(6), 882–892. https://doi.org/10.1161/01.CIR.30.6.882
Mehta, D. D., Van Stan, J. H., Zañartu, M., Ghassemi, M., Guttag, J. V., Espinoza, V. M., Cortés, J. P., Cheyne, H. A., & Hillman, R. E. (2015). Using ambulatory voice monitoring to investigate common voice disorders: Research update. Frontiers in Bioengineering and Biotechnology, 3, Article 155. https://doi.org/10.3389/fbioe.2015.00155
Mehta, D. D., Zañartu, M., Feng, S. W., Cheyne, H. A., & Hillman, R. E. (2012). Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform. IEEE Transactions on Biomedical Engineering, 59(11), 3090–3096. https://doi.org/10.1109/TBME.2012.2207896
Mozilla. (2025). Common Voice [Dataset]. https://commonvoice.mozilla.org/en
Penha, P. B., Mendes, A. L., Guedes, A., Santos, C., Duarte, L. S., & Lima-Silva, M. F. (2025). Mobile applications used for vocal health: An integrative review. Journal of Voice. Advance online publication. https://www.sciencedirect.com/science/article/pii/S0892199725000906
Saarbrücken Voice Dataset. (n.d.). https://stimmdb.coli.uni-saarland.de/
Svec, J. G., Titze, I. R., & Popolo, P. S. (2005). Estimation of sound pressure levels of voiced speech from skin vibration of the neck. The Journal of the Acoustical Society of America, 117(3), 1386–1394. http://scitation.aip.org/content/asa/journal/jasa/117/3/10.1121/1.1850074
Svec, J. G., Titze, I. R., Popolo, P. S., Muller, F., & Wittenberg, T. (2003). Vocal dosimetry: Theoretical and practical issues. In G. Schade & M. Hess (Eds.), Proceedings of the Conference Advances in Quantitative Laryngology, Voice and Speech Research (AQL 2003) (pp. xx–xx). Hamburg, Germany.
Van Stan, J. H., Burns, J., Hron, T., Zeitels, S., Panuganti, B. A., Purnell, P. R., Mehta, D. D., Hillman, R. E., & Ghasemzadeh, H. (2023). Detecting mild phonotrauma in daily life. The Laryngoscope, 133(11), 3094–3099. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10592567/
Walden, P. R. (2020). Perceptual voice qualities database (PVQD) (Version 1) [Data set]. Mendeley Data. https://data.mendeley.com/datasets/9dz247gnyb/1
Zañartu, M., Ho, J. C., Mehta, D. D., Hillman, R. E., & Wodicka, G. R. (2013). Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration. IEEE Transactions on Audio, Speech, and Language Processing, 21(9), 1929–1939. https://doi.org/10.1109/TASL.2013.2261314
Victor M. Espinoza
Víctor M. Espinoza, Ph.D., is a Chilean scientist specializing in voice research in a clinical context. His work focuses on extracting biomedical signals and patterns from the voice using physical modeling, signal processing, and pattern recognition techniques to support clinical monitoring and early diagnosis of vocal pathologies. He currently serves as an Associate Professor and Director of the Voice Research Laboratory in the Department of Sound at the Universidad de Chile, Santiago, Chile.