Human Vocalization Libraries

G. Robert Vincent Voice Library at Michigan State University

The G. Robert Vincent Voice Library (VVL), housed at Michigan State University in East Lansing, Michigan, is the largest academic voice archive in the United States. It contains more than 100,000 hours of spoken-word recordings and preserves the voices of over 500,000 individuals — from presidents and public figures to everyday citizens whose stories capture the texture of their time.

Founded in 1962 with the personal collection of sound engineer and recording pioneer G. Robert Vincent, the library focuses exclusively on the spoken word rather than music or video. Its holdings date back to 1888 and include early Edison recordings, presidential speeches, World War II broadcasts, oral histories, and interviews spanning cultural, political, and academic life.

Original analog materials are preserved under archival conditions, while an ongoing digitization effort migrates recordings to digital formats. A portion of the collection is available online as MP3 files where rights allow; the majority of materials can be accessed onsite in the Special Collections area of the MSU Main Library.

Researchers, educators, media professionals, and the general public use the Voice Library for historical research, rhetorical analysis, documentary production, and the study of language and culture across time. Public recordings can be explored at the official website.

Enter the G. Robert Vincent Voice Library

Perceptual Voice Qualities Database (PVQD)

The Perceptual Voice Qualities Database (PVQD) is a publicly available collection of 296 high-quality human voice recordings developed under the auspices of The Voice Foundation and created by researchers including Patrick R. Walden, Ph.D., CCC-SLP. The database includes sustained vowels (such as /a/ and /i/) and standardized sentences from the CAPE-V clinical protocol, all recorded in controlled conditions and provided in .wav format.

PVQD was designed to support education, research, and clinical training in voice science and speech-language pathology. Each recording has been systematically rated by experienced voice clinicians using established perceptual scales such as CAPE-V and GRBAS, making the collection a structured reference set for studying voice quality across a range from normal voices to dysphonia of varying severity.

In clinical education, the database helps students and emerging clinicians learn to recognize and rate voice qualities such as breathiness, roughness, strain, and overall severity. For researchers, it provides a benchmark dataset for studying listener reliability, acoustic-perceptual relationships, and training effects. Because each recording is paired with expert ratings and demographic metadata, the database also serves as ground-truth data for validating voice assessment tools and automated analysis systems.

The PVQD is freely available for download (typically under a Creative Commons license), which makes it accessible for classroom use, research studies, and technology development. By providing standardized perceptual reference material, the database helps bridge clinical listening, acoustic measurement, and emerging voice analysis technologies.

Enter the PVQD

Arizona Child Acoustic Database (ACAD) at The University of Arizona

The Arizona Child Acoustic Database (ACAD) is a systematically assembled longitudinal database of acoustic speech recordings collected from typically developing children aged approximately 2 to 7 years. The primary objective of this dataset is to support empirical studies of vocal development during a formative period of speech acquisition and physiological growth, offering researchers access to high-quality audio samples that can improve understanding of childhood speech production.

The database was created through repeated recordings of the same cohort of participants at regular three-month intervals, enabling both cross-sectional and longitudinal analyses of acoustic properties over time. The protocol includes a range of speech materials: isolated vowels and diphthongs, controlled multi-vowel sequences, words designed to elicit each American English vowel, brief sentences, and spontaneous conversational speech. These varied elicitation conditions increase the utility of the corpus for acoustic phonetic research, particularly in examining how articulatory and phonatory features evolve with age and maturation.

The ACAD project was conducted under the auspices of the University of Arizona’s Speech Acoustics and Physiology Laboratory, with the dataset archived and distributed through the University of Arizona’s institutional repository. The repository supports open access to the audio files for scientific investigation, subject to appropriate use and citation of the foundational work.

The motivation behind assembling the database was twofold: to generate a large, structured dataset that captures developmental trends in vowel and speech production and to facilitate research efforts such as computational modeling of child speech. The longitudinal design is particularly valuable for tracking individual developmental trajectories, a feature that distinguishes ACAD from many other speech corpora that rely solely on cross-sectional sampling.

Enter the ACAD