The Smithsonian Institution
Evaluating OpenAI Whisper and its variants for increasing the accessibility of audio collections

presentation
posted on 2024-01-03, 21:44, authored by Michael Trizna, Crystal Sanchez, Emily Cain, Mark Custer

Original abstract: The Media Asset Delivery Service (MADS), a central service under development at the Smithsonian Institution, aims to address accessibility compliance by remediating digital content. MADS supports audio and video delivery with captions through the Smithsonian's Digital Asset Management System (DAMS). Machine-generated captioning has previously not been accurate enough to justify the staff time needed to fix the results. However, with the release of OpenAI's Whisper model, the Smithsonian launched a pilot project to test the feasibility of such tools for remediating centrally delivered audio files and making them available through the new MADS player. The Smithsonian Transcription Center is a digital volunteer program that allows the public to help transcribe the Smithsonian's historical documents and collections records to improve accessibility and discoverability. Volunteers have transcribed over 200 hours of audio on Transcription Center, which we used, alongside audio transcriptions produced by professional vendors, as a baseline for comparing and evaluating the output of different versions of the Whisper model. We conducted a detailed evaluation, comparing Whisper-generated captions against human-transcribed content from varied contexts, shedding light on their relative strengths and limitations.
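Evaluations like the one described typically score machine transcripts against human baselines using word error rate (WER). The presentation does not specify its metric or tooling, so the following is only an illustrative sketch of how a Whisper output could be scored against a volunteer transcript, using a plain edit-distance WER in standard-library Python:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    A minimal sketch; real caption evaluation would also normalize
    punctuation, numerals, and timestamps before comparison.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Dynamic-programming table of edit distances between prefixes.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,            # deletion
                d[i][j - 1] + 1,            # insertion
                d[i - 1][j - 1] + substitution,
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Hypothetical example: volunteer transcript vs. a Whisper hypothesis.
human = "the cat sat on the mat"
machine = "the cat sat on mat"
print(round(wer(human, machine), 3))  # one deletion over six words
```

Comparing multiple Whisper model sizes would then reduce to running each model's output through the same scoring function and ranking the aggregate WER per collection.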