Project SMRT – Technology walkthrough video: fine-tuning ASR deep learning model

Hi everyone, welcome to our technology walkthrough video. We're going to use this format to document separate parts of our project and guide you through some implementation details. Hope you enjoy it.

In our project, we aim to determine in real time the degree of alignment between citizens' priorities and government priorities, and we're going to offer it as a free service to communities, institutions and leaders, to help design the next generation of digital services together. For that, we're aiming to scrape live feeds from every parliament in Australia, federal as well as state and territory. As soon as we harvest the audio component of a video stream fragment, we need to apply a speech recognition model to convert it to text, and here we aim to use a wav2vec 2.0 model released by Facebook via Hugging Face. The model is already state of the art, but sometimes it has trouble understanding Australian politicians. To fix that, we retrain the model on a small subset of voice samples containing speech fragments of Prime Minister Scott Morrison. Here we're using the Google Colab environment and Google Drive as storage. We also hosted a copy of our dataset and made it open source; if you'd like to reuse it for tuning your own speech recognition model, you're more than welcome to download it from our Google Drive folder.

As you can see, the dataset contains both test and train subsets, and here are some random text samples. You can see that the vocabulary contains every letter, which is expected, and if you compare it to the vocabulary of the original Facebook model, they're pretty similar. To optimise fine-tuning, we're going to inherit the original model's vocabulary, as well as its original processor and feature extractor. Here's a random audio clip from our dataset: "the more than a billion dollars". Before we try to improve our model, we need to understand how good it is already, so here we load the original model and check its word error rate score.
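The word error rate used here is the standard ASR metric: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A minimal, dependency-free sketch (in practice you'd typically use a library such as `jiwer` or Hugging Face's `evaluate`):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of six reference words:
print(word_error_rate("the more than a billion dollars",
                      "the more then a billion dollars"))
```

A score just under 0.10 therefore means roughly one word in ten is wrong relative to the reference transcript.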
So here we check it on the test subset of our dataset, and we see that it's just under 10%, which is pretty good. Here we show how the model handles voice samples: we have the true label of a sample, and below we ask the model (not our model, but the original Facebook model) to transcribe it. You can see that the transcription is pretty much identical to the true label.

Now let's try to improve the model. For this, we train it on the train subset of our dataset. Here we initialise the weights and freeze the feature extractor; the only part we're going to retrain is the top of the model. Training was done for 150 epochs, and we saved three checkpoints. We notice that the word error rate has indeed improved: from 10% it went down to about 7%, which is a very good result considering our tiny dataset and short training time. Here we evaluate the word error rate score of the last checkpoint, and it is indeed 7.3%. If you check the word error rate on the train subset, you see that it's very small, less than 0.1%, which is expected, as we are effectively overfitting that data.

If you'd like to try this model yourself, please visit our GitHub and Google Drive repositories. And yeah, give a like to this video. See you in the next one!