By Zilun Peng, Akshay Budhkar, Jumana Nassour, Ilana Tuil and Jason Levy

Thanks to advances in speech recognition, companies can now build a whole range of products with accurate transcription capabilities at their heart. Conversation intelligence platforms, personal assistants, and video and audio editing tools, for example, all rely on speech-to-text transcription. However, these systems often have to be trained on supervised data for every domain you want to transcribe. In practice, you need a large body of transcribed audio that’s similar to what you are transcribing just to get started in a new domain.

Recently, Facebook released…


