
OpenAI, the company responsible for ChatGPT (built on the GPT-3.5 architecture), has launched the second version of Whisper, its open-sourced multilingual speech recognition model.
The new model, large-v2, is trained for more epochs with added regularization and shows improved performance, but it keeps the same architecture as the original large version. The team notes that it will update its research paper soon.
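For readers who want to try the updated checkpoint, here is a minimal sketch. It assumes the open-source openai-whisper Python package is installed (for example via pip install -U openai-whisper, with ffmpeg available), and "sample.mp3" is just a placeholder for any local audio file:

```python
# Minimal sketch: loading the retrained Whisper checkpoint with the
# open-source `openai-whisper` package.
import whisper

# "large-v2" selects the retrained large model described above; the
# original large model remains available under its own name.
model = whisper.load_model("large-v2")

# "sample.mp3" is a placeholder path for any local audio file.
result = model.transcribe("sample.mp3")
print(result["text"])
```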
In October, AI research and development company OpenAI released Whisper, which is trained on over 680,000 hours of multilingual data collected from the web. Unfortunately, the dataset used for training has been kept private.
Whisper’s first version was trained on a large and diverse dataset but was not fine-tuned to any specific one, which is why it did not surpass models specialized for the LibriSpeech benchmark, one of the most widely used benchmarks for judging speech recognition.
OpenAI’s blog said that they hope Whisper will be the foundation for building their future products and that their research on robust speech processing is just beginning.
Whisper itself transcribes audio and translates speech into English text. OpenAI’s wider lineup covers everything from simple conversations to creating art from text, and the company is experimenting with these various tools: DALL·E 2, which produces images from text prompts; GPT-4, the newest language model; and ChatGPT, which handles quick conversational messages. But to get the most out of Whisper’s abilities, you’ll want to use it for more than just translating and transcribing audio.
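As a rough illustration of those two core capabilities, the sketch below (again assuming the openai-whisper package; the file name "interview_fr.mp3" is a hypothetical non-English recording) transcribes the audio in its original language and then translates the speech into English via the task option:

```python
import whisper

model = whisper.load_model("large-v2")

# Transcription: the speech is written out in the language it was spoken in.
# "interview_fr.mp3" is a placeholder for a non-English recording.
transcription = model.transcribe("interview_fr.mp3")
print(transcription["language"], transcription["text"])

# Translation: the same model can render the speech as English text instead.
translation = model.transcribe("interview_fr.mp3", task="translate")
print(translation["text"])
```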
The Challenge
Running Whisper on a laptop brings several challenges: laptops don’t have the processing power that professional transcription workloads need, installation isn’t always straightforward, and the model’s timestamp predictions are often biased toward integer values.
The smaller models that fit on such hardware usually tend to be less accurate; observing the predicted timestamp distribution may help, but no firm conclusions can be drawn as of yet.
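On a CPU-only laptop, one common workaround is to fall back to a smaller checkpoint and disable half-precision, then look at the per-segment timestamps the model predicts. This is a sketch under those assumptions, not an official recommendation, and "sample.mp3" is again a placeholder file:

```python
import whisper

# Assumption: no GPU is available, so use a smaller checkpoint and disable
# half-precision (fp16), which is not supported on CPU.
model = whisper.load_model("base")
result = model.transcribe("sample.mp3", fp16=False)

# Each predicted segment carries start/end times in seconds; printing them
# makes any clustering around whole-second values easy to spot.
for segment in result["segments"]:
    print(f"{segment['start']:7.2f} -> {segment['end']:7.2f}  {segment['text']}")
```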
Consider these potential concerns.
You may be thinking about using this free, open-source model for your content, but before you make a decision, it’s important to be aware of both the risks and the rewards.
Under the “Broader Implications” section of the model card on GitHub, OpenAI warns that the technology could be used to automate surveillance or to identify individual speakers in a conversation. Despite this possibility, the company hopes it will be used primarily for beneficial purposes.