Here you will find descriptions of some of the features that I think need a bit of explanation.
Local mode vs. API mode
Web-whisper allows you to add an OpenAI token and use the OpenAI API as the backend. This brings some advantages as well as some concerns.
You can have both modes enabled at the same time and switch between them from the UI itself.
API mode pros:
- API mode is very fast and always uses the large-v2 model, giving high-quality transcriptions.
- The OpenAI API is cheap, at $0.006 per minute of audio.
- API mode has no hardware requirements on your side.
API mode cons:
- You need an internet connection.
- Your files are sent to OpenAI servers for processing, which can have privacy implications.
- The maximum file size is 25 MB. You will need to compress larger files or split them into chunks of 25 MB or less.
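To get a feel for the cost and size limits above, here is a minimal Python sketch. The function names are hypothetical and are not part of Web-whisper; the price is the $0.006/min figure quoted above, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption.

```python
import math

API_PRICE_PER_MINUTE = 0.006        # USD per minute, figure quoted above
API_MAX_BYTES = 25 * 1024 * 1024    # assumption: "25 MB" read as 25 MiB


def estimate_api_cost(duration_seconds: float) -> float:
    """Rough cost in USD of transcribing audio of the given length."""
    return duration_seconds / 60 * API_PRICE_PER_MINUTE


def fits_api_limit(size_bytes: int) -> bool:
    """True if a file of this size can be sent in a single request."""
    return size_bytes <= API_MAX_BYTES


def min_chunks(size_bytes: int) -> int:
    """Smallest number of pieces needed so each one is within the limit."""
    return max(1, math.ceil(size_bytes / API_MAX_BYTES))


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"1 hour of audio costs about ${estimate_api_cost(one_hour):.2f}")
    print("60 MiB file needs at least", min_chunks(60 * 1024 * 1024), "chunks")
```

Note that `min_chunks` only gives a lower bound: when splitting real audio you would cut on time boundaries, so the actual number of pieces may be higher.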
Audio x2
This feature plays the input audio at double speed (x2). The audio is processed much faster, but word detection becomes less accurate.
This feature works very well when the audio comes from a slow speaker. It also works better with the languages that Whisper transcribes most accurately.
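If you want to try the same idea by hand before uploading a file, one common way to double the speed of an audio file without changing its pitch is ffmpeg's `atempo` filter. The sketch below only builds the command line; the helper name is hypothetical, and this is an assumption about how such a speed-up can be done, not a description of what Web-whisper does internally.

```python
import subprocess


def build_speedup_command(src: str, dst: str, factor: float = 2.0) -> list[str]:
    """Build an ffmpeg command that changes the tempo of `src` by `factor`.

    The 0.5 to 2.0 range is a conservative choice: a single atempo filter
    accepts it in both older and newer ffmpeg releases.
    """
    if not 0.5 <= factor <= 2.0:
        raise ValueError("factor must be between 0.5 and 2.0")
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={factor}", dst]


if __name__ == "__main__":
    cmd = build_speedup_command("input.wav", "input_x2.wav")
    print(" ".join(cmd))
    # Uncomment to actually run it (requires ffmpeg on your PATH):
    # subprocess.run(cmd, check=True)
```

Comparing the transcription of the original and the sped-up file is a quick way to see how much accuracy you lose for a given speaker.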
Language auto-detection
When selecting the language from the dropdown, you can choose the "auto" option. If this option is selected, whisper will attempt to guess the language of the speaker. This works well enough most of the time, at least with the base model, and works better with the languages that Whisper transcribes most accurately.
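To illustrate what language detection looks like in practice, here is a small sketch using the reference `openai-whisper` Python package. This is an assumption for illustration only: Web-whisper's own backend may work differently. The helper names are hypothetical; the `whisper` calls follow that package's documented usage.

```python
def most_likely_language(probs: dict[str, float]) -> str:
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_language(path: str, model_name: str = "base") -> str:
    """Guess the spoken language of an audio file with openai-whisper.

    Requires `pip install openai-whisper` and ffmpeg; the import is kept
    local so the helper above can be used without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    # Detection only looks at the first 30 seconds of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)


if __name__ == "__main__":
    # Example probabilities, made up for illustration.
    print(most_likely_language({"en": 0.91, "es": 0.06, "de": 0.03}))
```

Because the guess is based on a short window at the start of the file, recordings that open with silence, music, or a different language are the ones most likely to be misdetected. In those cases, picking the language manually is the safer choice.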