This repository has been archived on 2023-05-31. You can view files and clone it, but cannot push or open issues/pull-requests.
 
 
 
 
 
 
Currently working on an improved version: see Web Whisper Plus.

Please don't upload to GitHub

🎶 Convert any audio to text 📝

Changelog · Setup · Demo


A lightweight user interface for OpenAI's Whisper, right in your browser!

WEB WHISPER

Features:

  • Record and transcribe audio right from your browser.
  • Run it 100% locally, or make use of the OpenAI Whisper API.
    • Ability to switch between API and LOCAL mode.
  • Upload any media file (video, audio) in any format and transcribe it.
    • Option to cut audio to X seconds before transcription.
    • Option to disable file uploads.
  • Enter a video URL to transcribe it to text (uses yt-dlp for getting video).
  • Select input audio language.
    • Auto-detect input audio language.
  • Option to speed up audio by 2x for faster results (this has a negative impact on accuracy).
  • Translate the input audio transcription to English.
  • Download .srt subtitle file generated from audio.
  • Option to enable transcription history.
  • Configure Whisper:
    • Choose the Whisper model you want to use (tiny, base, small...)
    • Configure the number of threads and processors to use.
  • Docker Compose for easy self-hosting.
  • Privacy respecting (when run locally):
    • Everything happens locally; no third parties involved.
    • Option to delete all files immediately after processing.
    • Option to keep files for later use / download.
  • Uses the C++ Whisper implementation from whisper.cpp.
    • No GPU needed; it runs on the CPU.
    • No need for complex installations.
  • Backend written in Go
  • Lightweight and beautiful UI.
    • Frontend written with Svelte and Tailwind CSS.
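
The `.srt` export above can be illustrated with a short sketch. This is not this project's actual code: the `segments` list of `(start, end, text)` tuples is a hypothetical intermediate format, but the SubRip timestamp layout (`HH:MM:SS,mmm`) it produces is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start, end, text) segments as a SubRip document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical segments, as a transcription step might produce them.
print(to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "This is a test.")]))
```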

Roadmap:

  • Ability to transcribe videos from a URL using the API.
  • Summarize transcriptions via ChatGPT API.

Test it!

You can easily self-host your own instance with Docker (locally or on a server).

Also, I have made a testing instance available at: https://whisper.r3d.red

Note that this instance is limited:

  • Audio recordings are limited to a maximum of 10 seconds.
  • File uploads are disabled.
  • Uses the base model.

Screenshots

*Logo generated with Stable Diffusion*

Main page

Video options

Recording

Transcription Options

Processing

Result

Other information

How fast is this?

Whisper.cpp usually provides faster results than the Python implementation, although speed depends heavily on your machine's resources, the length of the media source, and the file size. Here is a small benchmark:

| Processor | RAM   | Threads | Processors | Length | Size   | Elapsed time |
|-----------|-------|---------|------------|--------|--------|--------------|
| i7        | 16 GB | 4       | 1          | 30m    | 7 MB   | 7m 38s       |
| i7        | 16 GB | 8       | 1          | 30s    | < 1 MB | 5s           |
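
From the first benchmark row you can derive a rough real-time factor (elapsed processing time divided by audio duration); a quick sanity check:

```python
def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Elapsed processing time / audio duration; below 1.0 means faster than real time."""
    return elapsed_s / audio_s

# First benchmark row: 30 minutes of audio transcribed in 7m 38s.
rtf = real_time_factor(7 * 60 + 38, 30 * 60)
print(f"{rtf:.2f}")  # about 0.25, i.e. roughly 4x faster than real time
```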

What is the difference between models?

There are several models, which differ in size. The size difference comes from the number of parameters: the more parameters a model has, the better it can "understand" what it is listening to, and the fewer errors it makes. Smaller models make more mistakes (e.g. confusing similar-sounding words).

Also note that bigger models increase both transcription time and memory usage:

| Model  | Disk   | Mem (since v1.6.1) |
|--------|--------|--------------------|
| tiny   | 75 MB  | ~125 MB            |
| base   | 142 MB | ~210 MB            |
| small  | 466 MB | ~600 MB            |
| medium | 1.5 GB | ~1.7 GB            |
| large  | 2.9 GB | ~3.3 GB            |

Table from Whisper.cpp repo.

How accurate is this?

Not all languages reach the same accuracy with Whisper. Please take a look at the following chart to see the languages and their associated WER (Word Error Rate). The lower the WER, the better the model understands the language.

Image from original Whisper repo.
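
For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of the computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between the current reference prefix
    # and the first j hypothesis words (single-row Levenshtein DP).
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diag, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            diag, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion of a reference word
                dp[j - 1] + 1,      # insertion of a hypothesis word
                diag + (r != h),    # substitution (free if the words match)
            )
    return dp[-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words
```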

Similar projects

  • Whisper WASM - If you want to run Whisper directly in your browser without needing a server, you can use this project. Note that its performance is not very good.