faster-whisper is a reimplementation of OpenAI’s Whisper model using CTranslate2, a fast inference engine for Transformer models. According to the project's README, it runs up to 4 times faster than openai/whisper at the same accuracy while using less memory.

Below is a simple example of generating subtitles. First, install faster_whisper and pysubs2:

# pip install faster_whisper pysubs2
from faster_whisper import WhisperModel
import pysubs2

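# The first argument is a model size name or a path to a converted model
model = WhisperModel('large-v2')
# segments is a generator: transcription runs as you iterate over it
segments, _ = model.transcribe(audio='audio.mp3')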

# Prepare results for SRT file format
results = []
for s in segments:
    segment_dict = {'start': s.start, 'end': s.end, 'text': s.text}
    results.append(segment_dict)

subs = pysubs2.load_from_whisper(results)
subs.save('output.srt')  # save srt file
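
To make it clearer what ends up in output.srt, here is a rough, stdlib-only sketch of how segment dictionaries map onto the SRT layout (a counter, a time range, then the text). The helper names srt_timestamp and to_srt and the sample segments are made up for illustration. In the script above, pysubs2 does this work for you, and its exact output may differ in small details.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render a list of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line
    return "\n".join(blocks)


# Hypothetical segments, shaped like the dicts built in the loop above
sample = [
    {'start': 0.0, 'end': 2.5, 'text': ' Hello there.'},
    {'start': 2.5, 'end': 5.0, 'text': ' This is a test.'},
]
print(to_srt(sample))
```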

You can modify it to display a progress bar using tqdm:

from faster_whisper import WhisperModel
import pysubs2
from tqdm import tqdm

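# The first argument is a model size name or a path to a converted model
model = WhisperModel('large-v2')
# info.duration is the audio length in seconds; segments is a lazy generator
segments, info = model.transcribe(audio='audio.mp3')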

# Prepare results for SRT file format
results = []
timestamps = 0.0  # for progress bar
with tqdm(total=info.duration, unit=" audio seconds") as pbar:
    for seg in segments:
        segment_dict = {'start': seg.start, 'end': seg.end, 'text': seg.text}
        results.append(segment_dict)
        # Update progress bar based on segment duration
        pbar.update(seg.end - timestamps)
        timestamps = seg.end

    # Handle silence at the end of the audio
    if timestamps < info.duration:
        pbar.update(info.duration - timestamps)

subs = pysubs2.load_from_whisper(results)
subs.save('output.srt')  # save srt file
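
The progress arithmetic can also be pulled out into a small pure function, which makes it easy to check without running a model. This is only a sketch: progress_deltas is a hypothetical helper, and the clamp to the total duration is an extra safeguard I am assuming is harmless, not something the script above does.

```python
def progress_deltas(segment_ends, duration):
    """Yield how far to advance a progress bar after each segment.

    segment_ends: segment end times in seconds, in order.
    duration: total audio length in seconds.
    """
    done = 0.0
    for end in segment_ends:
        end = min(end, duration)  # never advance past the total
        if end > done:
            yield end - done
            done = end
    if done < duration:
        yield duration - done  # trailing silence after the last segment


# Made-up end times for a 10-second clip
for delta in progress_deltas([2.5, 6.0, 9.0], duration=10.0):
    print(delta)
```

Each yielded value is what you would pass to pbar.update(), and the values add up to the total duration so the bar finishes at 100%.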

Additionally, here’s a Dockerfile to set up the environment:

# Use the official NVIDIA CUDA image as the base image
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies
RUN apt-get update && apt-get install -y \
    wget \
    python3 \
    python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Set the working directory inside the container
WORKDIR /app

# Install required Python packages
RUN pip install faster_whisper pysubs2 tqdm

# Create directories to store the models
RUN mkdir -p /models/faster-whisper-medium

# Download the medium model using wget to the specified directory
RUN wget -O /models/faster-whisper-medium/config.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/config.json && \
    wget -O /models/faster-whisper-medium/model.bin https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/model.bin && \
    wget -O /models/faster-whisper-medium/tokenizer.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/tokenizer.json && \
    wget -O /models/faster-whisper-medium/vocabulary.txt https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/vocabulary.txt

COPY app.py /app/

# Run the script
CMD ["python3", "app.py"]
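
Note that the image pre-downloads the medium model into /models/faster-whisper-medium, so app.py can load it from that directory instead of fetching a model at runtime. Below is a minimal sketch of how that might look. It is not necessarily what the repository's app.py does: resolve_model and load_model are hypothetical helpers, and device="cuda" with compute_type="float16" assumes the container runs with a GPU.

```python
from pathlib import Path

# Files the Dockerfile above downloads into the model directory
REQUIRED_FILES = ("config.json", "model.bin", "tokenizer.json", "vocabulary.txt")


def resolve_model(model_dir, fallback="medium"):
    """Return model_dir if it holds all model files, else the fallback size name."""
    directory = Path(model_dir)
    if all((directory / name).is_file() for name in REQUIRED_FILES):
        return str(directory)
    return fallback


def load_model(model_dir="/models/faster-whisper-medium"):
    """Load the model from the local directory when it is available."""
    # Imported here so the path check above works without faster_whisper installed
    from faster_whisper import WhisperModel

    # Assumption: a GPU is available inside the container
    return WhisperModel(resolve_model(model_dir), device="cuda", compute_type="float16")
```

If the directory is incomplete, WhisperModel receives the size name and downloads the model itself, which needs network access when the container runs.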

Source Code: https://github.com/taka-wang/docker-whisper