faster-whisper is a reimplementation of OpenAI’s Whisper model using CTranslate2, an engine designed for fast inference of Transformer models. As a result, transcription runs significantly faster than with the original implementation.
Below is a simple example of generating subtitles. First, install faster_whisper and pysubs2:
# pip install faster_whisper pysubs2
from faster_whisper import WhisperModel
import pysubs2
# The first argument is a model size name or a path to a local model directory
model = WhisperModel('large-v2')
segments, _ = model.transcribe(audio='audio.mp3')
# Prepare results for SRT file format
results = []
for s in segments:
    segment_dict = {'start': s.start, 'end': s.end, 'text': s.text}
    results.append(segment_dict)
subs = pysubs2.load_from_whisper(results)
subs.save('output.srt') # save srt file
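To make clear what ends up in output.srt, here is a minimal plain-Python sketch of SRT serialization for the same list of segment dictionaries. It is not how pysubs2 is implemented, and the helper names `format_srt_time` and `segments_to_srt` are hypothetical; in practice pysubs2 handles this for you.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(results: List[Dict]) -> str:
    """Render dicts with 'start', 'end' and 'text' keys as SRT text."""
    blocks = []
    for index, seg in enumerate(results, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries
    return "\n".join(blocks)
```

Each entry consists of a running index, a time range, and the stripped segment text.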
You can modify it to display a progress bar using tqdm:
from faster_whisper import WhisperModel
import pysubs2
from tqdm import tqdm
model = WhisperModel('large-v2')
segments, info = model.transcribe(audio='audio.mp3')
# Prepare results for SRT file format
results = []
timestamps = 0.0 # for progress bar
with tqdm(total=info.duration, unit=" audio seconds") as pbar:
    for seg in segments:
        segment_dict = {'start': seg.start, 'end': seg.end, 'text': seg.text}
        results.append(segment_dict)
        # Update progress bar based on segment duration
        pbar.update(seg.end - timestamps)
        timestamps = seg.end
    # Handle silence at the end of the audio
    if timestamps < info.duration:
        pbar.update(info.duration - timestamps)
subs = pysubs2.load_from_whisper(results)
subs.save('output.srt') # save srt file
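The progress arithmetic can be checked without running a model. The sketch below isolates it in a hypothetical helper, `progress_increments`, which takes the segment end times and the total audio duration and returns the amounts the bar would advance by, including the final step for trailing silence.

```python
from typing import Iterable, List


def progress_increments(segment_ends: Iterable[float], duration: float) -> List[float]:
    """Return one progress step per segment, plus a final step for trailing silence."""
    increments = []
    position = 0.0  # how far into the audio the bar has advanced
    for end in segment_ends:
        increments.append(end - position)
        position = end
    # Segments rarely end exactly at the audio's end, so fill the remainder
    if position < duration:
        increments.append(duration - position)
    return increments
```

Because every step starts where the previous one ended, the steps add up to the full duration and the bar finishes at 100%.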
Additionally, here’s a Dockerfile to set up the environment:
# Use the official NVIDIA CUDA image as the base image
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive
# Install necessary dependencies
RUN apt-get update && apt-get install -y \
    wget \
    python3 \
    python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Set the working directory inside the container
WORKDIR /app
# Install required Python packages
RUN pip install faster_whisper pysubs2 tqdm
# Create directories to store the models
RUN mkdir -p /models/faster-whisper-medium
# Download the medium model using wget to the specified directory
RUN wget -O /models/faster-whisper-medium/config.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/config.json && \
    wget -O /models/faster-whisper-medium/model.bin https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/model.bin && \
    wget -O /models/faster-whisper-medium/tokenizer.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/tokenizer.json && \
    wget -O /models/faster-whisper-medium/vocabulary.txt https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/vocabulary.txt
COPY app.py /app/
# Run the script
CMD ["python3", "app.py"]
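The Dockerfile copies app.py but its contents are not shown here, and the image ships the medium model rather than large-v2. As a rough sketch only, app.py could load the model from the directory baked into the image. The helper `resolve_model_source`, the audio and output file names, and the `device="cuda"` / `compute_type="float16"` settings are assumptions for illustration, not taken from the linked repository.

```python
import os

MODEL_DIR = "/models/faster-whisper-medium"  # directory created in the Dockerfile


def resolve_model_source(model_dir: str = MODEL_DIR, fallback: str = "medium") -> str:
    """Return the local model directory if it contains model.bin,
    otherwise a size name that faster-whisper downloads on first use."""
    if os.path.isfile(os.path.join(model_dir, "model.bin")):
        return model_dir
    return fallback


def main(audio_path: str = "audio.mp3", output_path: str = "output.srt") -> None:
    # Imported here so the helper above works without the GPU stack installed
    from faster_whisper import WhisperModel
    import pysubs2

    # device and compute_type are assumptions for a CUDA container
    model = WhisperModel(resolve_model_source(), device="cuda", compute_type="float16")
    segments, _ = model.transcribe(audio_path)
    results = [{"start": s.start, "end": s.end, "text": s.text} for s in segments]
    pysubs2.load_from_whisper(results).save(output_path)
```

In a real app.py, `main()` would be called at the bottom of the file. Passing a directory path instead of a size name keeps the container from downloading the model again at startup.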
Source Code: https://github.com/taka-wang/docker-whisper