faster-whisper reimplements OpenAI's Whisper model on top of CTranslate2, a fast inference engine for Transformer models. The overall speedup is substantial, though you still want a GPU to get the most out of it.

Below is a simple example that generates subtitles. First install faster_whisper and pysubs2:

transcribe without progress bar
# pip install faster_whisper pysubs2
from faster_whisper import WhisperModel
import pysubs2

model = WhisperModel('large-v2')  # the first argument is model_size_or_path
segments, _ = model.transcribe(audio='audio.mp3')

# pysubs2 expects a list of segment dicts with start/end/text keys
results = []
for s in segments:
    segment_dict = {'start': s.start, 'end': s.end, 'text': s.text}
    results.append(segment_dict)

subs = pysubs2.load_from_whisper(results)
subs.save('output.srt')  # save the SRT file

We can rewrite it as follows to show a progress bar via tqdm:

transcribe with progress bar
from faster_whisper import WhisperModel
from tqdm import tqdm
import pysubs2

model = WhisperModel('large-v2')
# keep the returned info object: it carries the total audio duration
segments, info = model.transcribe(audio='audio.mp3')

# Prepare results for the SRT file format
results = []
timestamp = 0.0  # last seen end time, for the progress bar
with tqdm(total=info.duration, unit=" audio seconds") as pbar:
    for seg in segments:
        segment_dict = {'start': seg.start, 'end': seg.end, 'text': seg.text}
        results.append(segment_dict)
        # Update the progress bar by the length of this segment
        pbar.update(seg.end - timestamp)
        timestamp = seg.end

    # Handle silence at the end of the audio
    if timestamp < info.duration:
        pbar.update(info.duration - timestamp)

subs = pysubs2.load_from_whisper(results)
subs.save('output.srt')  # save the SRT file
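This pattern works because `transcribe` returns a lazy generator, so decoding happens while you iterate. The update arithmetic is also worth a sanity check: each update advances by `seg.end - timestamp`, so the increments telescope to exactly `info.duration`. A tiny pure-Python check with made-up segment times (no model needed):

```python
# Made-up segment end times standing in for faster-whisper segments.
segment_ends = [2.5, 5.0, 7.25]
total_duration = 9.0  # pretend info.duration, with trailing silence

updates = []
timestamp = 0.0
for end in segment_ends:
    updates.append(end - timestamp)  # what pbar.update() would receive
    timestamp = end
if timestamp < total_duration:
    updates.append(total_duration - timestamp)  # final silent stretch

# The increments telescope to the full duration, so the bar ends at 100%.
print(sum(updates))  # → 9.0
```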

For reference, here is the Dockerfile as well:

Dockerfile
# Use the official NVIDIA CUDA image as the base image
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive

# Install necessary dependencies
RUN apt-get update && apt-get install -y \
    wget \
    python3 \
    python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Set the working directory inside the container
WORKDIR /app

# Install required Python packages
RUN pip install faster_whisper pysubs2

# Create directories to store the models
RUN mkdir -p /models/faster-whisper-medium

# Download the medium model using wget to the specified directory
RUN wget -O /models/faster-whisper-medium/config.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/config.json && \
    wget -O /models/faster-whisper-medium/model.bin https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/model.bin && \
    wget -O /models/faster-whisper-medium/tokenizer.json https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/tokenizer.json && \
    wget -O /models/faster-whisper-medium/vocabulary.txt https://huggingface.co/guillaumekln/faster-whisper-medium/resolve/main/vocabulary.txt


COPY app.py /app/

# Run script
CMD ["python3", "app.py"]
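The Dockerfile copies an app.py whose contents are not shown here; a minimal sketch consistent with the image (the audio/output file names and the CUDA device choice are assumptions) could look like:

```python
# app.py — minimal sketch; assumes the model files baked into the image
# at /models/faster-whisper-medium and an audio file mounted into /app.
from faster_whisper import WhisperModel
import pysubs2

MODEL_DIR = '/models/faster-whisper-medium'  # path created in the Dockerfile

def transcribe_to_srt(audio_path: str, srt_path: str) -> None:
    # Passing a local directory instead of a model name skips the download.
    model = WhisperModel(MODEL_DIR, device='cuda', compute_type='float16')
    segments, _ = model.transcribe(audio_path)
    results = [{'start': s.start, 'end': s.end, 'text': s.text}
               for s in segments]
    pysubs2.load_from_whisper(results).save(srt_path)

if __name__ == '__main__':
    transcribe_to_srt('audio.mp3', 'output.srt')
```

Note that this script needs the CUDA runtime from the base image, so it only runs inside the container (started with `--gpus all`).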

Source Code: https://github.com/taka-wang/docker-whisper