Transcription
Video transcription consists of transcribing the audio content of a video into text.
In a more general context, this process is also called Automatic Speech Recognition (ASR) or Speech to Text.
This package provides a common API for several transcription backends, currently:
- openai-whisper CLI
- faster-whisper (via the whisper-ctranslate2 CLI)
Potential candidates could be: whisper-cpp, vosk, ...
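To illustrate what a common API over CLI backends can look like, here is a minimal sketch. It is not this package's implementation: `TranscribeOptions`, `EngineDescription`, `engines` and `buildCommandLine` are hypothetical names, and the sketch assumes both CLIs accept the `--model`, `--output_format` and `--output_dir` options in the same way.

```typescript
// Hypothetical option shape, loosely mirroring the transcribe() arguments shown in Usage.
interface TranscribeOptions {
  mediaFilePath: string
  model: string
  format: 'txt' | 'vtt' | 'srt'
  transcriptDirectory: string
}

// Hypothetical description of a CLI backend: a display name and the executable to run.
interface EngineDescription {
  name: string
  command: string
}

// Assumption: both backends expose a compatible command line.
const engines: EngineDescription[] = [
  { name: 'openai-whisper', command: 'whisper' },
  { name: 'whisper-ctranslate2', command: 'whisper-ctranslate2' }
]

// Build the argument vector for one engine; only the executable differs between backends.
function buildCommandLine (engine: EngineDescription, options: TranscribeOptions): string[] {
  return [
    engine.command,
    options.mediaFilePath,
    '--model', options.model,
    '--output_format', options.format,
    '--output_dir', options.transcriptDirectory
  ]
}

// Print the command each backend would run for the same request.
for (const engine of engines) {
  const commandLine = buildCommandLine(engine, {
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt',
    transcriptDirectory: '/tmp/transcription'
  })
  console.log(commandLine.join(' '))
}
```

Keeping the request shape identical and varying only the executable is what lets callers switch backend without changing their own code.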
Requirements
- Python 3
- PIP
And at least one of the following transcription backends:
- Python:
  - openai-whisper
  - whisper-ctranslate2>=0.4.3
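Before creating a transcriber, it can be handy to check which backend CLI is actually reachable. The following is a minimal sketch using Node's `child_process`; `isCommandAvailable` is a hypothetical helper that is not part of this package, and it assumes each CLI can be started with `--help`.

```typescript
import { spawnSync } from 'node:child_process'
import { join } from 'node:path'

// Hypothetical helper: returns true when the executable can be spawned.
// binDirectory is optional and points to a local pip "bin" directory.
function isCommandAvailable (command: string, binDirectory?: string): boolean {
  const executable = binDirectory ? join(binDirectory, command) : command

  // Assumption: the CLI accepts --help. Output is discarded, only the spawn result matters.
  const result = spawnSync(executable, [ '--help' ], { stdio: 'ignore' })

  // result.error is set when the process could not be started (for example ENOENT).
  return result.error === undefined
}

// Report which of the two supported CLIs can be found on the PATH.
for (const command of [ 'whisper', 'whisper-ctranslate2' ]) {
  console.log(`${command}: ${isCommandAvailable(command) ? 'available' : 'not found'}`)
}
```

The check only tells you that the executable can be started, not that the installed version satisfies the requirement listed above.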
Usage
Create a transcriber manually:
import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional: only needed if you want to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by the OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  });

  // If the engine is not installed globally, install it (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  });

  console.log(transcriptFile.path);
  console.log(await transcriptFile.read());
})();
Using a local model file:
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
});
You may use the built-in factory if you're happy with the default configuration:
import { transcriberFactory } from '@peertube/peertube-transcription'

transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})
For further usage examples, see ../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts
Lexicon
- ONNX: Open Neural Network eXchange. A specification for representing machine learning models; the ONNX Runtime runs these models.
- GPTs: Generative Pre-Trained Transformers
- LLM: Large Language Model
- NLP: Natural Language Processing
- MLP: Multilayer Perceptron
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate
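
WER and CER are usually computed as an edit distance (substitutions, insertions and deletions) between a reference transcript and the engine output, divided by the length of the reference. The sketch below shows one way to compute both; `editDistance`, `wordErrorRate` and `characterErrorRate` are hypothetical helpers that are not part of this package. It assumes a non-empty reference and applies no text normalization (case, punctuation), which real evaluations often do.

```typescript
// Levenshtein distance between two token sequences, computed row by row.
function editDistance<T> (reference: T[], hypothesis: T[]): number {
  let previous: number[] = Array.from({ length: hypothesis.length + 1 }, (_, j) => j)

  for (let i = 1; i <= reference.length; i++) {
    const current: number[] = [ i ]

    for (let j = 1; j <= hypothesis.length; j++) {
      const cost = reference[i - 1] === hypothesis[j - 1] ? 0 : 1

      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution or match
      )
    }

    previous = current
  }

  return previous[hypothesis.length]
}

// WER: edit distance over words, divided by the number of reference words.
function wordErrorRate (reference: string, hypothesis: string): number {
  const referenceWords = reference.trim().split(/\s+/).filter(Boolean)
  const hypothesisWords = hypothesis.trim().split(/\s+/).filter(Boolean)

  return editDistance(referenceWords, hypothesisWords) / referenceWords.length
}

// CER: edit distance over characters, divided by the number of reference characters.
function characterErrorRate (reference: string, hypothesis: string): number {
  const referenceChars = Array.from(reference)
  const hypothesisChars = Array.from(hypothesis)

  return editDistance(referenceChars, hypothesisChars) / referenceChars.length
}

console.log(wordErrorRate('the cat sat on the mat', 'the cat sit on mat'))
console.log(characterErrorRate('kitten', 'sitting'))
```

In the first call, one word is substituted and one is deleted out of six reference words, so the WER is 2/6. In the second, three character edits over six reference characters give a CER of 0.5. Both rates can exceed 1 when the output contains many insertions.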