Transcribe audio files into text using OpenAI’s Whisper-1 model.
## Overview
The Whisper-1 model provides high-quality speech-to-text transcription. It supports multiple languages and can handle various audio formats.
## Authentication

All requests require a Bearer token in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```
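As a minimal sketch, the header can be attached in Python with the standard library alone (`YOUR_API_KEY` is the same placeholder used throughout this document; in practice, read the key from an environment variable rather than hard-coding it):

```python
import urllib.request

# Placeholder credentials; substitute a real key at runtime.
api_key = "YOUR_API_KEY"

# Build (but do not send) a request carrying the Bearer token.
req = urllib.request.Request(
    "https://api.example.com/v1/audio/transcriptions",
    headers={"Authorization": f"Bearer {api_key}"},
)
print(req.get_header("Authorization"))  # Bearer YOUR_API_KEY
```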
## Request Parameters

| Parameter | Required | Description |
|---|---|---|
| `file` | Yes | The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. |
| `model` | Yes | ID of the model to use. Use `whisper-1`. |
| `language` | No | The language of the input audio in ISO-639-1 format. Supplying it improves accuracy and reduces latency. |
| `prompt` | No | Optional text to guide the model's style or continue a previous segment. |
| `response_format` | No | Output format: `json`, `text`, `srt`, `verbose_json`, or `vtt`. |
| `temperature` | No | Sampling temperature between 0 and 1. |
## Request Example

```bash
curl -X POST https://api.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="whisper-1" \
  -F language="en" \
  -F response_format="json"
```
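The same multipart request can be sketched in Python. This assumes the third-party `requests` library; the host `api.example.com` and `YOUR_API_KEY` are the placeholders from the curl example, not real values:

```python
import requests

API_URL = "https://api.example.com/v1/audio/transcriptions"

def build_request(audio_bytes: bytes, filename: str, api_key: str) -> requests.PreparedRequest:
    """Assemble the multipart/form-data request without sending it."""
    return requests.Request(
        "POST",
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (filename, audio_bytes)},
        data={"model": "whisper-1", "language": "en", "response_format": "json"},
    ).prepare()

def transcribe(path: str, api_key: str) -> str:
    """Send the request and return the transcribed text."""
    with open(path, "rb") as f:
        prepared = build_request(f.read(), path, api_key)
    with requests.Session() as session:
        resp = session.send(prepared)
    resp.raise_for_status()
    return resp.json()["text"]
```

Note that `Content-Type: multipart/form-data` (including the boundary) is set automatically by the library when `files` is passed, so it does not need to be specified by hand as in the curl example.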
## Response Example (JSON)

```json
{
  "text": "Hello, this is a sample transcription of the audio file. The Whisper model has converted the speech to text accurately."
}
```
## Response Example (Verbose JSON)

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, this is a sample transcription.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a sample",
      "tokens": [50364, 2425, 11, 341, 307, 257, 6889],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ]
}
```
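As a sketch of how the `verbose_json` segments might be consumed, the snippet below converts the `start`/`end` timestamps into SRT cues. (If subtitles are the end goal, requesting `response_format=srt` directly is simpler; this only illustrates the segment structure.)

```python
import json

# A trimmed-down verbose_json payload matching the example above.
verbose = json.loads("""
{
  "duration": 5.5,
  "segments": [
    {"id": 0, "start": 0.0, "end": 2.5, "text": "Hello, this is a sample"}
  ]
}
""")

def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm per the SRT convention."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Emit one numbered SRT cue per segment.
for seg in verbose["segments"]:
    print(seg["id"] + 1)
    print(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}")
    print(seg["text"].strip())
    print()
```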
## Supported Audio Formats

| Format | Extension |
|---|---|
| FLAC | .flac |
| MP3 | .mp3 |
| MP4 | .mp4 |
| MPEG | .mpeg |
| MPGA | .mpga |
| M4A | .m4a |
| OGG | .ogg |
| WAV | .wav |
| WebM | .webm |
## Response Formats

| Format | Description |
|---|---|
| json | Simple JSON with text field |
| text | Plain text output |
| srt | SubRip subtitle format |
| verbose_json | JSON with timestamps and segments |
| vtt | WebVTT subtitle format |
## Available Models