POST /v1/audio/transcriptions
Audio Transcriptions Whisper
curl --request POST \
  --url https://api.example.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --form 'file=@/path/to/audio.mp3' \
  --form 'model=<string>' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form 'response_format=<string>' \
  --form 'temperature=0'
Transcribe audio files into text using OpenAI’s Whisper-1 model.

Overview

The Whisper-1 model converts speech to text. It supports multiple languages and accepts the audio formats listed under Supported Audio Formats.

Authentication

All requests require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
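
As a minimal sketch of the header format above, the helper below builds the Authorization header in Python. The function name auth_headers and the environment variable TRANSCRIBE_API_KEY are hypothetical names chosen for illustration; they are not part of the API.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for the given API key."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key.strip()}"}


# TRANSCRIBE_API_KEY is a hypothetical variable name used only in this sketch.
headers = auth_headers(os.environ.get("TRANSCRIBE_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.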

Request Parameters

file (file, required)
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

model (string, required)
ID of the model to use. Use whisper-1.

language (string, optional)
The language of the input audio as an ISO 639-1 code (for example, en). Supplying it can improve accuracy and latency.

prompt (string, optional)
Optional text to guide the model’s style or continue a previous audio segment.

response_format (string, optional, default: "json")
Output format: json, text, srt, verbose_json, or vtt.

temperature (number, optional, default: 0)
Sampling temperature between 0 and 1.
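
The constraints above can be checked on the client before a request is sent. The following is a rough sketch, not part of the API: build_form_fields is a hypothetical helper that applies the documented defaults and ranges and returns the non-file form fields as strings.

```python
ALLOWED_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(model, language=None, prompt=None,
                      response_format="json", temperature=0.0):
    """Validate the documented parameters and return them as form fields."""
    if not model:
        raise ValueError("model is required (use 'whisper-1')")
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional parameters are only sent when the caller provides them.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

The file parameter is left out here because it is sent as a separate multipart part rather than as a text field.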

Request Example

curl -X POST https://api.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="whisper-1" \
  -F language="en" \
  -F response_format="json"
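
For readers who are not using curl, the same multipart request can be sketched with the Python standard library alone. The helper names encode_multipart and build_request are assumptions made for this example, and the file part is labelled application/octet-stream as a generic choice; the API may accept or expect a more specific media type.

```python
import io
import urllib.request
import uuid

API_URL = "https://api.example.com/v1/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # The audio goes in the part named "file", matching curl's -F file=@... flag.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, filename, file_bytes, fields):
    """Build (but do not send) the POST request for a transcription."""
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

To send the request, pass the result of build_request to urllib.request.urlopen and read the response body. The boundary is generated per request, so the Content-Type header must come from encode_multipart rather than being hard-coded.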

Response Example (JSON)

{
  "text": "Hello, this is a sample transcription of the audio file. The Whisper model has converted the speech to text accurately."
}

Response Example (Verbose JSON)

{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, this is a sample transcription.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a sample",
      "tokens": [50364, 2425, 11, 341, 307, 257, 6889],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ]
}
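
As an illustration of how the segments array might be consumed, the sketch below turns each segment's start and end values (in seconds) into a timestamped line. The helper names are hypothetical, and the timestamp layout HH:MM:SS,mmm is borrowed from SubRip purely for readability.

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_lines(verbose: dict) -> list[str]:
    """Return one '[start --> end] text' line per segment in a verbose_json body."""
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in verbose.get("segments", [])
    ]
```

Applied to the sample response above, segments_to_lines returns a single line covering 0.0 to 2.5 seconds with the text "Hello, this is a sample".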

Supported Audio Formats

Format  Extension
FLAC    .flac
MP3     .mp3
MP4     .mp4
MPEG    .mpeg
MPGA    .mpga
M4A     .m4a
OGG     .ogg
WAV     .wav
WebM    .webm
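
A client can reject unsupported files before uploading them. This is a small sketch that checks only the file extension against the table above; is_supported_audio is a hypothetical helper, and it does not inspect the file contents.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```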

Response Format Options

Format        Description
json          Simple JSON with text field
text          Plain text output
srt           SubRip subtitle format
verbose_json  JSON with timestamps and segments
vtt           WebVTT subtitle format
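
Because the response body differs by format, client code usually needs to branch on the response_format it requested. The sketch below assumes that json and verbose_json bodies are JSON documents and that text, srt, and vtt bodies arrive as plain text; the second assumption is inferred from the descriptions above rather than stated by the API. parse_transcription is a hypothetical helper name.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(body: str, response_format: str = "json"):
    """Decode a response body according to the requested response_format."""
    if response_format in JSON_FORMATS:
        # Returns a dict; the transcript is under the "text" key.
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        # Assumed to be returned as-is, with no JSON wrapper.
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```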

Available Models

  • whisper-1