Transcribe audio files into text using OpenAI’s Whisper-1 model.
## Overview
The Whisper-1 model provides high-quality speech-to-text transcription. It supports multiple languages and can handle various audio formats.
## Authentication

All requests require a Bearer token in the `Authorization` header:

```
Authorization: Bearer YOUR_API_KEY
```
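As a minimal sketch, the header can be attached in Python with the standard library alone (`YOUR_API_KEY` is the same placeholder used throughout this document; in practice, read the key from an environment variable rather than hard-coding it):

```python
import urllib.request

# Placeholder credentials; substitute a real key at runtime.
api_key = "YOUR_API_KEY"

# Build (but do not send) a request carrying the Bearer token.
req = urllib.request.Request(
    "https://api.example.com/v1/audio/transcriptions",
    headers={"Authorization": f"Bearer {api_key}"},
)
print(req.get_header("Authorization"))  # Bearer YOUR_API_KEY
```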
## Request Parameters

| Parameter | Required | Description |
|---|---|---|
| `file` | Yes | The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. |
| `model` | Yes | ID of the model to use. Use `whisper-1`. |
| `language` | No | The language of the input audio in ISO-639-1 format. Supplying it improves accuracy and reduces latency. |
| `prompt` | No | Optional text to guide the model's style or continue a previous segment. |
| `response_format` | No | Output format: `json`, `text`, `srt`, `verbose_json`, or `vtt`. |
| `temperature` | No | Sampling temperature between 0 and 1. |
## Request Example

```bash
curl -X POST https://api.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="whisper-1" \
  -F language="en" \
  -F response_format="json"
```
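The same multipart request can be sketched in Python. This assumes the third-party `requests` library; the host `api.example.com` and `YOUR_API_KEY` are the placeholders from the curl example, not real values:

```python
import requests

API_URL = "https://api.example.com/v1/audio/transcriptions"

def build_request(audio_bytes: bytes, filename: str, api_key: str) -> requests.PreparedRequest:
    """Assemble the multipart/form-data request without sending it."""
    return requests.Request(
        "POST",
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (filename, audio_bytes)},
        data={"model": "whisper-1", "language": "en", "response_format": "json"},
    ).prepare()

def transcribe(path: str, api_key: str) -> str:
    """Send the request and return the transcribed text."""
    with open(path, "rb") as f:
        prepared = build_request(f.read(), path, api_key)
    with requests.Session() as session:
        resp = session.send(prepared)
    resp.raise_for_status()
    return resp.json()["text"]
```

Note that `Content-Type: multipart/form-data` (including the boundary) is set automatically by the library when `files` is passed, so it does not need to be specified by hand as in the curl example.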
## Response Example (JSON)

```json
{
  "text": "Hello, this is a sample transcription of the audio file. The Whisper model has converted the speech to text accurately."
}
```
## Response Example (Verbose JSON)

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, this is a sample transcription.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a sample",
      "tokens": [50364, 2425, 11, 341, 307, 257, 6889],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ]
}
```
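As a sketch of how the `verbose_json` segments might be consumed, the snippet below converts the `start`/`end` timestamps into SRT cues. (If subtitles are the end goal, requesting `response_format=srt` directly is simpler; this only illustrates the segment structure.)

```python
import json

# A trimmed-down verbose_json payload matching the example above.
verbose = json.loads("""
{
  "duration": 5.5,
  "segments": [
    {"id": 0, "start": 0.0, "end": 2.5, "text": "Hello, this is a sample"}
  ]
}
""")

def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm per the SRT convention."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Emit one numbered SRT cue per segment.
for seg in verbose["segments"]:
    print(seg["id"] + 1)
    print(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}")
    print(seg["text"].strip())
    print()
```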
## Supported Audio Formats

| Format | Extension |
|---|---|
| FLAC | .flac |
| MP3 | .mp3 |
| MP4 | .mp4 |
| MPEG | .mpeg |
| MPGA | .mpga |
| M4A | .m4a |
| OGG | .ogg |
| WAV | .wav |
| WebM | .webm |
## Response Formats

| Format | Description |
|---|---|
| json | Simple JSON with text field |
| text | Plain text output |
| srt | SubRip subtitle format |
| verbose_json | JSON with timestamps and segments |
| vtt | WebVTT subtitle format |
## Available Models