Official documentation: https://platform.openai.com/docs/guides/speech-to-text
Transcribes an audio file into text in the language of the input audio.
Request Parameters
file: The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model: ID of the model to use. Available models: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe.
language: The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.
prompt: Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
response_format: The format of the transcript output, one of: json, text.
temperature: The sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic.
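The parameters above map one-to-one onto the official openai Python package. The following is a minimal sketch, assuming openai v1.x, the placeholder key YOUR_API_KEY, the example base URL used in the curl request below, and a local audio file at /path/to/audio.mp3:

from openai import OpenAI

# Base URL and API key are placeholders, matching the curl example in this section.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

with open("/path/to/audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        language="en",                      # ISO-639-1 code of the spoken language
        prompt="Count from one to ten.",    # optional style / continuation hint
        response_format="json",
        temperature=0.2,
    )

print(transcript.text)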
Response
text: The transcribed text content.
curl -X POST https://api.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/audio.mp3" \
  -F model="gpt-4o-transcribe" \
  -F response_format="json"
{
  "text": "One two three four five six seven eight nine ten"
}
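The same request can also be made without the SDK. The sketch below mirrors the curl call above using the Python requests library (the endpoint and YOUR_API_KEY remain placeholders) and reads the text field from the JSON response:

import requests

API_KEY = "YOUR_API_KEY"  # placeholder, as in the curl example

with open("/path/to/audio.mp3", "rb") as f:
    resp = requests.post(
        "https://api.example.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        # requests sets the multipart/form-data Content-Type and boundary itself
        files={"file": ("audio.mp3", f, "audio/mpeg")},
        data={"model": "gpt-4o-transcribe", "response_format": "json"},
    )

resp.raise_for_status()
print(resp.json()["text"])  # e.g. "One two three four five six seven eight nine ten"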