CLOVA Speech short text recognition API


The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.

Available in Classic and VPC

Version

| Version | Date | Changes |
| --- | --- | --- |
| v1.0.0 | 2023.11.23. | Initial draft |
| v1.0.1 | 2023.12.21. | Added the pronunciation check (English) feature |

Requests

| Method | Request URI |
| --- | --- |
| POST | Call the InvokeURL of the API Gateway created in the CLOVA Speech domain. A unique invoke URL is created for each domain. |

API URL

| Method | Request URI |
| --- | --- |
| POST | https://clovaspeech-gw.ncloud.com/recog/v1/stt |

Request headers

| Header name | Description |
| --- | --- |
| X-CLOVASPEECH-API-KEY | {Secret Key} |
| Content-Type | application/octet-stream |

Query parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| lang | string | true | Recognition language: Kor, Eng, Jpn, or Chn |
| assessment | bool | false | Whether to return the pronunciation check result (Eng only) |
| utterance | string | false | Target text for the pronunciation check |
| graph | bool | false | Whether to return the voice waveform |
  • Assessment is enabled only when English (Eng) is selected.
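
The query parameters above can be assembled programmatically. The following is a minimal sketch, not an official client: `build_stt_url` is a hypothetical helper name, and it assumes boolean parameters are sent as lowercase `true`, as in the cURL example in this document.

```python
from urllib.parse import urlencode

# Endpoint and language codes are taken from the tables in this document.
STT_URL = "https://clovaspeech-gw.ncloud.com/recog/v1/stt"
SUPPORTED_LANGS = {"Kor", "Eng", "Jpn", "Chn"}


def build_stt_url(lang, assessment=False, utterance=None, graph=False, base_url=STT_URL):
    """Return the request URL with its query string (hypothetical helper)."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    # The document states that assessment is enabled only for English.
    if assessment and lang != "Eng":
        raise ValueError("assessment is available only when lang is Eng")

    params = {"lang": lang}
    if assessment:
        params["assessment"] = "true"  # assumption: booleans are sent as "true"
    if utterance is not None:
        params["utterance"] = utterance
    if graph:
        params["graph"] = "true"
    # urlencode percent-encodes values, so free text in utterance is safe to pass.
    return f"{base_url}?{urlencode(params)}"
```

Optional parameters are left out of the query string when they are not set, so a call with only `lang` produces the shortest valid request.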

Responses

Response bodies

| Field name | Type | Description |
| --- | --- | --- |
| text | string | Text recognized from the audio |
| quota | int | Audio length (in 15-second units) |
| assessment_score | int | Pronunciation score of the entire sentence (0-100) |
| ref_graph | int array | Voice waveform values of the standard pronunciation (positive integers, 50 samples per second) |
| usr_graph | int array | Voice waveform values of the entered pronunciation (positive integers, 50 samples per second) |
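
As an illustration of how these fields might be consumed, the sketch below decodes a response body and converts a waveform array length into seconds using the documented rate of 50 samples per second. The function names are hypothetical, and the sketch assumes that `assessment_score`, `ref_graph`, and `usr_graph` are simply absent when they were not requested.

```python
import json

GRAPH_SAMPLES_PER_SECOND = 50  # stated in the ref_graph / usr_graph descriptions


def parse_stt_response(body):
    """Decode a success response body into a dict with every documented field."""
    data = json.loads(body)
    return {
        "text": data["text"],
        "quota": data["quota"],
        # Assumed to be absent unless assessment=true / graph=true was requested.
        "assessment_score": data.get("assessment_score"),
        "ref_graph": data.get("ref_graph", []),
        "usr_graph": data.get("usr_graph", []),
    }


def graph_duration_seconds(graph):
    """Length of a waveform array in seconds at 50 samples per second."""
    return len(graph) / GRAPH_SAMPLES_PER_SECOND
```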

Example (cURL shell)

curl --location 'https://clovaspeech-gw.ncloud.com/recog/v1/stt?lang=Eng&assessment=true&graph=true' \
--header 'X-CLOVASPEECH-API-KEY: {Secret Key}' \
--header 'Content-Type: application/octet-stream' \
--data-binary '@/D:/example.mp3'

Response example

{
    "text": "sunday morning in an angry creditor",
    "quota": 15,
    "assessment_score": 14,
    "assessment_details": "false|{f(f):45, a(ɔː):100, l(l):97, se(s):43} ",
    "ref_graph": [
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 4, 6, 8, 10, 11, 13, 15, 17, 18, 20, 21, 21, 22, 21, 21, 21, 20, 20, 19, 18, 17, 15, 14, 12, 11, 9, 7, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ],
    "usr_graph": [
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 6, 7, 9, 11, 13, 15, 16, 18, 19, 20, 21, 21, 21, 21, 20, 20, 19, 18, 17, 16, 15, 13, 12, 10, 8, 6, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ]
}
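
The same request can be prepared from Python with only the standard library. This is a sketch under stated assumptions rather than an official SDK: `build_stt_request` is a hypothetical helper, and the file name and secret key in the commented usage are placeholders.

```python
import urllib.request


def build_stt_request(url, secret_key, audio_bytes):
    """Prepare a POST request carrying raw audio bytes (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=audio_bytes,  # sent unmodified as the request body
        headers={
            "X-CLOVASPEECH-API-KEY": secret_key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


# Usage (performs a real network call, so it is left commented out):
# url = "https://clovaspeech-gw.ncloud.com/recog/v1/stt?lang=Eng"
# with open("example.mp3", "rb") as f:
#     req = build_stt_request(url, "<secret key>", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Reading the file in binary mode keeps the audio bytes intact, which matters because the body is sent as `application/octet-stream`.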

Error codes

{
    "timestamp": 1700536699045,
    "error": {
        "errorCode": "STT005",
        "message": "Invalid Language"
    }
}
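
An error body of this shape can be unpacked as sketched below. `parse_error_body` is a hypothetical name, and the sketch assumes that `timestamp` is a Unix epoch value in milliseconds, which is consistent with the sample value but is not stated in this document.

```python
import json
from datetime import datetime, timezone


def parse_error_body(body):
    """Return (error_code, message, occurred_at) from an error response body."""
    data = json.loads(body)
    error = data.get("error", {})
    # Assumption: timestamp is Unix epoch time in milliseconds.
    occurred_at = datetime.fromtimestamp(data["timestamp"] / 1000, tz=timezone.utc)
    return error.get("errorCode"), error.get("message"), occurred_at
```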

API errors

| HTTP status code | Error code | Error message | Description |
| --- | --- | --- | --- |
| 400 | 400 | - | Invalid request parameters |
| 401 | 401 | Invalid secret | Invalid secret key |
| 413 | STT001 | Exceed Sound Data length | Voice data exceeds the length limit (60 seconds) |
| 400 | STT002 | Invalid Content Type | Content-Type is not application/octet-stream |
| 400 | STT003 | Empty Sound Data | No voice data was entered |
| 400 | STT004 | Empty Language | No language parameter was entered |
| 400 | STT005 | Invalid Language | The entered data is not in the selected language |
| 500 | STT006 | Failed to pre-processing | Error during voice recognition preprocessing. Check that the voice data is in a valid wav, mp3, or flac format |
| 500 | STT998 | Failed to STT | Error during voice recognition (contact Customer Support for prompt action) |
| 500 | STT999 | Internal Server Error | Unknown error (contact Customer Support for prompt action) |
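
One possible way to act on these codes in client code is a lookup table. The codes, statuses, and messages below are copied from the table; the `suggest_action` helper and its return labels are an illustrative policy of this sketch, not part of the API.

```python
# (HTTP status, error message) per error code, copied from the table above.
STT_ERRORS = {
    "400": (400, "-"),
    "401": (401, "Invalid secret"),
    "STT001": (413, "Exceed Sound Data length"),
    "STT002": (400, "Invalid Content Type"),
    "STT003": (400, "Empty Sound Data"),
    "STT004": (400, "Empty Language"),
    "STT005": (400, "Invalid Language"),
    "STT006": (500, "Failed to pre-processing"),
    "STT998": (500, "Failed to STT"),
    "STT999": (500, "Internal Server Error"),
}

# The table asks callers to contact Customer Support for these two codes.
CONTACT_SUPPORT_CODES = {"STT998", "STT999"}


def suggest_action(error_code):
    """Return a coarse next step for an error code (illustrative policy only)."""
    if error_code in CONTACT_SUPPORT_CODES:
        return "contact_support"
    if error_code not in STT_ERRORS:
        return "unknown_code"
    if error_code == "STT006":
        # The table points at the audio format (wav, mp3, or flac) for this code.
        return "check_audio_format"
    return "fix_request"
```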