CLOVA Speech short text recognition API
The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in Classic and VPC
Version
Version | Date | Changes |
---|---|---|
v1.0.0 | 2023.11.23. | Initial draft |
v1.0.1 | 2023.12.21. | Added the pronunciation check (English) feature |
Requests
Method | Request URI |
---|---|
POST | Call the InvokeURL of the API Gateway created in the CLOVA Speech domain. A unique call URL is created for each domain. |
API URL
Method | Request URI |
---|---|
POST | https://clovaspeech-gw.ncloud.com/recog/v1/stt |
Request headers
Header Name | Description |
---|---|
X-CLOVASPEECH-API-KEY | {Secret Key} |
Content-Type | application/octet-stream |
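As a rough illustration of the headers above, the following Python sketch builds (but does not send) the POST request using only the standard library. The helper name `build_stt_request` and the placeholder key are assumptions made for illustration; the URL and header names are taken from the tables above.

```python
import urllib.request

# API URL from the table above.
API_URL = "https://clovaspeech-gw.ncloud.com/recog/v1/stt"


def build_stt_request(secret_key: str, audio: bytes, url: str = API_URL) -> urllib.request.Request:
    """Build (without sending) a POST request that carries raw audio bytes.

    `build_stt_request` is a hypothetical helper, not part of any official SDK.
    """
    headers = {
        "X-CLOVASPEECH-API-KEY": secret_key,          # secret key of the domain
        "Content-Type": "application/octet-stream",   # body is raw binary audio
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_stt_request("YOUR_SECRET_KEY", b"\x00\x01")
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

The URL passed in still needs the required `lang` query parameter before the request is sent.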
Query Param
Name | Type | Required | Description |
---|---|---|---|
lang | string | true | Language to recognize: Kor, Eng, Jpn, or Chn |
assessment | bool | false | Whether to return the pronunciation check result (Eng only) |
utterance | string | false | Text to check the pronunciation against |
graph | bool | false | Whether to return the voice waveform |
- Assessment is enabled only when English (Eng) is selected.
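The query parameters can be assembled as in the minimal sketch below. The helper name `build_stt_url` and the client-side validation are assumptions for illustration; the lowercase `true` literal follows the cURL example in this document, and the service itself may validate differently.

```python
from urllib.parse import urlencode

# API URL and language codes from the tables above.
API_URL = "https://clovaspeech-gw.ncloud.com/recog/v1/stt"
SUPPORTED_LANGS = ("Kor", "Eng", "Jpn", "Chn")


def build_stt_url(lang, assessment=False, utterance=None, graph=False):
    """Return the request URL with its query string (hypothetical helper)."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {SUPPORTED_LANGS}, got {lang!r}")
    if assessment and lang != "Eng":
        # The pronunciation check is documented as English-only.
        raise ValueError("assessment is only available when lang is 'Eng'")

    params = {"lang": lang}
    if assessment:
        params["assessment"] = "true"
    if utterance is not None:
        params["utterance"] = utterance  # urlencode percent-encodes the text
    if graph:
        params["graph"] = "true"
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stt_url("Eng", assessment=True, graph=True))
```

Optional parameters are left out of the query string unless they are set, which keeps the default request as small as possible.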
Responses
Response bodies
Field Name | Type | Description |
---|---|---|
text | string | Text recognized from the audio |
quota | int | Audio length (in 15-second units) |
assessment_score | int | Pronunciation score of the entire sentence (0-100) |
ref_graph | int array | Voice waveform values of the standard pronunciation (non-negative integers, 50 samples per second) |
usr_graph | int array | Voice waveform values of the entered pronunciation (non-negative integers, 50 samples per second) |
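One possible way to read these fields in client code is sketched below. The `SttResult` class and the helper names are hypothetical, and the sketch assumes that the assessment and graph fields are simply absent when they were not requested, which this document does not state. The duration helper relies only on the documented rate of 50 samples per second.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Documented sampling rate of ref_graph / usr_graph.
GRAPH_SAMPLES_PER_SECOND = 50


@dataclass
class SttResult:
    """Hypothetical container for the documented response fields."""
    text: str
    quota: int
    assessment_score: Optional[int] = None
    ref_graph: List[int] = field(default_factory=list)
    usr_graph: List[int] = field(default_factory=list)


def parse_stt_response(body: str) -> SttResult:
    """Parse a JSON response body; optional fields are assumed to be omittable."""
    data = json.loads(body)
    return SttResult(
        text=data["text"],
        quota=data["quota"],
        assessment_score=data.get("assessment_score"),
        ref_graph=data.get("ref_graph", []),
        usr_graph=data.get("usr_graph", []),
    )


def graph_duration_seconds(graph: List[int]) -> float:
    """Length of a waveform array in seconds, at 50 samples per second."""
    return len(graph) / GRAPH_SAMPLES_PER_SECOND


if __name__ == "__main__":
    result = parse_stt_response('{"text": "hello", "quota": 15, "usr_graph": [0, 3, 5, 3, 0]}')
    print(result.text, graph_duration_seconds(result.usr_graph))
```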
Example (cURL shell)
curl --location 'https://clovaspeech-gw.ncloud.com/recog/v1/stt?lang=Eng&assessment=true&graph=true' \
--header 'X-CLOVASPEECH-API-KEY: {Secret Key}' \
--header 'Content-Type: application/octet-stream' \
--data-binary '@/D:/example.mp3'
Response example
{
"text": "sunday morning in an angry creditor",
"quota": 15,
"assessment_score": 14,
"assessment_details": "false|{f(f):45, a(ɔː):100, l(l):97, se(s):43} ",
"ref_graph": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 4, 6, 8, 10, 11, 13, 15, 17, 18, 20, 21, 21, 22, 21, 21, 21, 20, 20, 19, 18, 17, 15, 14, 12, 11, 9, 7, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
],
"usr_graph": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 6, 7, 9, 11, 13, 15, 16, 18, 19, 20, 21, 21, 21, 21, 20, 20, 19, 18, 17, 16, 15, 13, 12, 10, 8, 6, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
]
}
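The `assessment_details` string in the example response is not described in the response field table. The sketch below splits it into a leading flag and per-unit scores, assuming the format is exactly what this single example shows: `flag|{unit(symbol):score, ...}`. Both the format and the helper name `parse_assessment_details` are assumptions, and the meaning of the leading flag is not documented here, so it is returned untouched.

```python
import re
from typing import List, Tuple

# Matches one "unit(symbol):score" entry, e.g. "se(s):43".
_ENTRY = re.compile(r"([^\s,{}()]+)\(([^)]*)\):(\d+)")


def parse_assessment_details(details: str) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Split an assessment_details string into (flag, [(unit, symbol, score), ...]).

    The format is inferred from one example only; treat this as a sketch.
    """
    flag, _, body = details.strip().partition("|")
    scores = [(m.group(1), m.group(2), int(m.group(3))) for m in _ENTRY.finditer(body)]
    return flag, scores


if __name__ == "__main__":
    sample = "false|{f(f):45, a(ɔː):100, l(l):97, se(s):43} "
    flag, scores = parse_assessment_details(sample)
    print(flag)
    for unit, symbol, score in scores:
        print(unit, symbol, score)
```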
Error codes
{
"timestamp": 1700536699045,
"error": {
"errorCode": "STT005",
"message": "Invalid Language"
}
}
API errors
HttpStatusCode | ErrorCode | ErrorMessage | Description |
---|---|---|---|
400 | 400 | - | Invalid request parameters |
401 | 401 | Invalid secret | Invalid secret |
413 | STT001 | Exceed Sound Data length | Voice data length limit exceeded (60 seconds) |
400 | STT002 | Invalid Content Type | Content-Type is not application/octet-stream |
400 | STT003 | Empty Sound Data | No voice data entered |
400 | STT004 | Empty Language | No language parameter entered |
400 | STT005 | Invalid Language | Entered data not in the selected language |
500 | STT006 | Failed to pre-processing | Error during voice recognition pre-processing. Check that the voice data is in a valid WAV, MP3, or FLAC format |
500 | STT998 | Failed to STT | Error during voice recognition (Contact Customer Support for prompt action) |
500 | STT999 | Internal Server Error | Unknown error (Contact Customer Support for prompt action) |
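A client might turn an error response into a readable message along the lines of the sketch below. It assumes every error body has the same shape as the STT005 example in this document (an `error` object with `errorCode` and `message`), which is only confirmed for that one case. The helper names are hypothetical, and the descriptions are copied or lightly condensed from the table above.

```python
import json

# Descriptions taken from the API errors table.
ERROR_DESCRIPTIONS = {
    "STT001": "Voice data length limit exceeded (60 seconds)",
    "STT002": "Content-Type is not application/octet-stream",
    "STT003": "No voice data entered",
    "STT004": "No language parameter entered",
    "STT005": "Entered data not in the selected language",
    "STT006": "Error during voice recognition pre-processing",
    "STT998": "Error during voice recognition",
    "STT999": "Unknown error",
}

# Codes the table lists with HTTP status 500.
SERVER_SIDE_CODES = frozenset({"STT006", "STT998", "STT999"})


def summarize_error(body: str) -> str:
    """Return 'code: message (description)' for a JSON error body."""
    error = json.loads(body).get("error", {})
    code = error.get("errorCode", "unknown")
    message = error.get("message", "")
    description = ERROR_DESCRIPTIONS.get(code, "no description available")
    return f"{code}: {message} ({description})"


def is_server_side(code: str) -> bool:
    """True for codes that the table pairs with HTTP status 500."""
    return code in SERVER_SIDE_CODES


if __name__ == "__main__":
    body = '{"timestamp": 1700536699045, "error": {"errorCode": "STT005", "message": "Invalid Language"}}'
    print(summarize_error(body))
```

Separating the 500-level codes makes it easier to decide when to fix the request and when to contact Customer Support.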