CLOVA Speech Short Text Recognition API
Available in Classic and VPC
Version
Version | Date | Changes |
---|---|---|
v1.0.0 | 2023.11.23. | Initial draft |
v1.0.1 | 2023.12.21. | Added the pronunciation check (English) feature |
Requests
Method | Request URI |
---|---|
POST | Call the InvokeURL of the API Gateway created in the CLOVA Speech domain. A unique call URL is created for each domain. |
API URL
Method | Request URI |
---|---|
POST | https://clovaspeech-gw.ncloud.com/recog/v1/stt |
Request headers
Header Name | Description |
---|---|
X-CLOVASPEECH-API-KEY | {Secret Key} |
Content-Type | application/octet-stream |
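The headers above can be attached to a request as in the following minimal sketch. It uses only the Python standard library and prepares the request without sending it. The function name build_request is hypothetical, and the URL is the one listed under API URL; the InvokeURL of your own domain may differ.

```python
import urllib.request

# URL from the "API URL" table; replace with your domain's InvokeURL if it differs.
API_URL = "https://clovaspeech-gw.ncloud.com/recog/v1/stt"


def build_request(url: str, secret_key: str, audio: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that carries raw audio bytes."""
    return urllib.request.Request(
        url,
        data=audio,  # the body is the audio file content, sent as-is
        method="POST",
        headers={
            "X-CLOVASPEECH-API-KEY": secret_key,
            "Content-Type": "application/octet-stream",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. The audio is read as bytes so that it reaches the server unmodified.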
Query Param
Name | Type | Required | Description |
---|---|---|---|
lang | string | true | Recognition language: Kor, Eng, Jpn, or Chn |
assessment | bool | false | Whether to return the pronunciation check result (Eng only) |
utterance | string | false | Target text for the pronunciation check |
graph | bool | false | Whether to return the voice waveform |
- The assessment parameter takes effect only when lang is Eng.
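The query parameters and the English-only rule for assessment can be checked on the client before calling the API. The sketch below is one possible way to do so; build_url and SUPPORTED_LANGS are hypothetical names, and encoding booleans as the lowercase string true follows the cURL example in this document rather than a documented rule.

```python
from urllib.parse import urlencode

API_URL = "https://clovaspeech-gw.ncloud.com/recog/v1/stt"
SUPPORTED_LANGS = {"Kor", "Eng", "Jpn", "Chn"}


def build_url(lang, assessment=False, utterance=None, graph=False, base_url=API_URL):
    """Build the request URL, rejecting combinations the table above rules out."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    if assessment and lang != "Eng":
        raise ValueError("assessment is available only when lang is Eng")

    params = {"lang": lang}
    if assessment:
        params["assessment"] = "true"
    if utterance is not None:
        params["utterance"] = utterance  # urlencode escapes spaces and symbols
    if graph:
        params["graph"] = "true"
    return f"{base_url}?{urlencode(params)}"
```

Optional parameters are omitted when they are not set, so the server defaults apply.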
Responses
Response bodies
Field Name | Type | Description |
---|---|---|
text | string | Text recognized from the audio |
quota | int | Audio length (in 15-second units) |
assessment_score | int | Pronunciation score of the entire sentence (0-100) |
ref_graph | int array | Waveform values of the standard pronunciation (non-negative integers, 50 samples per second) |
usr_graph | int array | Waveform values of the entered pronunciation (non-negative integers, 50 samples per second) |
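The response fields can be mapped onto a small typed structure, as in the sketch below. RecognitionResult and parse_result are hypothetical names. Treating assessment_score and the two graph arrays as optional is an assumption, based on their being tied to the assessment and graph request parameters. The duration helper relies only on the documented rate of 50 samples per second.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

GRAPH_SAMPLES_PER_SECOND = 50  # from the ref_graph / usr_graph descriptions


@dataclass
class RecognitionResult:
    text: str
    quota: int
    assessment_score: Optional[int] = None  # assumed absent without assessment=true
    ref_graph: List[int] = field(default_factory=list)
    usr_graph: List[int] = field(default_factory=list)

    def graph_seconds(self) -> float:
        """Time span covered by usr_graph, in seconds."""
        return len(self.usr_graph) / GRAPH_SAMPLES_PER_SECOND


def parse_result(body: str) -> RecognitionResult:
    """Parse a JSON response body into a RecognitionResult."""
    data = json.loads(body)
    return RecognitionResult(
        text=data["text"],
        quota=data["quota"],
        assessment_score=data.get("assessment_score"),
        ref_graph=data.get("ref_graph", []),
        usr_graph=data.get("usr_graph", []),
    )
```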
Example (cURL shell)
curl --location 'https://clovaspeech-gw.ncloud.com/recog/v1/stt?lang=Eng&assessment=true&graph=true' \
--header 'X-CLOVASPEECH-API-KEY: {Secret Key}' \
--header 'Content-Type: application/octet-stream' \
--data-binary '@/D:/example.mp3'
Response example
{
"text": "sunday morning in an angry creditor",
"quota": 15,
"assessment_score": 14,
"assessment_details": "false|{f(f):45, a(ɔː):100, l(l):97, se(s):43} ",
"ref_graph": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 4, 6, 8, 10, 11, 13, 15, 17, 18, 20, 21, 21, 22, 21, 21, 21, 20, 20, 19, 18, 17, 15, 14, 12, 11, 9, 7, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
],
"usr_graph": [
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 6, 7, 9, 11, 13, 15, 16, 18, 19, 20, 21, 21, 21, 21, 20, 20, 19, 18, 17, 16, 15, 13, 12, 10, 8, 6, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0
]
}
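The assessment_details field appears in the example above but is not described in the response field table, so its format is not documented here. The sketch below is only a guess based on that single sample: it assumes a leading flag, a | separator, and entries of the form letters(phoneme):score. The function name and this interpretation are hypothetical and may not match the real format.

```python
import re
from typing import List, Tuple

# Assumed entry shape, inferred from one sample only: letters(phoneme):score
_ENTRY = re.compile(r"(\w+)\(([^)]*)\):(\d+)")


def parse_assessment_details(details: str) -> Tuple[str, List[Tuple[str, str, int]]]:
    """Split the string into its leading flag and (letters, phoneme, score) entries."""
    flag, _, rest = details.partition("|")
    entries = [
        (letters, phoneme, int(score))
        for letters, phoneme, score in _ENTRY.findall(rest)
    ]
    return flag.strip(), entries
```

The meaning of the leading flag (false in the sample) is unknown, so it is returned unchanged.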
Error codes
{
"timestamp": 1700536699045,
"error": {
"errorCode": "STT005",
"message": "Invalid Language"
}
}
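An error body of this shape can be turned into an exception so that callers can branch on the error code. This is a minimal sketch: ClovaSpeechError and parse_error are hypothetical names, and it assumes every error response uses the envelope shown above, which this document does not confirm for the plain 400 and 401 cases.

```python
import json


class ClovaSpeechError(Exception):
    """Error returned by the API, carrying the fields of the error body."""

    def __init__(self, http_status, error_code, message, timestamp=None):
        super().__init__(f"[{http_status}] {error_code}: {message}")
        self.http_status = http_status
        self.error_code = error_code
        self.message = message
        self.timestamp = timestamp


def parse_error(http_status: int, body: str) -> ClovaSpeechError:
    """Build a ClovaSpeechError from an HTTP status code and a JSON error body."""
    data = json.loads(body)
    error = data.get("error", {})
    return ClovaSpeechError(
        http_status,
        error.get("errorCode", str(http_status)),  # fall back to the status code
        error.get("message", ""),
        data.get("timestamp"),
    )
```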
API errors
HttpStatusCode | ErrorCode | ErrorMessage | Description |
---|---|---|---|
400 | 400 | - | Invalid request parameters |
401 | 401 | Invalid secret | Invalid secret |
413 | STT001 | Exceed Sound Data length | Voice data length limit exceeded (60 seconds) |
400 | STT002 | Invalid Content Type | Content-Type is not application/octet-stream |
400 | STT003 | Empty Sound Data | No voice data entered |
400 | STT004 | Empty Language | No language parameter entered |
400 | STT005 | Invalid Language | Entered data not in the selected language |
500 | STT006 | Failed to pre-processing | Error during voice recognition pre-processing: check if the voice data is in the proper wav, mp3 or flac format |
500 | STT998 | Failed to STT | Error during voice recognition (Contact Customer Support for prompt action) |
500 | STT999 | Internal Server Error | Unknown error (Contact Customer Support for prompt action) |
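Some of these errors can be caught before uploading. As an illustration for STT001, the sketch below measures the length of a WAV file with the Python standard library wave module and rejects clips longer than 60 seconds. It covers WAV only (mp3 and flac would need a third-party decoder), the function names are hypothetical, and whether a clip of exactly 60 seconds is accepted by the server is an assumption.

```python
import io
import wave

MAX_SECONDS = 60  # limit stated for STT001


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length of in-memory WAV data, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_wav_length(data: bytes, limit: float = MAX_SECONDS) -> None:
    """Raise ValueError if the WAV data is longer than the limit."""
    duration = wav_duration_seconds(data)
    if duration > limit:  # assumes exactly `limit` seconds is still allowed
        raise ValueError(f"audio is {duration:.1f}s, limit is {limit}s")
```

Data that is not valid WAV makes wave.open raise wave.Error, which can serve as an early hint for the format problem behind STT006.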