CLOVA Speech overview

Available in Classic and VPC

CLOVA Speech is a NAVER Cloud Platform service that provides speech recognition based on CLOVA's Neural End-to-end Speech Transcriber (NEST) technology. It exposes RESTful APIs for speech-based features such as transcribing long audio and video files, creating voice notes, generating video subtitles, and managing call transcripts.

Note

The CLOVA Speech service lets you upload long audio or video files and review the speech recognition results. The CLOVA Speech Recognition (CSR) service, by contrast, is optimized for recognizing short, command-style speech of up to one minute.

Common CLOVA Speech settings

The following describes commonly used request and response formats in CLOVA Speech APIs.

Request

The following describes the common request format.

API URL

The request API URL is as follows.

API Gateway's unique invoke URL created in CLOVA Speech Domain

Note

For more information on how to check the InvokeURL, see CLOVA Speech User Guide.

Request headers

The following describes the headers.

| Field | Required | Description |
| --- | --- | --- |
| Content-Type | Required | Request data format: application/json, multipart/form-data, or application/octet-stream |
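
As a minimal sketch of how a client might guard against sending an unsupported Content-Type, the helper below checks a value against the three formats listed above before building the header dictionary. The function name `build_headers` is hypothetical and not part of the CLOVA Speech API; authentication headers are intentionally omitted because this section does not describe them.

```python
# Content types listed in the request-header table above.
SUPPORTED_CONTENT_TYPES = frozenset({
    "application/json",
    "multipart/form-data",
    "application/octet-stream",
})


def build_headers(content_type: str) -> dict[str, str]:
    """Return request headers after validating the Content-Type value.

    Hypothetical helper: only the Content-Type header comes from this
    document; any other headers your request needs must be added separately.
    """
    # Ignore parameters such as "; boundary=..." when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"unsupported Content-Type: {content_type!r}")
    return {"Content-Type": content_type}
```

Validating on the client side is a design choice rather than a requirement; it simply surfaces a mistake before the request is sent.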

Response

The following describes the common response format.

Response status codes

The following describes the response status codes.

Note

For response status codes common to NAVER Cloud Platform, see Ncloud API response status codes.

| HTTP status code | Code | Message | Description |
| --- | --- | --- | --- |
| 400 | 400 | Invalid request parameters | Entered request parameter value is invalid |
| 401 | 401 | Invalid secret | Entered secret key value is invalid |
| 400 | STT002 | Invalid Content Type | Content-Type other than application/octet-stream is entered |
| 400 | STT003 | Empty Sound Data | Voice data is missing |
| 400 | STT005 | Invalid Language | Entered language (lang) parameter is not supported |
| 400 | STT004 | Empty Language | Language (lang) parameter is missing |
| 413 | STT001 | Exceed Sound Data length | Entered voice data exceeds the allowed length (60 seconds) |
| 500 | STT006 | Failed to pre-processing | Error during speech recognition preprocessing. Ensure the voice data is a valid wav, mp3, or flac file |
| 500 | STT998 | Failed to STT | Error during speech recognition. Contact Support |
| 500 | STT999 | Internal Server Error | Internal server error. Contact Support |
| - | - | SUCCEEDED | Job successful |
| - | - | PROCESSING | Job in progress |
| - | - | ERROR_SERVER_BUSY | Server has no free resources |
| - | - | ERROR_TOKEN_INVALID | Token doesn't exist |
| - | - | ERROR_AUDIO_EMPTY | Voice data value doesn't exist |
| - | - | ERROR_AUDIO_CONVERSION | Failed to convert voice data |
| - | - | ERROR_PARAMS_FORMAT_INVALID | Entered parameter format is not JSON |
| - | - | ERROR_REQUEST_PARAMETER | Entered request parameter is invalid |
| - | - | ERROR_REQUEST_PARAMETER | Speaker not recognized |
| - | - | ERROR_INVALID_SECRET | Entered secret key value is invalid |
| - | - | ERROR_DATA_NOT_FOUND | Internal server errors |
| - | - | ERROR_DATA_CONFLICT | Data conflict |
| - | - | ERROR_INTERNAL_ERROR | Internal server errors |
| - | - | ERROR_EXTERNAL_ERROR | Service not operational |
| - | - | ERROR_TOO_MANY_JOBS | Too many jobs |
| - | - | ERROR_GATEWAY_TIMEOUT | Timeout |
| - | - | FAILED | Other errors |
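
The result messages in the table can be grouped into a few outcomes that client code typically acts on. The sketch below is one possible grouping, not an official classification: `Outcome`, `ASSUMED_TRANSIENT`, and `classify_result` are hypothetical names, and treating ERROR_SERVER_BUSY, ERROR_TOO_MANY_JOBS, and ERROR_GATEWAY_TIMEOUT as retryable is an assumption, since the table does not say which errors are temporary.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"
    PENDING = "pending"
    RETRYABLE = "retryable"
    FAILED = "failed"


# Assumption: these messages describe temporary conditions worth retrying.
# The documentation does not state which errors are transient.
ASSUMED_TRANSIENT = frozenset({
    "ERROR_SERVER_BUSY",
    "ERROR_TOO_MANY_JOBS",
    "ERROR_GATEWAY_TIMEOUT",
})


def classify_result(message: str) -> Outcome:
    """Map a result message from the status table to a coarse outcome."""
    if message == "SUCCEEDED":
        return Outcome.DONE
    if message == "PROCESSING":
        return Outcome.PENDING
    if message in ASSUMED_TRANSIENT:
        return Outcome.RETRYABLE
    # FAILED and every other ERROR_* message are treated as final failures.
    return Outcome.FAILED
```

For example, `classify_result("PROCESSING")` yields `Outcome.PENDING`, which a caller could use as the signal to check the job again later.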

CLOVA Speech API

The following describes the APIs provided by the CLOVA Speech service.

| API | Description |
| --- | --- |
| Long sentence recognition > Object Storage file recognition | Recognize long-form speech in media files stored in NAVER Cloud Platform Object Storage, referenced by their unique URLs |
| Long sentence recognition > External file recognition | Recognize long-form speech in publicly available audio files, referenced by their unique URLs |
| Long sentence recognition > Local file recognition | Recognize long-form speech in local files |
| Long sentence recognition > Get job status | Check the status of an asynchronous job |
| Short sentence recognition | Recognize short voice files up to 60 seconds long |
| Live streaming recognition | Recognize speech in real time and convert it to text |
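
Because long sentence recognition jobs can be checked through Get job status, a client usually polls until the job leaves the PROCESSING state. The following is a rough sketch of such a loop under stated assumptions: `wait_for_job` and its parameters are hypothetical, and `fetch_status` stands in for whatever call your code makes to Get job status and is expected to return a message such as PROCESSING or SUCCEEDED. The polling interval and attempt limit are illustrative values, not documented recommendations.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is no longer PROCESSING and return the final message.

    fetch_status is a caller-supplied function (hypothetical) that performs
    the actual status request and returns the result message as a string.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "PROCESSING":
            return status
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still PROCESSING after {max_attempts} checks")
```

Passing the status function and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.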

CLOVA Speech related resources

NAVER Cloud Platform provides a variety of related resources to help users better understand CLOVA Speech APIs.