CLOVA Speech Recognition (CSR) overview

Available in Classic and VPC

CLOVA Speech Recognition is a NAVER Cloud Platform service that converts human speech into text. It provides RESTful APIs for the voice recognition features used in assistant applications, chatbots, voice memos, and more. For mobile environments, it provides Android and iOS SDKs for receiving voice input from users.

Note

CLOVA Speech lets you upload long audio or video files and check the speech recognition results. The CLOVA Speech Recognition (CSR) service, by contrast, is optimized for recognizing short, command-style speech of one minute or less.

Common CLOVA Speech Recognition (CSR) settings

The following describes commonly used request and response formats in CLOVA Speech Recognition APIs.

Request

The following describes the common request format.

API URL

The request API URL is as follows.

https://naveropenapi.apigw.ntruss.com/recog/v1

Note

See the Mobile SDK documentation for platform-specific ways to use the APIs in a mobile environment.

Request headers

The following describes the request headers.

Field	Required	Description
`x-ncp-apigw-api-key-id`	Required	Client ID issued after application registration in NAVER Cloud Platform console
`x-ncp-apigw-api-key`	Required	Client secret issued after application registration in NAVER Cloud Platform console
`Content-Type`	Required	Request data format `application/octet-stream`
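
As a minimal sketch of how the three required headers fit together, the following Python snippet builds (but does not send) a request against the API URL above. The `/stt` path suffix, the `Kor` language value, and the helper name `build_csr_request` are assumptions made for illustration; check the STT (Speech-to-Text) documentation for the actual endpoint and supported `lang` values.

```python
import urllib.parse
import urllib.request

# Base URL taken from the "API URL" section of this document.
BASE_URL = "https://naveropenapi.apigw.ntruss.com/recog/v1"


def build_csr_request(audio: bytes, client_id: str, client_secret: str,
                      path: str = "/stt", lang: str = "Kor") -> urllib.request.Request:
    """Build a POST request carrying the three required headers.

    ASSUMPTION: the "/stt" path and the "Kor" default are illustrative only
    and are not confirmed by this overview page.
    """
    query = urllib.parse.urlencode({"lang": lang})
    url = f"{BASE_URL}{path}?{query}"
    headers = {
        "x-ncp-apigw-api-key-id": client_id,         # client ID from the console
        "x-ncp-apigw-api-key": client_secret,        # client secret from the console
        "Content-Type": "application/octet-stream",  # raw audio bytes in the body
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen()` would send it; that step requires valid credentials and is left out here.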

Note

For information on how to register an application in the NAVER Cloud Platform console and obtain the authentication information (client ID, client secret) required to use the API, see the CLOVA Speech Recognition (CSR) User Guide.
After registering the application, click the [Edit] button in the console and make sure the API you want to use is selected. If it is not selected, requests fail with a 429 (Quota Exceed) error.

Response

The following describes the common response format.

Response status codes

The following describes the response status codes.

STT (Speech-to-Text)

HTTP status code	Code	Message	Description
413	STT000	Request Entity Too Large	Voice data exceeds the allowed size (up to 3 MB)
413	STT001	Exceed Sound Data length	Voice data exceeds the allowed length (up to 60 seconds)
400	STT002	Invalid Content Type	`Content-Type` is not `application/octet-stream`
400	STT003	Empty Sound Data	No voice data was sent
400	STT004	Empty Language	Language (`lang`) parameter is missing
400	STT005	Invalid Language	The specified language (`lang`) parameter is not supported
500	STT006	Failed to pre-processing	Error during speech recognition preprocessing. Check that the voice data is a valid `wav`, `mp3`, or `flac` file
400	STT007	Too Short Sound Data	Voice data is too short (400 ms or less)
500	STT998	Failed to STT	Error during speech recognition. Contact Support
500	STT999	Internal Server Error	Internal server error. Contact Support
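
Several of the 4xx codes above can be anticipated on the client before any audio is uploaded. The following is a rough sketch of such a pre-flight check. The function name `precheck` is hypothetical, reading "3 MB" as 3 × 1024 × 1024 bytes is an assumption, and STT005 is not covered because this page does not list the supported `lang` values.

```python
MAX_BYTES = 3 * 1024 * 1024   # ASSUMPTION: "3 MB" interpreted as 3 MiB
MAX_SECONDS = 60.0            # STT001: up to 60 seconds
MIN_SECONDS = 0.4             # STT007: 400 ms or less is too short
EXPECTED_CONTENT_TYPE = "application/octet-stream"


def precheck(size_bytes: int, duration_s: float, lang: str,
             content_type: str = EXPECTED_CONTENT_TYPE) -> list[str]:
    """Return the STT error codes a request would likely trigger, in table order."""
    codes = []
    if size_bytes > MAX_BYTES:
        codes.append("STT000")  # Request Entity Too Large
    if duration_s > MAX_SECONDS:
        codes.append("STT001")  # Exceed Sound Data length
    if content_type != EXPECTED_CONTENT_TYPE:
        codes.append("STT002")  # Invalid Content Type
    if size_bytes == 0:
        codes.append("STT003")  # Empty Sound Data
    if not lang:
        codes.append("STT004")  # Empty Language
    if size_bytes > 0 and duration_s <= MIN_SECONDS:
        codes.append("STT007")  # Too Short Sound Data
    return codes
```

An empty list means none of the locally checkable conditions apply; the server can still reject the request for other reasons, such as STT005 or STT006.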

Mobile SDK

HTTP status code	Code	Message	Description
-	10	ERROR_NETWORK_INITIALIZE	Error initializing network resources
-	11	ERROR_NETWORK_FINALIZE	Error releasing network resources
-	12	ERROR_NETWORK_READ	Error receiving network data. Timeout due to a slow network environment on the client device
-	13	ERROR_NETWORK_WRITE	Error sending network data. Timeout due to a slow network environment on the client device
-	14	ERROR_NETWORK_NACK	Error on the speech recognition server. Timeout because the client device, on a slow network, did not send voice packets to the server in time
-	15	ERROR_INVALID_PACKET	Error due to sending invalid packets
-	20	ERROR_AUDIO_INITIALIZE	Error initializing audio resources. Verify audio permissions
-	21	ERROR_AUDIO_FINALIZE	Error releasing audio resources
-	22	ERROR_AUDIO_RECORD	Error during voice input (recording). Verify audio permissions
-	30	ERROR_SECURITY	Authentication permission error
-	40	ERROR_INVALID_RESULT	Recognition result error
-	41	ERROR_TIMEOUT	No voice was sent to the server for a certain period, or no recognition result was received
-	42	ERROR_NO_CLIENT_RUNNING	A speech recognition event was detected while the client was not performing speech recognition
-	50	ERROR_UNKNOWN_EVENT	An undefined event was detected inside the client
-	60	ERROR_VERSION	Protocol version error
-	61	ERROR_CLIENTINFO	Client information error
-	62	ERROR_SERVER_POOL	Not enough servers available for speech recognition
-	63	ERROR_SESSION_EXPIRED	Speech recognition server session expired
-	64	ERROR_SPEECH_SIZE_EXCEEDED	Speech packet size exceeded
-	65	ERROR_EXCEED_TIME_LIMIT	Error in timestamp for authentication
-	66	ERROR_WRONG_SERVICE_TYPE	Invalid service type
-	67	ERROR_WRONG_LANGUAGE_TYPE	Invalid language type
-	70	ERROR_OPENAPI_AUTH	Open API authentication error. The client ID, or the registered package name (Android) or Bundle ID (iOS), is invalid
-	71	ERROR_QUOTA_OVERFLOW	The set API call limit (quota) has been exhausted
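
The numeric codes in the table above fall into groups by their tens digit (10s for network, 20s for audio, and so on). As an illustrative sketch only, a lookup like the following could be used when logging SDK errors. The group labels and the function name `sdk_error_group` are this example's own wording, not names defined by the SDK.

```python
# Group labels are informal summaries of the table rows, not SDK constants.
SDK_ERROR_GROUPS = {
    1: "network",                          # codes 10-15
    2: "audio",                            # codes 20-22
    3: "security",                         # code 30
    4: "recognition result / client state",  # codes 40-42
    5: "unknown event",                    # code 50
    6: "server / protocol",                # codes 60-67
    7: "Open API authentication / quota",  # codes 70-71
}


def sdk_error_group(code: int) -> str:
    """Map a Mobile SDK error code to a coarse group by its tens digit."""
    return SDK_ERROR_GROUPS.get(code // 10, "unrecognized")
```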

Other errors and inquiries

Symptom or inquiry	Cause or solution
`UnsatisfiedLinkError` occurred	The CSR API provides libraries built for armeabi and armeabi-v7a. If any library used by the app you are developing does not support armeabi and armeabi-v7a, you may encounter this error
Android fatal signal 11 (SIGSEGV) error occurred	Resources must be prepared before accepting voice input through the CSR API. Make sure `initialize()` and `release()` are called before calling `recognize()`
`""(null)` returned as recognition result	Can occur if the user speaks very quietly or if the voice is not recognized because of ambient noise. Although this is extremely rare, handling a null (empty) recognition result as an exception is recommended
Audio file recognition	The CSR API does not support audio file recognition
Does not work well on low-end smartphones	Devices running Android SDK version 10 or higher, or iOS version 8 or higher, are supported

Note

For response status codes common to NAVER Cloud Platform, see Ncloud API response status codes.

CLOVA Speech Recognition API

The following describes the APIs provided by the CLOVA Speech Recognition service.

API	Description
STT (Speech-to-Text)	Convert speech to text
Mobile SDK	Convert speech to text in a mobile environment

CLOVA Speech Recognition related resources

NAVER Cloud Platform provides a variety of related resources to help users better understand CLOVA Speech Recognition APIs.

CLOVA Speech Recognition API guides
- API overview: how to issue and check the access key and secret key provided by NAVER Cloud Platform, and how to generate the signature required for request headers
- API Gateway User Guide: how to check the API key required for the request header
- Common Ncloud response status codes: the common NAVER Cloud Platform response status codes used by the CLOVA Speech Recognition service

How to use the CLOVA Speech Recognition service
- CLOVA Speech Recognition User Guide: how to use CLOVA Speech Recognition in the NAVER Cloud Platform console
- Ncloud use environment guide: guide to the VPC and Classic environments and support availability
- Introduction to pricing, characteristics, and detailed features: a summary of the pricing, characteristics, and detailed features of CLOVA Speech Recognition
- Latest service news: the latest news on CLOVA Speech Recognition
- FAQ: frequently asked questions from CLOVA Speech Recognition users
- Contact Us: direct inquiries for questions that the user guides do not answer
