CLOVA Speech Recognition (CSR) overview

    Available in Classic and VPC

    CLOVA Speech Recognition is a NAVER Cloud Platform service that converts human speech into text. The CLOVA Speech Recognition service provides APIs in RESTful form for various voice recognition features utilized in assistant applications, chatbots, voice memos, and more. For mobile environments, it provides APIs in the form of Android and iOS SDKs to receive voice input from users.

    Note

    CLOVA Speech lets you upload long audio or video files and review the speech recognition results. In contrast, the CLOVA Speech Recognition (CSR) service is optimized for recognizing short, command-style speech of up to one minute.

    Common CLOVA Speech Recognition (CSR) settings

    The following describes commonly used request and response formats in CLOVA Speech Recognition APIs.

    Request

    The following describes the common request format.

    API URL

    The request API URL is as follows.

    https://naveropenapi.apigw.ntruss.com/recog/v1
    
    Note

    See the Mobile SDK documentation for platform-specific ways to use the APIs in a mobile environment.

    Request headers

    The following describes the request headers.

    Field | Required | Description
    x-ncp-apigw-api-key-id | Required | Client ID issued after application registration in NAVER Cloud Platform console
    x-ncp-apigw-api-key | Required | Client secret issued after application registration in NAVER Cloud Platform console
    Content-Type | Required | Request data format: application/octet-stream

    Note

    To obtain the authentication information (client ID and client secret) required to use the API, register an application in the NAVER Cloud Platform console. For details, see the CLOVA Speech Recognition (CSR) User Guide.
    After registering the application, click the [Edit] button in the console and make sure that the API you want to use is selected. If it is not selected, requests fail with a 429 (Quota Exceed) error.
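The API URL and request headers described above can be combined as in the following minimal Python sketch. It only builds the request object and does not send it. The `/stt` path suffix, the `lang` query parameter placement, and the `Kor` language value are assumptions for illustration and are not stated in this overview; check the STT API documentation for the exact values. The function name `build_stt_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# Base URL taken from the "API URL" section of this page.
BASE_URL = "https://naveropenapi.apigw.ntruss.com/recog/v1"


def build_stt_request(audio: bytes, client_id: str, client_secret: str,
                      lang: str = "Kor") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes.

    ASSUMPTION: the "/stt" path and the "lang" query parameter are
    illustrative; this overview only documents the base URL and headers.
    """
    query = urllib.parse.urlencode({"lang": lang})
    url = f"{BASE_URL}/stt?{query}"
    headers = {
        # Client ID issued after application registration in the console.
        "x-ncp-apigw-api-key-id": client_id,
        # Client secret issued after application registration in the console.
        "x-ncp-apigw-api-key": client_secret,
        # The only request data format listed in the header table.
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_stt_request(b"\x00\x01", "my-client-id", "my-client-secret")
    print(req.get_method(), req.full_url)
    # To actually call the service, pass the request to urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect the headers before any credentials leave the machine.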

    Response

    The following describes the common response format.

    Response status codes

    The following describes the response status codes.

    • STT (Speech-to-Text)
    HTTP status code | Code | Message | Description
    413 | STT000 | Request Entity Too Large | Voice data exceeds the allowed size (up to 3 MB)
    413 | STT001 | Exceed Sound Data length | Voice data exceeds the allowed length (up to 60 seconds)
    400 | STT002 | Invalid Content Type | Content-Type other than application/octet-stream was entered
    400 | STT003 | Empty Sound Data | No voice data was entered
    400 | STT004 | Empty Language | Language (lang) parameter was not entered
    400 | STT005 | Invalid Language | The entered language (lang) parameter is not supported
    500 | STT006 | Failed to pre-processing | Error during speech recognition preprocessing. Make sure the voice data is a valid wav, mp3, or flac file
    400 | STT007 | Too Short Sound Data | Voice data is too short (400 ms or less)
    500 | STT998 | Failed to STT | Error during speech recognition. Contact Support
    500 | STT999 | Internal Server Error | Internal server error. Contact Support
    • Mobile SDK
    HTTP status code | Code | Message | Description
    - | 10 | ERROR_NETWORK_INITIALIZE | Error resetting network resources
    - | 11 | ERROR_NETWORK_FINALIZE | Error releasing network resources
    - | 12 | ERROR_NETWORK_READ | Error receiving network data. Timeout due to slow network environment on client device
    - | 13 | ERROR_NETWORK_WRITE | Error sending network data. Timeout due to slow network environment on client device
    - | 14 | ERROR_NETWORK_NACK | Error on speech recognition server. Timeout because the client device, due to a slow network environment, did not send voice packets to the server in time
    - | 15 | ERROR_INVALID_PACKET | Error due to sending invalid packets
    - | 20 | ERROR_AUDIO_INITIALIZE | Error resetting audio resources. Verify audio permissions
    - | 21 | ERROR_AUDIO_FINALIZE | Error releasing audio resources
    - | 22 | ERROR_AUDIO_RECORD | Error during voice input (recording). Verify audio permissions
    - | 30 | ERROR_SECURITY | Authentication permission error
    - | 40 | ERROR_INVALID_RESULT | Recognition result error
    - | 41 | ERROR_TIMEOUT | Failed to send voice to the server for a period of time, or did not receive recognition results
    - | 42 | ERROR_NO_CLIENT_RUNNING | A speech recognition-related event was detected while the client was not performing speech recognition
    - | 50 | ERROR_UNKNOWN_EVENT | An undefined event was detected inside the client
    - | 60 | ERROR_VERSION | Protocol version error
    - | 61 | ERROR_CLIENTINFO | Client information error
    - | 62 | ERROR_SERVER_POOL | Not enough servers available for speech recognition
    - | 63 | ERROR_SESSION_EXPIRED | Speech recognition server session expired
    - | 64 | ERROR_SPEECH_SIZE_EXCEEDED | Speech packet size exceeded
    - | 65 | ERROR_EXCEED_TIME_LIMIT | Error in timestamp for authentication
    - | 66 | ERROR_WRONG_SERVICE_TYPE | Invalid service type
    - | 67 | ERROR_WRONG_LANGUAGE_TYPE | Invalid language type
    - | 70 | ERROR_OPENAPI_AUTH | Error when authenticating with Open API. Invalid client ID and registered package name (Android) or Bundle ID information (iOS)
    - | 71 | ERROR_QUOTA_OVERFLOW | The set API call limit (quota) was exhausted
    • Other errors and inquiries
    Symptom or inquiry | Cause or solution
    UnsatisfiedLinkError occurred | The CSR API provides libraries built with armeabi and armeabi-v7a. If any of the libraries used by the app you're developing don't support armeabi and armeabi-v7a, you may encounter this error
    android fatal signal 11 (sigsegv) error occurred | Resources must be prepared before accepting voice input using the CSR API. Make sure initialize() and release() are called before calling recognize()
    "" (null) returned as recognition result | Can occur if the user speaks in a very low voice, or if the voice is not recognized due to ambient noise. Although this is extremely rare, it is recommended to throw an exception when the recognition result is null (empty)
    Audio file recognition | The CSR API does not support audio file recognition
    Does not work well on low-end smartphones | Devices with Android SDK version 10 or higher, or iOS version 8 or higher, are supported
    Note

    For response status codes common to NAVER Cloud Platform, see Ncloud API response status codes.
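The size and length limits in the STT status code table (STT000, STT001, STT003, STT007) can be checked on the client before sending a request. The following is a minimal Python sketch, not part of the CSR API: the function names `precheck_wav` and `make_silent_wav` are hypothetical, the check handles WAV data only (mp3 and flac are not parsed), and it assumes that "3 MB" means 3 × 1024 × 1024 bytes. The order in which the service itself applies these checks is not documented here, so the order below is also an assumption.

```python
import io
import wave
from typing import Optional

MAX_BYTES = 3 * 1024 * 1024  # STT000 limit; ASSUMPTION: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 60.0           # STT001: up to 60 seconds
MIN_SECONDS = 0.4            # STT007: 400 ms or less is too short


def precheck_wav(data: bytes) -> Optional[str]:
    """Return the STT code the data would likely trigger, or None if it passes.

    Raises wave.Error if the bytes are within the size limit but are not
    parseable WAV data.
    """
    if not data:
        return "STT003"  # Empty Sound Data
    if len(data) > MAX_BYTES:
        return "STT000"  # Request Entity Too Large
    with wave.open(io.BytesIO(data), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if seconds > MAX_SECONDS:
        return "STT001"  # Exceed Sound Data length
    if seconds <= MIN_SECONDS:
        return "STT007"  # Too Short Sound Data
    return None


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Create mono 16-bit silent WAV bytes, used only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    for label, clip in [("0.2 s", make_silent_wav(0.2)), ("1 s", make_silent_wav(1.0))]:
        print(label, precheck_wav(clip))
```

A local check like this avoids spending a request on audio that would be rejected anyway, but the server response remains the authoritative result.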

    CLOVA Speech Recognition API

    The following describes the APIs provided by the CLOVA Speech Recognition service.

    API | Description
    STT (Speech-to-Text) | Convert speech to text
    Mobile SDK | Convert speech to text in a mobile environment

    NAVER Cloud Platform provides a variety of related resources to help users better understand CLOVA Speech Recognition APIs.

