        External file recognition


        Available in Classic and VPC

        Recognize speech in an externally accessible audio or video file, specified by its URL, and convert it to text.

        Request

        The request format for the endpoint is as follows.

        Method | URI
        POST | /recognizer/url

        Request headers

        For headers common to all CLOVA Speech APIs, see Common CLOVA Speech headers.

        Request body

        The following describes the request body.

        Field | Type | Required | Description
        url | String | Required | Audio/video file URL
        language | String | Required | Recognition language
        • ko-KR (default) | en-US | enko | ja | zh-cn | zh-tw
          • ko-KR: Korean
          • en-US: English
          • enko: Korean/English simultaneous recognition
          • ja: Japanese
          • zh-cn: Chinese (Simplified)
          • zh-tw: Chinese (Traditional)
        completion | String | Optional | How results are returned after the recognition request
        • sync | async (default)
          • sync: Return results in JSON format
          • async: Return results to the callback URL or to Object Storage (resultToObs)
        callback | String | Conditional | Callback URL
        • If completion is async, at least one of callback or resultToObs must be specified
        userdata | Object | Optional | User data details
        wordAlignment | Boolean | Optional | Whether to output the alignment between speech and text in the recognition results
        • true (default) | false
          • true: output
          • false: no output
        fullText | Boolean | Optional | Whether to output the full recognition result text
        • true (default) | false
          • true: output
          • false: no output
        resultToObs | Boolean | Conditional | Whether to save results in Object Storage
        • true | false (default)
          • true: results saved
          • false: results not saved
        • If completion is async, at least one of callback or resultToObs must be specified
        noiseFiltering | Boolean | Optional | Noise filtering
        • true (default) | false
          • true: filtered
          • false: not filtered
        boostings | Array | Optional | Keyword boosting details
        • List of keywords to boost speech recognition for
        • Can't be used together with useDomainBoostings
        • Up to 1000 entries allowed
        • Only available in Korean and English
          • English: lowercase conversion by default, capitalize keywords requested for boosting
        • No boosting for single-syllable words due to risk of misidentification
          • <e.g.> yes, yeah, no
        • Boosting is applied regardless of spacing
          • <e.g.> Request boosting for only one of "CLOVA Speech" and "CLOVASpeech"
        • Keyword length is not restricted, but a phrase made of multiple words is boosted only when that exact phrase occurs
          • <e.g.> If you boost the keyword "CLOVA Speech", all sentences containing "CLOVA Speech" will be affected by boosting
          • <e.g.> If you boost a long keyword such as "CLOVA Speech's media speech recognition technology", sentences that contain only "CLOVA Speech" are unlikely to be affected by boosting
        useDomainBoostings | Boolean | Optional | Whether to use domain boosting
        • true | false (default)
          • true: boosting used
          • false: boosting not used
        • Can't be used together with boostings
        forbiddens | String | Optional | Sensitive keywords
        • List of keywords whose recognition rate should be reduced (keywords you don't want to appear in the recognition results)
        • No limit on the number or length of keywords
        • Spacing and capitalization must match exactly
        diarization | Object | Optional | Detailed settings for speaker recognition
        diarization.enable | Boolean | Optional | Whether to recognize speakers
        • true (default) | false
          • true: speaker recognized
          • false: speaker not recognized
        sed | Object | Optional | Event detection details
        sed.enable | Boolean | Optional | Whether to detect events
        • true | false (default)
          • true: event detected
          • false: event not detected
        format | String | Optional | Format of the returned result
        • JSON (default) | SRT | SMI
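
The constraints in the request body table can be checked on the client side before sending. The following is a minimal sketch, not an official SDK: `build_request_body` is a hypothetical helper, the media URL is a placeholder, and only the rules stated in the table are enforced.

```python
import json

ALLOWED_LANGUAGES = {"ko-KR", "en-US", "enko", "ja", "zh-cn", "zh-tw"}
ALLOWED_FORMATS = {"JSON", "SRT", "SMI"}


def build_request_body(url, language="ko-KR", completion="async", **options):
    """Assemble a request body and check the documented constraints.

    Hypothetical helper for illustration; not part of any CLOVA Speech SDK.
    """
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if completion not in ("sync", "async"):
        raise ValueError(f"unsupported completion: {completion}")
    if options.get("format", "JSON") not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {options['format']}")

    # boostings and useDomainBoostings can't be used together.
    if options.get("boostings") and options.get("useDomainBoostings", False):
        raise ValueError("boostings and useDomainBoostings are mutually exclusive")

    # An async request needs somewhere to deliver the result.
    if completion == "async" and not (
        options.get("callback") or options.get("resultToObs", False)
    ):
        raise ValueError("async requests need callback or resultToObs")

    body = {"url": url, "language": language, "completion": completion}
    body.update(options)
    return body


body = build_request_body(
    "https://example.com/sample.mp4",  # placeholder media URL
    language="enko",
    completion="sync",
    boostings=[{"words": "CLOVA Speech"}],
    diarization={"enable": True},
)
print(json.dumps(body, indent=2))
```

Validating locally is optional; the point is only to surface a conflicting option such as `boostings` combined with `useDomainBoostings` before the request is sent.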

        boostings

        The following describes boostings.

        Field | Type | Required | Description
        words | String | Optional | List of words to keyword boost

        Note

        When completion is set to async, the recognition result is delivered as follows, depending on whether a callback URL is specified and whether resultToObs (Object Storage) is enabled.

        Callback URL | resultToObs (Object Storage) | Result
        URL address exists | True | Return results to both the callback URL and Object Storage
        URL address exists | False | Return results only to the callback URL
        URL address doesn't exist | True | Return results only to Object Storage
        URL address doesn't exist | False | Return an error
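
The four rows of this table can be expressed as a small decision function. This is an illustrative sketch only: `async_delivery` and the destination labels it returns are hypothetical names, not values used by the API.

```python
def async_delivery(callback, result_to_obs):
    """Return the destinations of an async result, following the note table.

    Hypothetical helper; the labels "callback" and "object_storage" are
    made up for this example.
    """
    has_callback = bool(callback)
    if has_callback and result_to_obs:
        return ["callback", "object_storage"]
    if has_callback:
        return ["callback"]
    if result_to_obs:
        return ["object_storage"]
    # Neither destination is set: the table says the request returns an error.
    raise ValueError("async requires callback or resultToObs")


for callback, result_to_obs in [
    ("https://example.com/hook", True),
    ("https://example.com/hook", False),
    ("", True),
]:
    print(repr(callback), result_to_obs, "->", async_delivery(callback, result_to_obs))
```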

        Request example

        The following is a sample request.

        curl --location --request POST 'https://clovaspeech-gw.ncloud.com/external/v1/8881/5f7e1b4c866f1c605946c9236f9aa8************/recognizer/url' \
        --header 'Content-Type: application/json' \
        --header 'X-CLOVASPEECH-API-KEY: {Secret key issued when registering the app}' \
        --data '{
          "language": "ko-KR",
          "completion":"async",
          "url": "{url}",
          "resultToObs" : true
        }'
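
For callers who prefer Python over curl, the same request can be assembled with the standard library. This is a sketch under assumptions: `make_request` is a hypothetical helper, the invoke URL and secret key are the masked placeholders from the curl sample, and the actual send is left commented out because it needs real credentials and a real media URL.

```python
import json
import urllib.request

# Placeholders copied from the curl sample; substitute your own values.
INVOKE_URL = (
    "https://clovaspeech-gw.ncloud.com/external/v1/8881/"
    "5f7e1b4c866f1c605946c9236f9aa8************"
)
SECRET_KEY = "{Secret key issued when registering the app}"


def make_request(invoke_url, secret_key, body):
    """Build (but do not send) a POST request for /recognizer/url."""
    return urllib.request.Request(
        invoke_url + "/recognizer/url",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-CLOVASPEECH-API-KEY": secret_key,
        },
        method="POST",
    )


request = make_request(
    INVOKE_URL,
    SECRET_KEY,
    {"language": "ko-KR", "completion": "async", "url": "{url}", "resultToObs": True},
)
print(request.get_method(), request.full_url)

# Sending requires a valid key and media URL, so it is not executed here:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```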

        Response

        The following describes the response format.

        Response body

        The following describes the response body.

        Field | Type | Required | Description
        result | String | - | Response code
        message | String | - | Response message
        token | String | - | Result token
        version | String | - | Engine version
        params | Object | - | Parameter details
        params.service | String | - | Service code
        params.domain | String | - | Domain type
        • Use when calling the engine
        • general
        params.lang | String | - | Recognition language
        • ko | en | enko | ja | zh-cn | zh-tw
          • ko: Korean
          • en: English
          • enko: Korean/English simultaneous recognition
          • ja: Japanese
          • zh-cn: Chinese (Simplified)
          • zh-tw: Chinese (Traditional)
        params.completion | String | - | Request format
        • sync: Return results in JSON format
        • async: Return results to the callback URL or to Object Storage (resultToObs)
        params.callback | String | - | Callback URL
        params.diarization | Object | - | Speaker recognition (separation) details
        params.diarization.enable | Boolean | - | Whether to recognize (separate) speakers
        • true | false
          • true: speaker recognized
          • false: speaker not recognized
        params.diarization.speakerCountMin | Integer | - | Minimum number of speakers
        params.diarization.speakerCountMax | Integer | - | Maximum number of speakers
        params.sed | Object | - | Event detection result
        params.sed.enable | Boolean | - | Whether to detect events
        • true | false (default)
          • true: event detected
          • false: event not detected
        params.boostings | Array | - | Keyword boosting details
        params.forbiddens | String | - | Sensitive keywords
        params.wordAlignment | Boolean | - | Whether to output the alignment between speech and text in the recognition results
        • true (default) | false
          • true: output
          • false: no output
        params.fullText | Boolean | - | Whether to output the full recognition result text
        • true (default) | false
          • true: output
          • false: no output
        params.noiseFiltering | Boolean | - | Noise filtering
        • true (default) | false
          • true: filtered
          • false: not filtered
        params.resultToObs | Boolean | - | Whether to save results in Object Storage
        • Applies only if completion is async
        • true | false (default)
          • true: results saved
          • false: results not saved
        params.priority | Integer | - | Priority
        • 0 - 4
        • The lower the number, the higher the priority
        params.userdata | Object | - | User data details
        params.userdata._ncp_DomainCode | String | - | Domain code
        • long-speech | short-speech
          • long-speech: long sentence recognition
          • short-speech: short sentence recognition
        params.userdata._ncp_DomainId | Integer | - | Domain ID
        params.userdata._ncp_TaskId | Integer | - | Task ID
        • Use to track specific recognition tasks
        params.userdata._ncp_TraceId | String | - | Trace ID
        • Use to track logs
        progress | Integer | - | Recognition progress
        segments | Array | - | Segment details
        text | String | - | Overall text
        confidence | Double | - | Overall accuracy
        speakers | Array | - | All speaker details
        events | Array | - | Event details
        eventTypes | Array | - | Details of all recognized events

        params.boostings

        The following describes params.boostings.

        Field | Type | Required | Description
        words | String | - | List of words to keyword boost

        segments

        The following describes segments.

        Field | Type | Required | Description
        start | Long | - | Analysis start time (ms)
        end | Long | - | Analysis end time (ms)
        text | String | - | Analyzed text
        confidence | Double | - | Analysis accuracy
        • 0.0 - 1.0
        diarization | Object | - | Recognized speaker details
        diarization.label | String | - | Recognized speaker's number
        speaker | Object | - | Changed speaker's details
        speaker.label | String | - | Changed speaker's number
        speaker.name | String | - | Changed speaker's name
        speaker.edited | Boolean | - | Whether speaker is changed
        • true | false (default)
          • true: speaker changed
          • false: speaker unchanged
        words | Array<Long, Long, String> | - | List of recognized words
        words.[0] | Long | - | Word start time (ms)
        words.[1] | Long | - | Word end time (ms)
        words.[2] | String | - | Word text
        textEdited | String | - | Modification details
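
Each entry in `words` is a positional triple rather than an object, which is easy to misread. The sketch below shows one way to unpack a segment; `format_segment` and `word_timings` are hypothetical helper names, and the sample data is the first segment of the sync JSON response example in this document.

```python
def format_segment(segment):
    """Render a segment as '[start-end ms] speaker: text'."""
    speaker = segment.get("speaker", {}).get("name", "?")
    return f"[{segment['start']}-{segment['end']} ms] {speaker}: {segment['text']}"


def word_timings(segment):
    """Unpack each [start, end, text] triple in segment['words']."""
    return [
        {"start": start, "end": end, "text": text}
        for start, end, text in segment.get("words", [])
    ]


segment = {
    "start": 5870,
    "end": 8160,
    "text": "This is the Seoul swimming pool.",
    "confidence": 0.9626975,
    "speaker": {"label": "2", "name": "B", "edited": False},
    "words": [
        [5871, 6730, "This is the Seoul"],
        [6860, 7530, "swimming pool."],
    ],
}

print(format_segment(segment))
for word in word_timings(segment):
    print(word["start"], word["end"], word["text"])
```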

        speakers

        The following describes speakers.

        Field | Type | Required | Description
        label | String | - | Numbers of all speakers
        name | String | - | Names of all speakers
        edited | Boolean | - | Whether speaker is changed
        • true | false (default)
          • true: speaker changed
          • false: speaker unchanged

        events

        The following describes events.

        Field | Type | Required | Description
        type | String | - | Event type
        label | String | - | Event name
        labelEdited | String | - | Edited event name
        start | Long | - | Event start time
        end | Long | - | Event end time

        eventTypes

        The following describes eventTypes.

        Field | Type | Required | Description
        label | String | - | Recognized event

        Response status codes

        For response status codes common to all CLOVA Speech APIs, see Common CLOVA Speech response status codes.

        Response example

        The following are sample responses.

        Request with async and return in JSON

        The following is a sample response requested with async and returned in JSON format.

        {
            "token": "*****f6a1015466bae2c926177f26310",
            "result": "SUCCEEDED",
            "message": "Succeeded"
        }

        Request with sync and return in JSON

        The following is a sample response requested with sync and returned in JSON format.

        {
            "result": "COMPLETED",
            "message": "Succeeded",
            "token": "*****166039e486abbb90e4a84c3b3a5",
            "version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
            "params": {
                "service": "ncp",
                "domain": "general",
                "lang": "enko",
                "completion": "sync",
                "callback": "",
                "diarization": {
                    "enable": true,
                    "speakerCountMin": -1,
                    "speakerCountMax": -1
                },
                "sed": {
                    "enable": true
                },
                "boostings": [
                    {
                        "words": "Hello, test"
                    }
                ],
                "forbiddens": "",
                "wordAlignment": true,
                "fullText": true,
                "noiseFiltering": true,
                "resultToObs": false,
                "priority": 0,
                "userdata": {
                    "_ncp_DomainCode": "NEST",
                    "_ncp_DomainId": 1,
                    "_ncp_TaskId": **442,
                    "_ncp_TraceId": "*****ce98ec342d8a8c8fe9191cec343",
                    "id": 1
                }
            },
            "progress": 100,
            "keywords": {},
            "segments": [
                {
                    "start": 5870,
                    "end": 8160,
                    "text": "This is the Seoul swimming pool.",
                    "confidence": 0.9626975,
                    "diarization": {
                        "label": "2"
                    },
                    "speaker": {
                        "label": "2",
                        "name": "B",
                        "edited": false
                    },
                    "words": [
                        [
                            5871,
                            6730,
                            "This is the Seoul"
                        ],
                        [
                            6860,
                            7530,
                            "swimming pool."
                        ]
                    ],
                    "textEdited": "This is the Seoul swimming pool."
                },
                {
                    "start": 8160,
                    "end": 12950,
                    "text": "How much is the entry fee? It's 5000 KRW. Thank you.",
                    "confidence": 0.8835926,
                    "diarization": {
                        "label": "1"
                    },
                    "speaker": {
                        "label": "1",
                        "name": "A",
                        "edited": false
                    },
                    "words": [
                        [
                            8161,
                            9220,
                            "How much is"
                        ],
                        [
                            9390,
                            10020,
                            "the entry fee?"
                        ],
                        [
                            10410,
                            10640,
                            "It's 5000"
                        ],
                        [
                            10710,
                            11140,
                            "KRW."
                        ],
                        [
                            11910,
                            12500,
                            "Thank you."
                        ]
                    ],
                    "textEdited": "How much is the entry fee? It's 5000 KRW. Thank you."
                }
            ],
            "text": "This is the Seoul swimming pool. How much is the entry fee? It's 5000 KRW. Thank you.",
            "confidence": 0.9071357,
            "speakers": [
                {
                    "label": "1",
                    "name": "A",
                    "edited": false
                },
                {
                    "label": "2",
                    "name": "B",
                    "edited": false
                }
            ],
            "events": [
                {
                    "type": "music",
                    "label": "music",
                    "labelEdited": "music",
                    "start": 1400,
                    "end": 5000
                }
            ],
            "eventTypes": [
                "music"
            ]
        }
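
As one example of consuming a sync JSON response, the sketch below totals the speaking time per speaker from `segments`. `speaking_time_ms` is a hypothetical helper, and the input is trimmed from the sample response to the fields the calculation needs.

```python
from collections import defaultdict


def speaking_time_ms(response):
    """Sum (end - start) per speaker name across all segments."""
    totals = defaultdict(int)
    for segment in response.get("segments", []):
        name = segment.get("speaker", {}).get("name", "unknown")
        totals[name] += segment["end"] - segment["start"]
    return dict(totals)


# Trimmed from the sample response: only the fields used here are kept.
response = {
    "result": "COMPLETED",
    "segments": [
        {"start": 5870, "end": 8160, "speaker": {"label": "2", "name": "B"}},
        {"start": 8160, "end": 12950, "speaker": {"label": "1", "name": "A"}},
    ],
}

print(speaking_time_ms(response))  # {'B': 2290, 'A': 4790}
```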

        Request with sync and return in SRT

        The following is a sample response requested with sync and returned in SRT format.

        1
        00:00:00,000 --> 00:00:01,425
        A: Not long ago,
        
        2
        00:00:02,533 --> 00:00:11,550
        A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
        
        3
        00:00:11,550 --> 00:00:19,025
        A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
        
        4
        00:00:19,025 --> 00:00:26,317
        C: You thought of saccharin, a bit. You had it super sweet.
        
        5
        00:00:26,317 --> 00:00:28,240
        A: Is it corn?
        
        6
        00:00:28,240 --> 00:00:35,318
        B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
        
        7
        00:00:35,318 --> 00:00:42,800
        A: No, Chodang corn meant super sweet. No one has understood right now.
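
SRT timestamps use the form `HH:MM:SS,mmm`, so comparing them with the millisecond values in the JSON and SMI output needs a conversion. The following is a minimal parsing sketch, not a complete SRT parser: `srt_time_to_ms` and `parse_srt` are hypothetical helpers, and the input is the first two cues of the sample.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text):
    """Split SRT text into cues with millisecond start and end times."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = (srt_time_to_ms(part) for part in lines[1].split("-->"))
        cues.append(
            {
                "index": int(lines[0]),
                "start": start,
                "end": end,
                "text": "\n".join(line.strip() for line in lines[2:]),
            }
        )
    return cues


sample = """1
00:00:00,000 --> 00:00:01,425
A: Not long ago,

2
00:00:02,533 --> 00:00:11,550
A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
"""

for cue in parse_srt(sample):
    print(cue["index"], cue["start"], cue["end"], cue["text"][:20])
```

The second cue starts at 2533 ms, which matches the `SYNC Start=2533` value in the SMI sample for the same audio.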

        Request with sync and return in SMI

        The following is a sample response requested with sync and returned in SMI format.

        <SAMI>
        <Body>
          <SYNC Start=0>
            <P>A: Not long ago,
          <SYNC Start=2533>
            <P>A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
          <SYNC Start=11550>
            <P>A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
          <SYNC Start=19025>
            <P>C: You thought of saccharin, a bit. You had it super sweet.
          <SYNC Start=26317>
            <P>A: Is it corn?
          <SYNC Start=28240>
            <P>B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
          <SYNC Start=35318>
            <P>A: No, Chodang corn meant super sweet. No one has understood right now.
        </Body>
        </SAMI>
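
In the SMI output, each `SYNC Start` value is a start time in milliseconds followed by an unclosed `<P>` caption. The sketch below pulls out those pairs with a regular expression. It assumes the simple layout shown in the sample (one `<P>` per `<SYNC>`, no nested tags inside the caption) and is not a general SAMI parser; `parse_smi` is a hypothetical helper.

```python
import re

# One <SYNC Start=N> followed by a <P> caption that runs until the next tag.
SYNC = re.compile(r"<SYNC Start=(\d+)>\s*<P>([^<]*)", re.IGNORECASE)


def parse_smi(text):
    """Return (start_ms, caption) pairs from simple SMI text."""
    return [(int(start), caption.strip()) for start, caption in SYNC.findall(text)]


sample = """<SAMI>
<Body>
  <SYNC Start=0>
    <P>A: Not long ago,
  <SYNC Start=2533>
    <P>A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
</Body>
</SAMI>
"""

for start_ms, caption in parse_smi(sample):
    print(start_ms, caption[:20])
```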
