CLOVA Speech Long Text Recognition API

    The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.

    Available in Classic and VPC

    Version

    | Version | Date | Changes |
    |---|---|---|
    | v1.0.0 | 2020.9.17. | Initial draft |
    | v1.1.0 | 2020.11.18. | Added boosting and forbidden keywords |
    | v1.2.0 | 2021.4.8. | Added speaker recognition function |
    | v1.3.0 | 2021.5.27. | Added English recognition function |
    | v1.4.0 | 2021.7.22. | Added Korean/English simultaneous recognition function |
    | v1.5.0 | 2021.11.25. | Added asynchronous mode support |
    | v1.6.0 | 2022.2.17. | Added Japanese recognition function |
    | v1.7.0 | 2022.6.8. | Added domain boosting support |
    | v1.8.0 | 2022.10.20. | Added traditional and simplified Chinese recognition function |
    | v1.9.0 | 2022.12.15. | Added noise filtering function |
    | v2.0.0 | 2023.12.21. | Added event detection function |

    Requests

    | Method | Request URI |
    |---|---|
    | POST | Call with the InvokeURL of the API Gateway created in the CLOVA Speech domain. A unique call URL is created for each domain. |

    How to use CLOVA Speech API

    You can call the CLOVA Speech API in one of the following three ways:

    1. Request recognition with an Object Storage file URL

      Use the unique URL of a file saved in Object Storage. (The file to be recognized must be uploaded to Object Storage in advance.)

    2. Request recognition with an external URL

      Use the unique URL of an externally accessible file.

    3. Request by uploading a file from local storage

      Use the file system path.

    After a recognition request is made, the response is delivered in one of the following two ways:

    1. sync

      With sync, the response result (JSON) is returned when recognition is completed.

    2. async

      With async, the recognition result is sent to the callback URL entered in the request, saved to Object Storage (resultToObs), or both.

    | Callback URL | resultToObs (Object Storage) | Result |
    |---|---|---|
    | URL exists (O) | True | Result is returned to both the callback URL and Object Storage |
    | URL exists (O) | False | Result is returned to the callback URL only |
    | URL does not exist (X) | True | Result is returned to Object Storage only |
    | URL does not exist (X) | False | Error is returned |
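
The delivery rules for async requests can be sketched as a small client-side check. This is only an illustration: the function name `async_result_targets` and its return values are hypothetical and not part of the API. It can help catch the error case (no callback URL and resultToObs set to false) before a request is sent.

```python
def async_result_targets(callback_url, result_to_obs):
    """Return where an async result would be delivered, per the delivery rules.

    Hypothetical helper: the name and return values are illustrative only.
    """
    targets = []
    if callback_url:       # a callback URL was supplied in the request
        targets.append("callback")
    if result_to_obs:      # resultToObs is true
        targets.append("object-storage")
    if not targets:
        # Neither delivery target is set: the API returns an error in this case.
        raise ValueError("async request needs a callback URL or resultToObs=true")
    return targets
```

For example, `async_result_targets("http://example/callback", False)` yields only the callback target, matching the second row of the table.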

    1. Request recognition with an Object Storage file URL

    Use the unique URL of a file saved in Object Storage.
    (The file to be recognized must be uploaded to Object Storage in advance.)

    POST /recognizer/object-storage

    • Recognize media from Object Storage

    | Method | Request URI |
    |---|---|
    | POST | ${Invoke URL}/recognizer/object-storage |

    Request headers

    | Header Name | Description |
    |---|---|
    | Content-Type | application/json |

    Request bodies

    | name | desc | type | requirement | value | default |
    |---|---|---|---|---|---|
    | dataKey | Key to access the Object Storage path of the file to be recognized | string | required | | |
    | language | Language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
    | completion | Select between sync and async | string | optional | | async |
    | callback | See the part on Callback | string | optional | | |
    | userdata | JSON object | object | optional | | |
    | wordAlignment | Output word alignment in the recognition result | boolean | optional | | true |
    | fullText | Output the entire recognition result in text | boolean | optional | | true |
    | resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | | false |
    | noiseFiltering | Whether to enable noise filtering | boolean | optional | | true |
    | boostings | Boosting object array | array | optional | | |
    | boostings.words | Comma-separated words | string | optional | | |
    | useDomainBoostings | Use domain boostings | boolean | optional | | false |
    | forbiddens | Comma-separated words | string | optional | | |
    | diarization | Speaker recognition (diarization) setting | object | optional | | |
    | diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | | true |
    | sed | Event detection setting | object | optional | | |
    | sed.enable | Whether to enable event detection | boolean | optional | | false |

    Example (cURL shell)

    curl --location --request POST '${Invoke URL}/recognizer/object-storage' \
    --header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
    --header 'Content-Type: application/json' \
    --data-raw '{
      "language": "ko-KR",
      "callback": "http://example/callback",
      "userdata": {
        "dataId": "1"
      },
      "boostings": [
        {
          "words": "comma separated words"
        }
      ],
      "forbiddens": "comma separated words",
      "completion": "async",
      "dataKey": "data/sample.wav"
    }'
    
    • Response: refer to Common Response
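
As a complement to the cURL example, the request body for this endpoint could be assembled in Python roughly as follows. This is a minimal sketch: the function name `object_storage_body` is hypothetical, and it assumes that leaving an optional field out of the body has the same effect as the documented default, which the source does not state explicitly.

```python
import json


def object_storage_body(data_key, language="ko-KR", completion="async", **options):
    """Serialize a request body for POST /recognizer/object-storage.

    Optional fields (callback, userdata, boostings, ...) are passed as keyword
    arguments; any that are None are left out of the body.
    Hypothetical helper, not part of the API.
    """
    body = {"dataKey": data_key, "language": language, "completion": completion}
    # Keep only the optional fields the caller actually set.
    body.update({name: value for name, value in options.items() if value is not None})
    return json.dumps(body, ensure_ascii=False)
```

The returned string can be sent as the raw request body together with the `Content-Type: application/json` and `X-CLOVASPEECH-API-KEY` headers shown in the cURL example.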

    2. Request recognition with an external URL

    Use the unique URL of an externally accessible file.

    POST /recognizer/url

    • Recognize media from a URL

    | Method | Request URI |
    |---|---|
    | POST | ${Invoke URL}/recognizer/url |

    Request headers

    | Header Name | Description |
    |---|---|
    | Content-Type | application/json |

    Request bodies

    | name | desc | type | requirement | value | default |
    |---|---|---|---|---|---|
    | url | The media URL | string | required | | |
    | language | Language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
    | completion | Select between sync and async | string | optional | | async |
    | callback | See the part on Callback | string | optional | | |
    | userdata | JSON object | object | optional | | |
    | wordAlignment | Output word alignment in the recognition result | boolean | optional | | true |
    | fullText | Output the entire recognition result in text | boolean | optional | | true |
    | resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | | false |
    | noiseFiltering | Whether to enable noise filtering | boolean | optional | | true |
    | boostings | Boosting object array | array | optional | | |
    | boostings.words | Comma-separated words | string | optional | | |
    | useDomainBoostings | Use domain boostings | boolean | optional | | false |
    | forbiddens | Comma-separated words | string | optional | | |
    | diarization | Speaker recognition (diarization) setting | object | optional | | |
    | diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | | true |
    | sed | Event detection setting | object | optional | | |
    | sed.enable | Whether to enable event detection | boolean | optional | | false |

    • Keyword boosting
      • You can include lists of keywords in the API request body to improve the recognition rate.
      • This function uses the boostings and boostings.words fields in the request body.
      • You can enter up to 1000 keywords to boost.
      • Only Korean and English are supported for boosting.
      • One-syllable words, such as "no", are not supported for boosting because they are likely to be misrecognized.
      • By default, all English letters in the recognition results are changed to lowercase. If you boost a keyword that contains uppercase letters, the matching lowercase letters are replaced with uppercase ones.
      • Boosting ignores spacing.
        For example, you only need to boost either CLOVASpeech or CLOVA Speech, not both.
      • There is no limit on keyword length. However, when a phrase consisting of multiple words is boosted, only the exact phrase benefits from the boosting. For example, if you boost the keyword "CLOVA Speech," all sentences including "CLOVA Speech" can benefit from the boosting. However, if you boost "Media voice recognition technology of CLOVA Speech," sentences that only include "CLOVA Speech" hardly benefit from the boosting.
    • Sensitive keyword detection
      • You can include in the API request body a list of keywords to hide in the recognition result.
      • This function uses the forbiddens field in the request body.
      • There is no limit on the number or length of sensitive keywords.
      • Both spacing and capitalization must match exactly for a keyword to be detected.
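
The keyword boosting rules above can be sketched as a helper that prepares the `boostings` array before a request is sent. This is a minimal sketch under stated assumptions: the name `build_boostings` is hypothetical, the 1000-keyword limit is treated as a total per request (the source does not say how it is counted), and keywords are assumed not to contain commas.

```python
MAX_BOOST_KEYWORDS = 1000  # limit stated in the keyword boosting rules


def build_boostings(keywords):
    """Build the `boostings` array from a list of keywords.

    Keywords that differ only in spacing are collapsed to the first one seen,
    because boosting ignores spacing. Capitalization is preserved, since
    uppercase keywords affect the output. Hypothetical helper, not part of the API.
    """
    seen = set()
    unique = []
    for keyword in keywords:
        keyword = keyword.strip()
        key = "".join(keyword.split())  # spacing-insensitive comparison key
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(keyword)
    if len(unique) > MAX_BOOST_KEYWORDS:
        raise ValueError(f"at most {MAX_BOOST_KEYWORDS} boosting keywords are allowed")
    # The API expects one comma-separated string in `words`.
    return [{"words": ",".join(unique)}]
```

Passing both "CLOVA Speech" and "CLOVASpeech" keeps only the first spelling, which avoids spending two of the 1000 slots on one keyword.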

    Example (cURL shell)

    curl --location --request POST '${Invoke URL}/recognizer/url' \
    --header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
    --header 'Content-Type: application/json' \
    --data-raw '{
      "language": "ko-KR",
      "callback": "http://example/callback",
      "userdata": {
        "dataId": "1"
      },
      "boostings": [
        {
          "words": "comma separated words"
        }
      ],
      "forbiddens": "comma separated words",
      "completion": "async",
      "url": "https://kr.object.ncloudstorage.com/nest/data/IMG_3866.mp4"
    }'
    
    • Response: refer to Common Response

    3. Request by uploading a file from local storage

    Use the path of a file in the local file system.

    POST /recognizer/upload

    • Upload a media file for recognition

    | Method | Request URI |
    |---|---|
    | POST | ${Invoke URL}/recognizer/upload |

    Request headers

    | Header Name | Description |
    |---|---|
    | Content-Type | multipart/form-data |

    Request bodies

    | name | desc | type | requirement | value | default |
    |---|---|---|---|---|---|
    | media | The media file | file | required | | |
    | params | | object | required | | |
    | params.language | Language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
    | params.completion | sync, async | string | optional | | async |
    | params.callback | Refer to Callback | string | optional | | |
    | params.userdata | JSON object | object | optional | | |
    | params.wordAlignment | Output word alignment in the recognition result | boolean | optional | | true |
    | params.fullText | Output the entire recognition result in text | boolean | optional | | true |
    | params.resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | | false |
    | params.noiseFiltering | Whether to enable noise filtering | boolean | optional | | true |
    | params.boostings | Boosting object array | array | optional | | |
    | params.boostings.words | Comma-separated words | string | optional | | |
    | params.useDomainBoostings | Use domain boostings | boolean | optional | | false |
    | params.forbiddens | Comma-separated words | string | optional | | |
    | params.diarization | Speaker recognition (diarization) setting | object | optional | | |
    | params.diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | | true |
    | params.sed | Event detection setting | object | optional | | |
    | params.sed.enable | Whether to enable event detection | boolean | optional | | false |

    • Keyword boosting
      • You can include lists of keywords in the API request body to improve the recognition rate.
      • This function uses the params.boostings and params.boostings.words fields in the request body.
      • You can enter up to 1000 keywords to boost.
      • Only Korean, English, Japanese, and Chinese letters and numbers are supported for boosting.
      • By default, all English letters in the recognition results are changed to lowercase. If you boost a keyword that contains uppercase letters, the matching lowercase letters are replaced with uppercase ones.
      • Boosting ignores spacing.
        For example, you only need to boost either CLOVASpeech or CLOVA Speech, not both.
      • There is no limit on keyword length. However, when a phrase consisting of multiple words is boosted, only the exact phrase benefits from the boosting. For example, if you boost the keyword "CLOVA Speech," all sentences including "CLOVA Speech" can benefit from the boosting. However, if you boost "Media voice recognition technology of CLOVA Speech," sentences that only include "CLOVA Speech" hardly benefit from the boosting.
    • Sensitive keyword detection
      • You can include in the API request body a list of keywords to hide in the recognition result.
      • This function uses the params.forbiddens field in the request body.
      • There is no limit on the number or length of sensitive keywords to be detected.
      • Both spacing and capitalization must match exactly for a keyword to be detected.

    Example (cURL shell)

    curl --location --request POST '${Invoke URL}/recognizer/upload' \
    --header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
    --form 'media=@/video/sample.wav' \
    --form 'params={"language":"ko-KR","completion":"sync","callback":"http://localhost:9010","forbiddens":"comma separated words","boostings":[{"words": "comma separated words"}]};type=application/json'
    
    • Response: refer to Common Response

    Responses

    After a recognition request is made, the response is delivered in one of the following two ways:

    1. sync

      With sync, the response result (JSON) is returned when recognition is completed.

    2. async

      With async, the recognition result is sent to the callback URL entered in the request, saved to Object Storage (resultToObs), or both.

    | Callback URL | resultToObs (Object Storage) | Result |
    |---|---|---|
    | URL exists (O) | True | Result is returned to both the callback URL and Object Storage |
    | URL exists (O) | False | Result is returned to the callback URL only |
    | URL does not exist (X) | True | Result is returned to Object Storage only |
    | URL does not exist (X) | False | Error is returned |

    Callback

    Request headers

    | Header Name | Description |
    |---|---|
    | Content-Type | application/application-json; charset=utf-8 |

    • Method: POST
    • Body: same as Common Response (sync)
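
A callback endpoint only needs to accept a POST request and read the JSON body. The following is a minimal sketch using Python's standard `http.server` module. The class and function names are hypothetical, port 9010 is borrowed from the upload cURL example, and replying with a plain 200 is an assumption, since the source does not state what response the service expects from the callback.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the POST request that delivers an async recognition result."""

    received = []  # results kept in memory; a real service would persist them

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        CallbackHandler.received.append(payload)
        # Assumption: a plain 200 response is enough to acknowledge the callback.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port=9010):
    """Create (but do not start) a callback server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), CallbackHandler)
```

Calling `make_server().serve_forever()` starts the listener. In practice the callback URL must be reachable from the service, so a localhost binding like this is suitable for local experiments only.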

    4. Get job status

    GET /recognizer/{token}

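    • Get the status of an async request

    | Method | Request URI |
    |---|---|
    | GET | ${Invoke URL}/recognizer/{token} |

    Request headers

    | Header Name | Description |
    |---|---|
    | Content-Type | application/json |

    Request bodies

    | name | desc | type | requirement | value | default |
    |---|---|---|---|---|---|
    | token | Token of the async request, passed in the request URI | string | required | | |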

    Example (cURL shell)

    curl --location --request GET '${Invoke URL}/recognizer/ceb77af3dae44a6c8c4de3dce519140a' \
    --header 'X-CLOVASPEECH-API-KEY: ${Secret Key}'
    
    • Response
    {
        "token": "ceb77af3dae44a6c8c4de3dce519140a",
        "result": "PROCESSING"
    }
    

    Possible values of result:

    • WAITING
    • PROCESSING
    • FAILED
    • COMPLETED
    • TIMEOUT
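
Checking the job status repeatedly until it settles can be sketched as a polling loop. This is an illustration only: `wait_for_job` and its parameters are hypothetical, and treating COMPLETED, FAILED, and TIMEOUT as final states (with WAITING and PROCESSING as in-progress states) is an assumption based on the value names, not something the source states.

```python
import time

# Assumption: these three values are final; WAITING and PROCESSING are not.
TERMINAL_STATES = {"COMPLETED", "FAILED", "TIMEOUT"}


def wait_for_job(fetch_status, token, interval=5.0, max_attempts=120, sleep=time.sleep):
    """Poll until the job reaches a final state and return the last response.

    `fetch_status` is a caller-supplied function that performs
    GET ${Invoke URL}/recognizer/{token} and returns the parsed JSON body.
    """
    for _ in range(max_attempts):
        status = fetch_status(token)
        if status.get("result") in TERMINAL_STATES:
            return status
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job {token} did not finish after {max_attempts} polls")
```

With the `requests` library, `fetch_status` could be a small function that calls `requests.get` on the status URI with the `X-CLOVASPEECH-API-KEY` header and returns `response.json()`. Injecting it keeps the loop easy to test without network access.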

    Common Response

    • Response(async)

      {
          "token": "a951af6a1015466bae2c926177f26310",
          "result": "SUCCEEDED",
          "message": "Succeeded"
      }
      
    • Response(sync)

      {
          "result": "COMPLETED",
          "message": "Succeeded",
          "token": "d3bea166039e486abbb90e4a84c3b3a5",
          "version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
          "params": {
              "service": "ncp",
              "domain": "general",
              "lang": "enko",
              "completion": "sync",
              "callback": "",
              "diarization": {
                  "enable": true,
                  "speakerCountMin": -1,
                  "speakerCountMax": -1
              },
              "sed": {
                  "enable": true
              },
              "boostings": [
                  {
                      "words": "Hello, test"
                  }
              ],
              "forbiddens": "",
              "wordAlignment": true,
              "fullText": true,
              "noiseFiltering": true,
              "resultToObs": false,
              "priority": 0,
              "userdata": {
                  "_ncp_DomainCode": "NEST",
                  "_ncp_DomainId": 1,
                  "_ncp_TaskId": 55442,
                  "_ncp_TraceId": "36a75ce98ec342d8a8c8fe9191cec343",
                  "id": 1
              }
          },
          "progress": 100,
          "keywords": {},
          "segments": [
              {
                  "start": 5870,
                  "end": 8160,
                  "text": "This is Seoul pool.",
                  "confidence": 0.9626975,
                  "diarization": {
                      "label": "2"
                  },
                  "speaker": {
                      "label": "2",
                      "name": "B",
                      "edited": false
                  },
                  "words": [
                      [
                          5871,
                          6730,
                          "This is"
                      ],
                      [
                          6860,
                          7530,
                          "Seoul pool."
                      ]
                  ],
                  "textEdited": "This is Seoul pool."
              },
              {
                  "start": 8160,
                  "end": 12950,
                  "text": "How much is the entrance fee? It's 5000 won. Thank you.",
                  "confidence": 0.8835926,
                  "diarization": {
                      "label": "1"
                  },
                  "speaker": {
                      "label": "1",
                      "name": "A",
                      "edited": false
                  },
                  "words": [
                      [
                          8161,
                          9220,
                          "How much is"
                      ],
                      [
                          9390,
                          10020,
                          "the entrance fee?"
                      ],
                      [
                          10410,
                          10640,
                          "It's"
                      ],
                      [
                          10710,
                          11140,
                          "5000 won."
                      ],
                      [
                          11910,
                          12500,
                          "Thank you."
                      ]
                  ],
                  "textEdited": "How much is the entrance fee? It's 5000 won. Thank you."
              }
          ],
          "text": "This is Seoul pool. How much is the entrance fee? It's 5000 won. Thank you.",
          "confidence": 0.9071357,
          "speakers": [
              {
                  "label": "1",
                  "name": "A",
                  "edited": false
              },
              {
                  "label": "2",
                  "name": "B",
                  "edited": false
              }
          ],
          "events": [
              {
                  "type": "music",
                  "label": "music",
                  "labelEdited": "music",
                  "start": 1400,
                  "end": 5000
              }
          ],
          "eventTypes": [
              "music"
          ]
      }
      
      • Body

        | field | desc | type |
        |---|---|---|
        | result | Result code | string |
        | message | Result message | string |
        | token | Result token | string |
        | version | Engine version | string |
        | params | Parameter | object |
        | params: service | Service code | string |
        | params: domain | Domain | string |
        | params: lang | Recognition language | string |
        | params: completion | Request method | string |
        | params: diarization | Speaker separation data | object |
        | params: diarization.enable | Whether to use speaker separation | boolean |
        | params: diarization.speakerCountMin | Minimum number of speakers | number |
        | params: diarization.speakerCountMax | Maximum number of speakers | number |
        | params: boostings | Boosting data | array |
        | params: boostings: words | Boosting keyword | string |
        | params: forbiddens | Sensitive keyword | string |
        | params: fullText | Whether the entire recognition result is output in text | boolean |
        | params: noiseFiltering | Whether noise filtering is enabled | boolean |
        | params: resultToObs | Whether the result is saved in Object Storage | boolean |
        | params: segment | Segment | string |
        | params: morpheme | Morpheme | string |
        | params: completion | Synchronous or asynchronous | string |
        | params: userdata | User data | object |
        | segments | Segment data | array |
        | segments: start | Segment start time (ms) | number |
        | segments: end | Segment end time (ms) | number |
        | segments: text | Segment text | string |
        | segments: textEdited | Edits made | string |
        | segments: diarization | Recognized speaker | object |
        | segments: diarization.label | Recognized speaker number | string |
        | segments: speaker | Replaced speaker | object |
        | segments: speaker.label | Replaced speaker number | string |
        | segments: speaker.name | Replaced speaker name | string |
        | segments: confidence | Segment confidence (0.0-1.0) | number |
        | segments: words | Word segment | array |
        | segments: words: [0] | Word segment start time (ms) | number |
        | segments: words: [1] | Word segment end time (ms) | number |
        | segments: words: [2] | Word segment text | string |
        | text | Entire text | string |
        | confidence | Total confidence | number |
        | events | Event | array |
        | events.type | Event type | string |
        | events.label | Event name | string |
        | events.labelEdited | Changed event name | string |
        | events.start | Event start time | number |
        | events.end | Event end time | number |
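
Reading the fields described above from a sync response can be sketched as follows. The function names `format_transcript` and `word_timings` are hypothetical, and the fallback to "?" when a segment has no `speaker` object is an assumption, since the source does not say whether that field is omitted when diarization is disabled.

```python
def format_transcript(response):
    """Turn a sync response into one line per segment: [start-end ms] speaker: text."""
    lines = []
    for segment in response.get("segments", []):
        # Assumption: `speaker` may be missing, so fall back to "?".
        speaker = segment.get("speaker", {}).get("name", "?")
        lines.append(f"[{segment['start']}-{segment['end']} ms] {speaker}: {segment['text']}")
    return lines


def word_timings(segment):
    """Unpack the [start, end, text] triples of a segment into dictionaries."""
    return [{"start": start, "end": end, "text": text}
            for start, end, text in segment.get("words", [])]
```

Applied to the first segment of the sample response, `format_transcript` produces the line `[5870-8160 ms] B: This is Seoul pool.`.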

    Examples

    Java

    dependency

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpmime</artifactId>
        <version>4.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.5</version>
    </dependency>
    

    ClovaSpeechClient

    package org.example.clovaspeech.client;
    
    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    import org.apache.http.Header;
    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.StringEntity;
    import org.apache.http.entity.mime.MultipartEntityBuilder;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicHeader;
    import org.apache.http.util.EntityUtils;
    
    import com.google.gson.Gson;
    
    public class ClovaSpeechClient {
    
        // Clova Speech secret key
    	private static final String SECRET = "";
        // Clova Speech invoke URL
    	private static final String INVOKE_URL = "";
    
    	private CloseableHttpClient httpClient = HttpClients.createDefault();
    	private Gson gson = new Gson();
    
    	private static final Header[] HEADERS = new Header[] {
    		new BasicHeader("Accept", "application/json"),
    		new BasicHeader("X-CLOVASPEECH-API-KEY", SECRET),
    	};
    
        	public static class Boosting {
    		private String words;
    
    		public String getWords() {
    			return words;
    		}
    
    		public void setWords(String words) {
    			this.words = words;
    		}
    	}
    
    	public static class Diarization {
    		private Boolean enable = Boolean.FALSE;
    		private Integer speakerCountMin;
    		private Integer speakerCountMax;
    
    		public Boolean getEnable() {
    			return enable;
    		}
    
    		public void setEnable(Boolean enable) {
    			this.enable = enable;
    		}
    
    		public Integer getSpeakerCountMin() {
    			return speakerCountMin;
    		}
    
    		public void setSpeakerCountMin(Integer speakerCountMin) {
    			this.speakerCountMin = speakerCountMin;
    		}
    
    		public Integer getSpeakerCountMax() {
    			return speakerCountMax;
    		}
    
    		public void setSpeakerCountMax(Integer speakerCountMax) {
    			this.speakerCountMax = speakerCountMax;
    		}
    	}
    
        public static class Sed {
    		private Boolean enable = Boolean.FALSE;
    
    		public Boolean getEnable() {
    			return enable;
    		}
    
    		public void setEnable(Boolean enable) {
    			this.enable = enable;
    		}
    	}
    
    	public static class NestRequestEntity {
    		private String language = "ko-KR";
    		//completion optional, sync/async
    		private String completion = "sync";
    		//optional, used to receive the analyzed results
    		private String callback;
    		//optional, any data
    		private Map<String, Object> userdata;
    		private Boolean wordAlignment = Boolean.TRUE;
    		private Boolean fullText = Boolean.TRUE;
    		//boosting object array
    		private List<Boosting> boostings;
    		//comma separated words
    		private String forbiddens;
    		private Diarization diarization;
            private Sed sed;
    
            public Sed getSed() {
    			return sed;
    		}
    
    		public void setSed(Sed sed) {
    			this.sed = sed;
    		}
            
    		public String getLanguage() {
    			return language;
    		}
    
    		public void setLanguage(String language) {
    			this.language = language;
    		}
    
    		public String getCompletion() {
    			return completion;
    		}
    
    		public void setCompletion(String completion) {
    			this.completion = completion;
    		}
    
    		public String getCallback() {
    			return callback;
    		}
    
    		public Boolean getWordAlignment() {
    			return wordAlignment;
    		}
    
    		public void setWordAlignment(Boolean wordAlignment) {
    			this.wordAlignment = wordAlignment;
    		}
    
    		public Boolean getFullText() {
    			return fullText;
    		}
    
    		public void setFullText(Boolean fullText) {
    			this.fullText = fullText;
    		}
    
    		public void setCallback(String callback) {
    			this.callback = callback;
    		}
    
    		public Map<String, Object> getUserdata() {
    			return userdata;
    		}
    
    		public void setUserdata(Map<String, Object> userdata) {
    			this.userdata = userdata;
    		}
    
    		public String getForbiddens() {
    			return forbiddens;
    		}
    
    		public void setForbiddens(String forbiddens) {
    			this.forbiddens = forbiddens;
    		}
    
    		public List<Boosting> getBoostings() {
    			return boostings;
    		}
    
    		public void setBoostings(List<Boosting> boostings) {
    			this.boostings = boostings;
    		}
    
    		public Diarization getDiarization() {
    			return diarization;
    		}
    
    		public void setDiarization(Diarization diarization) {
    			this.diarization = diarization;
    		}
    	}
    
    	/**
    	 * recognize media using URL
    	 * @param url required, the media URL
    	 * @param nestRequestEntity optional
    	 * @return string
    	 */
    	public String url(String url, NestRequestEntity nestRequestEntity) {
    		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/url");
    		httpPost.setHeaders(HEADERS);
    		Map<String, Object> body = new HashMap<>();
    		body.put("url", url);
    		body.put("language", nestRequestEntity.getLanguage());
    		body.put("completion", nestRequestEntity.getCompletion());
    		body.put("callback", nestRequestEntity.getCallback());
    		body.put("userdata", nestRequestEntity.getUserdata());
    		body.put("wordAlignment", nestRequestEntity.getWordAlignment());
    		body.put("fullText", nestRequestEntity.getFullText());
    		body.put("forbiddens", nestRequestEntity.getForbiddens());
    		body.put("boostings", nestRequestEntity.getBoostings());
    		body.put("diarization", nestRequestEntity.getDiarization());
            body.put("sed", nestRequestEntity.getSed());
    		HttpEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
    		httpPost.setEntity(httpEntity);
    		return execute(httpPost);
    	}
    
    	/**
    	 * recognize media using Object Storage
    	 * @param dataKey required, the Object Storage key
    	 * @param nestRequestEntity optional
    	 * @return string
    	 */
    	public String objectStorage(String dataKey, NestRequestEntity nestRequestEntity) {
    		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/object-storage");
    		httpPost.setHeaders(HEADERS);
    		Map<String, Object> body = new HashMap<>();
    		body.put("dataKey", dataKey);
    		body.put("language", nestRequestEntity.getLanguage());
    		body.put("completion", nestRequestEntity.getCompletion());
    		body.put("callback", nestRequestEntity.getCallback());
    		body.put("userdata", nestRequestEntity.getUserdata());
    		body.put("wordAlignment", nestRequestEntity.getWordAlignment());
    		body.put("fullText", nestRequestEntity.getFullText());
    		body.put("forbiddens", nestRequestEntity.getForbiddens());
    		body.put("boostings", nestRequestEntity.getBoostings());
    		body.put("diarization", nestRequestEntity.getDiarization());
            body.put("sed", nestRequestEntity.getSed());
    		StringEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
    		httpPost.setEntity(httpEntity);
    		return execute(httpPost);
    	}
    
    	/**
    	 *
    	 * recognize media using a file
    	 * @param file required, the media file
    	 * @param nestRequestEntity optional
    	 * @return string
    	 */
    	public String upload(File file, NestRequestEntity nestRequestEntity) {
    		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/upload");
    		httpPost.setHeaders(HEADERS);
    		HttpEntity httpEntity = MultipartEntityBuilder.create()
    			.addTextBody("params", gson.toJson(nestRequestEntity), ContentType.APPLICATION_JSON)
    			.addBinaryBody("media", file, ContentType.MULTIPART_FORM_DATA, file.getName())
    			.build();
    		httpPost.setEntity(httpEntity);
    		return execute(httpPost);
    	}
    
    	private String execute(HttpPost httpPost) {
    		try (final CloseableHttpResponse httpResponse = httpClient.execute(httpPost)) {
    			final HttpEntity entity = httpResponse.getEntity();
    			return EntityUtils.toString(entity, StandardCharsets.UTF_8);
    		} catch (Exception e) {
    			throw new RuntimeException(e);
    		}
    	}
    
    	public static void main(String[] args) {
    		final ClovaSpeechClient clovaSpeechClient = new ClovaSpeechClient();
    		NestRequestEntity requestEntity = new NestRequestEntity();
    		final String result =
    			clovaSpeechClient.upload(new File("/data/sample.mp4"), requestEntity);
    		//final String result = clovaSpeechClient.url("file URL", requestEntity);
    		//final String result = clovaSpeechClient.objectStorage("Object Storage key", requestEntity);
    		System.out.println(result);
    	}
    }
    

    Python

    import requests
    import json
    
    
    class ClovaSpeechClient:
        # Clova Speech invoke URL
        invoke_url = ''
        # Clova Speech secret key
        secret = ''
    
        def req_url(self, url, completion, callback=None, userdata=None, forbiddens=None, boostings=None, wordAlignment=True, fullText=True, diarization=None, sed=None):
            request_body = {
                'url': url,
                'language': 'ko-KR',
                'completion': completion,
                'callback': callback,
                'userdata': userdata,
                'wordAlignment': wordAlignment,
                'fullText': fullText,
                'forbiddens': forbiddens,
                'boostings': boostings,
                'diarization': diarization,
                'sed': sed,
            }
            headers = {
                'Accept': 'application/json; charset=UTF-8',
                'Content-Type': 'application/json; charset=UTF-8',
                'X-CLOVASPEECH-API-KEY': self.secret
            }
            return requests.post(headers=headers,
                                 url=self.invoke_url + '/recognizer/url',
                                 data=json.dumps(request_body).encode('UTF-8'))
    
        def req_object_storage(self, data_key, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                               wordAlignment=True, fullText=True, diarization=None, sed=None):
            request_body = {
                'dataKey': data_key,
                'language': 'ko-KR',
                'completion': completion,
                'callback': callback,
                'userdata': userdata,
                'wordAlignment': wordAlignment,
                'fullText': fullText,
                'forbiddens': forbiddens,
                'boostings': boostings,
                'diarization': diarization,
                'sed': sed,
            }
            headers = {
                'Accept': 'application/json; charset=UTF-8',
                'Content-Type': 'application/json; charset=UTF-8',
                'X-CLOVASPEECH-API-KEY': self.secret
            }
            return requests.post(headers=headers,
                                 url=self.invoke_url + '/recognizer/object-storage',
                                 data=json.dumps(request_body).encode('UTF-8'))
    
        def req_upload(self, file, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                       wordAlignment=True, fullText=True, diarization=None, sed=None):
            request_body = {
                'language': 'ko-KR',
                'completion': completion,
                'callback': callback,
                'userdata': userdata,
                'wordAlignment': wordAlignment,
                'fullText': fullText,
                'forbiddens': forbiddens,
                'boostings': boostings,
                'diarization': diarization,
                'sed': sed,
            }
            headers = {
                'Accept': 'application/json;UTF-8',
                'X-CLOVASPEECH-API-KEY': self.secret
            }
            # Serialize the parameters once; keep non-ASCII characters unescaped.
            params = json.dumps(request_body, ensure_ascii=False).encode('UTF-8')
            # Open the media file in a context manager so the handle is closed after the upload.
            with open(file, 'rb') as media:
                files = {
                    'media': media,
                    'params': (None, params, 'application/json')
                }
                response = requests.post(headers=headers, url=self.invoke_url + '/recognizer/upload', files=files)
            return response
    
    if __name__ == '__main__':
        # res = ClovaSpeechClient().req_url(url='http://example.com/media.mp3', completion='sync')
        # res = ClovaSpeechClient().req_object_storage(data_key='data/media.mp3', completion='sync')
        res = ClovaSpeechClient().req_upload(file='/data/media.mp3', completion='sync')
        print(res.text)
    

    PHP

    <?php
    
    $secret = '';
    $invoke_url = '';
    
    function req_url($url, $completion, $callback, $userdata, $forbiddens, $boostings,
                     $wordAlignment, $fullText, $diarization, $sed)
    {
        $object = (object)[
            'language' => 'ko-KR',
            'completion' => $completion,
            'callback' => $callback,
            'url' => $url,
            'userdata' => $userdata,
            'forbiddens' => $forbiddens,
            'boostings' => $boostings,
            'wordAlignment' => $wordAlignment,
            'fullText' => $fullText,
            'diarization' => $diarization,
            'sed' => $sed,
        ];
        return execute('/recognizer/url', json_encode($object), array('Content-Type: application/json'));
    }
    
    function req_object_storage($dataKey, $completion, $callback, $userdata, $forbiddens, $boostings,
                                $wordAlignment, $fullText, $diarization, $sed)
    {
        $object = (object)[
            'language' => 'ko-KR',
            'completion' => $completion,
            'callback' => $callback,
            'dataKey' => $dataKey,
            'userdata' => $userdata,
            'forbiddens' => $forbiddens,
            'boostings' => $boostings,
            'wordAlignment' => $wordAlignment,
            'fullText' => $fullText,
            'diarization' => $diarization,
            'sed' => $sed,
        ];
        return execute('/recognizer/object-storage', json_encode($object), array('Content-Type: application/json'));
    }
    
    function req_upload($filePath, $completion, $callback, $userdata, $forbiddens, $boostings,
                        $wordAlignment, $fullText, $diarization, $sed)
    {
        $object = (object)[
            'language' => 'ko-KR',
            'completion' => $completion,
            'callback' => $callback,
            'userdata' => $userdata,
            'forbiddens' => $forbiddens,
            'boostings' => $boostings,
            'wordAlignment' => $wordAlignment,
            'fullText' => $fullText,
            'diarization' => $diarization,
            'sed' => $sed,
        ];
        $fields = array(
            'media' => new CURLFile($filePath),
            'params' => json_encode($object),
        );
        return execute('/recognizer/upload', $fields, null);
    }
    
    function execute($uri, $postFields, $customHeaders)
    {
        try {
            $ch = curl_init($GLOBALS['invoke_url'] . $uri);
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            // Keep TLS certificate verification enabled; disabling it exposes the secret key to interception.
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
            curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
            curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 600);
            $headers = array();
            $headers[] = 'X-CLOVASPEECH-API-KEY: ' . $GLOBALS['secret'];
            if (!is_null($customHeaders)) {
                $headers = array_merge($headers, $customHeaders);
            }
            curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
            $response = curl_exec($ch);
            $err = curl_error($ch);
            curl_close($ch);
            if ($err) {
                echo 'cURL Error #:' . $err;
                return $err;
            }
            return $response;
        } catch (Exception $E) {
            // Exception has no lastResponse property; report the message and return null.
            echo 'Response: ' . $E->getMessage() . "\n";
            return null;
        }
    }
    
    // Each function takes 10 arguments: the target, the completion mode, and 8 optional settings.
    //$response = req_url('https://example.com/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
    //$response = req_object_storage('data/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
    $response = req_upload('/data/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
    echo $response;
    ?>
    

    C#

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;
    
    namespace HttpClientStatus
    {
        public class ClovaSpeechRequest
        {
            public string language { get; set; }
            public string completion { get; set; }
            // Other fields are omitted, please refer to: https://api.ncloud-docs.com/release-20230525/docs/en/ai-application-service-clovaspeech-clovaspeech for available fields
        }
        public class Program
        {
            private static readonly string secretKey = "";
            private static readonly string invokeUrl = "";
            public static async Task<string> Upload(ClovaSpeechRequest clovaSpeechRequest, string path)
            {
    
                using (var client = new HttpClient())
                using (var multiForm = new MultipartFormDataContent())
                using (FileStream fs = File.OpenRead(path))
                {
                    // Send the secret key as a request header.
                    client.DefaultRequestHeaders.Add("X-CLOVASPEECH-API-KEY", secretKey);
                    // "params" carries the JSON request options; "media" carries the file itself.
                    multiForm.Add(new StringContent(JsonSerializer.Serialize(clovaSpeechRequest)), "params");
                    multiForm.Add(new StreamContent(fs), "media", Path.GetFileName(path));
                    var message = await client.PostAsync(invokeUrl + "/recognizer/upload", multiForm);
                    return await message.Content.ReadAsStringAsync();
                }
            }
    
            static async Task Main(string[] args)
            {
                var clovaSpeechRequest = new ClovaSpeechRequest
                {
                    language = "ko-KR",
                    completion = "sync"
                };
    
                var result = await Upload(clovaSpeechRequest, @"D:\media\video\sample.mp3");
                Console.WriteLine(result);
            }
        }
    }
    

    Error codes

    Error Response Body:

    {
      "result": "FAILED",
      "message": "File format is not supported.",
      "token": ""
    }
    
    Result                         Message
    SUCCEEDED                      Succeeded
    PROCESSING                     Processing
    ERROR_SERVER_BUSY              Server too busy
    ERROR_TOKEN_INVALID            Token does not exist
    ERROR_AUDIO_EMPTY              Audio is empty
    ERROR_AUDIO_CONVERSION         Audio conversion has been failed
    ERROR_PARAMS_FORMAT_INVALID    Params must be JSON format
    ERROR_REQUEST_PARAMETER        Invalid request parameters
    ERROR_REQUEST_PARAMETER        Speaker detect is off
    ERROR_INVALID_SECRET           Invalid secret
    ERROR_DATA_NOT_FOUND           Not found
    ERROR_DATA_CONFLICT            Data conflict
    ERROR_INTERNAL_ERROR           Internal Server Error
    ERROR_EXTERNAL_ERROR           Service Unavailable
    ERROR_TOO_MANY_JOBS            Too many jobs
    ERROR_GATEWAY_TIMEOUT          Gateway timeout
    FAILED                         Other errors
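
The result codes above can drive client-side handling. The following is a minimal sketch, not part of the official samples: it reads the `result` field of a response body and maps it to one of four actions. The function name `classify_result`, the action labels, and the set of codes treated as retryable (`ERROR_SERVER_BUSY`, `ERROR_TOO_MANY_JOBS`, `ERROR_GATEWAY_TIMEOUT`, `ERROR_EXTERNAL_ERROR`) are assumptions made for illustration. This document does not define a retry policy.

```python
import json

# Assumption: these codes describe transient conditions worth retrying.
# The result table itself does not say which errors are retryable.
RETRYABLE = {
    'ERROR_SERVER_BUSY',
    'ERROR_TOO_MANY_JOBS',
    'ERROR_GATEWAY_TIMEOUT',
    'ERROR_EXTERNAL_ERROR',
}


def classify_result(body):
    """Map a response body (JSON text) to 'done', 'pending', 'retry', or 'fail'."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Not valid JSON (or not text at all): treat as a hard failure.
        return 'fail'
    result = payload.get('result') if isinstance(payload, dict) else None
    if result == 'SUCCEEDED':
        return 'done'
    if result == 'PROCESSING':
        return 'pending'
    if result in RETRYABLE:
        return 'retry'
    # FAILED, parameter errors, invalid secret, and unknown codes end up here.
    return 'fail'


if __name__ == '__main__':
    sample = '{"result": "FAILED", "message": "File format is not supported.", "token": ""}'
    print(classify_result(sample))
```

A caller could pass `res.text` from the Python client above to `classify_result` and retry with a delay only when it returns `'retry'`. Errors such as `ERROR_INVALID_SECRET` or `ERROR_REQUEST_PARAMETER` are unlikely to succeed on a retry without changing the request.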
