CLOVA Speech Long Text Recognition API
The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in Classic and VPC
Version
Version | Date | Changes |
---|---|---|
V1.0.0 | 2020.9.17. | Initial draft |
V1.1.0 | 2020.11.18. | Added boosting and forbidden keywords |
V1.2.0 | 2021.4.8. | Added speaker recognition function |
V1.3.0 | 2021.5.27. | Added English recognition function |
V1.4.0 | 2021.7.22. | Added Korean/English simultaneous recognition function |
V1.5.0 | 2021.11.25. | Asynchronous Mode supported |
V1.6.0 | 2022.2.17. | Added Japanese recognition function |
V1.7.0 | 2022.6.8. | Added domain boosting support |
V1.8.0 | 2022.10.20. | Added traditional and simplified Chinese recognition function |
V1.9.0 | 2022.12.15. | Added noise filtering function |
V2.0.0 | 2023.12.21. | Added event detection function |
Requests
Method | Request URI |
---|---|
POST | Call the Invoke URL of the API Gateway created for the CLOVA Speech domain. A unique Invoke URL is created for each domain. |
How to use CLOVA Speech API
You can request recognition through the CLOVA Speech API in one of the following three ways:
Request recognition with an Object Storage file's URL
: Use the unique URL of the file saved in Object Storage. (The file to be recognized must be uploaded to Object Storage in advance.)
Request recognition with an external URL
: Use the unique URL of an externally accessible file.
Request by uploading a file from local storage
: Use the file system path.
After a recognition request is made, the result is returned in one of the following two ways:
sync
If the request is made with sync, the response result (JSON) is returned when recognition is completed.
async
If the request is made with async, the recognition result is sent to the callback URL entered in the request, saved to Object Storage (resultToObs), or both.
Callback url | resultToObs(ObjectStorage) | result |
---|---|---|
URL exists (O) | True | Result is returned both to the Callback url and Object Storage |
URL exists (O) | False | Result is returned to the Callback url only |
URL does not exist (X) | True | Result is returned to Object Storage only |
URL does not exist (X) | False | Error is returned |
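The delivery rules in the table above can be sketched as a small helper. This is only an illustration of the table's logic, not part of the API; the function name `async_result_destinations` and the returned labels are hypothetical.

```python
from typing import List, Optional


def async_result_destinations(callback_url: Optional[str], result_to_obs: bool) -> List[str]:
    """Return where an async result is delivered, following the table above."""
    destinations = []
    if callback_url:
        destinations.append("callback")
    if result_to_obs:
        destinations.append("object_storage")
    if not destinations:
        # Neither a callback URL nor resultToObs: the API returns an error.
        raise ValueError("async requests need a callback URL or resultToObs=true")
    return destinations
```

Running a check like this before sending an async request may help catch the "no callback URL, resultToObs false" combination on the client side.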
1. Request recognition with Object Storage file's url
: Use the unique url of the file saved in Object Storage.
(The file to be recognized must be uploaded on Object Storage in advance.)
POST /recognizer/object-storage
- recognize media from object storage
Method | Request URI |
---|---|
POST | ${Invoke URL}/recognizer/object-storage |
Request headers
Header Name | Description |
---|---|
Content-Type | application/json |
Request bodies
name | desc | type | requirement | value | default |
---|---|---|---|---|---|
dataKey | Key to access the Object Storage path of the file to be recognized | string | required | ||
language | language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
completion | Select between sync and async | string | optional | async | |
callback | See the part on Callback | string | optional | ||
userdata | json object | object | optional | ||
wordAlignment | Output word alignment in the recognition result | boolean | optional | true | |
fullText | Output the entire recognition result in text | boolean | optional | true | |
resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | false | |
noiseFiltering | Whether to enable noise filtering | boolean | optional | true | |
boostings | boosting object array | array | optional | ||
boostings.words | comma separated words | string | optional | ||
useDomainBoostings | use domain boostings | boolean | optional | false | |
forbiddens | comma separated words | string | optional | ||
diarization | Speaker recognition (diarization) setting | object | optional | ||
diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | true | |
sed | Event detection setting | object | optional | ||
sed.enable | Whether to enable event detection | boolean | optional | false |
Example (cURL shell)
curl --location --request POST '${Invoke URL}/recognizer/object-storage' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"language": "ko-KR",
"callback": "http://example/callback",
"userdata": {
"dataId": "1"
},
"boostings": [
{
"words": "comma separated words"
}
],
"forbiddens": "comma separated words",
"completion":"async",
"dataKey": "data/sample.wav"
}'
- Response: refer to Common Response
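As a rough Python counterpart to the cURL call above, the sketch below only assembles the URL, headers, and JSON body, so it can be inspected without sending anything. The helper name `build_object_storage_request`, the placeholder Invoke URL, and the choice to omit unset options are assumptions made for illustration.

```python
import json


def build_object_storage_request(invoke_url, secret_key, data_key,
                                 language="ko-KR", completion="async", **options):
    """Assemble URL, headers, and body for POST /recognizer/object-storage."""
    body = {"dataKey": data_key, "language": language, "completion": completion}
    # Assumption: options left as None are simply omitted from the body.
    body.update({k: v for k, v in options.items() if v is not None})
    return {
        "url": invoke_url.rstrip("/") + "/recognizer/object-storage",
        "headers": {
            "X-CLOVASPEECH-API-KEY": secret_key,
            "Content-Type": "application/json",
        },
        "data": json.dumps(body, ensure_ascii=False).encode("utf-8"),
    }


request = build_object_storage_request(
    "https://example.invalid/invoke",  # placeholder Invoke URL
    "my-secret",                       # placeholder secret key
    "data/sample.wav",
    callback="http://example/callback",
    boostings=[{"words": "comma separated words"}],
)
print(request["url"])
```

If the `requests` package is installed, the returned dictionary should be usable as `requests.post(**request)`.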
2. Request recognition with external url
Use the unique url of a file accessible externally.
POST /recognizer/url
- recognize media from URL
Method | Request URI |
---|---|
POST | ${Invoke URL}/recognizer/url |
Request headers
Header Name | Description |
---|---|
Content-Type | application/json |
Request bodies
name | desc | type | requirement | value | default |
---|---|---|---|---|---|
url | the media URL | string | required | ||
language | language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
completion | Select between sync and async | string | optional | async | |
callback | See the part on Callback | string | optional | ||
userdata | json object | object | optional | ||
wordAlignment | Output word alignment in the recognition result | boolean | optional | true | |
fullText | Output the entire recognition result in text | boolean | optional | true | |
resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | false | |
noiseFiltering | Whether to enable noise filtering | boolean | optional | true | |
boostings | boosting object array | array | optional | ||
boostings.words | comma separated words | string | optional | ||
useDomainBoostings | use domain boostings | boolean | optional | false | |
forbiddens | comma separated words | string | optional | ||
diarization | Speaker recognition (diarization) setting | object | optional | ||
diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | true | |
sed | Event detection setting | object | optional | ||
sed.enable | Whether to enable event detection | boolean | optional | false |
- Keyword boosting
  - You can include lists of keywords in the API request body to improve the recognition rate.
  - This function uses the boostings and boostings.words fields in the request body.
  - You can enter up to 1000 keywords to boost.
  - Only Korean and English are supported for boosting.
  - One-syllable words, such as 네, 응, and no, are not supported for boosting because they risk being misrecognized.
  - By default, all English letters in the recognition results are changed to lowercase, but if a request is made to boost uppercase keywords, lowercase letters are replaced with uppercase ones.
  - Boosting ignores spacing. For example, you only need to request boosting for either CLOVASpeech or CLOVA Speech.
  - There is no limit on keyword length, but if a phrase consisting of multiple words is boosted, only the exact phrase benefits from the boosting. For example, if you boost the keyword "CLOVA Speech," all sentences including "CLOVA Speech" can benefit from the boosting. However, if you boost "Media voice recognition technology of CLOVA Speech," sentences that only include "CLOVA Speech" can hardly benefit from the boosting.
- Sensitive keyword detection
  - You can include in the API request body a list of keywords to hide in the recognition result.
  - This function uses the forbiddens field in the request body.
  - There is no limit on the number or length of sensitive keywords.
  - Both spacing and capitalization must match exactly for a keyword to be detected.
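The boosting rules above (comma-separated words, up to 1000 keywords, spacing ignored) could be applied on the client side with a sketch like the following. The helper name `build_boostings` is hypothetical, and treating the 1000-keyword limit as a per-request limit is an assumption.

```python
MAX_BOOST_KEYWORDS = 1000  # limit stated in the keyword boosting notes


def build_boostings(keywords):
    """Return a boostings array built from a list of keywords."""
    cleaned = []
    seen = set()
    for keyword in keywords:
        keyword = keyword.strip()
        if "," in keyword:
            # Words are comma separated, so a comma inside a keyword would split it.
            raise ValueError("keyword must not contain a comma: %r" % keyword)
        # Boosting ignores spacing, so variants differing only by spaces are merged.
        key = keyword.replace(" ", "")
        if not key or key in seen:
            continue
        seen.add(key)
        cleaned.append(keyword)
    if len(cleaned) > MAX_BOOST_KEYWORDS:
        raise ValueError("at most 1000 boosting keywords are allowed")
    return [{"words": ", ".join(cleaned)}] if cleaned else []
```

Capitalization is deliberately left untouched, since uppercase keywords change how the recognized text is cased.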
Example (cURL shell)
curl --location --request POST '${Invoke URL}/recognizer/url' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"language": "ko-KR",
"callback": "http://example/callback",
"userdata": {
"dataId": "1"
},
"boostings": [
{
"words": "comma separated words"
}],
"forbiddens": "comma separated words",
"completion":"async",
"url": "https://kr.object.ncloudstorage.com/nest/data/IMG_3866.mp4"
}'
- Response: refer to Common Response
3. Request by uploading file from local storage
You can use the path in the local file system.
POST /recognizer/upload
- upload a media file for recognition
Method | Request URI |
---|---|
POST | ${Invoke URL}/recognizer/upload |
Request headers
Header Name | Description |
---|---|
Content-Type | multipart/form-data |
Request bodies
name | desc | type | requirement | value | default |
---|---|---|---|---|---|
media | the media file | file | required | ||
params | Request parameters (JSON object) | object | required | ||
params.language | language | string | required | ko-KR, en-US, enko, ja, zh-cn, zh-tw | ko-KR |
params.completion | sync, async | string | optional | async | |
params.callback | refer to Callback | string | optional | ||
params.userdata | json object | object | optional | ||
params.wordAlignment | Output word alignment in the recognition result | boolean | optional | true | |
params.fullText | Output the entire recognition result in text | boolean | optional | true | |
params.resultToObs | Save the result in Object Storage selected while creating the domain | boolean | optional | false | |
params.noiseFiltering | Whether to enable noise filtering | boolean | optional | true | |
params.boostings | boosting object array | array | optional | ||
params.boostings.words | comma separated words | string | optional | ||
params.useDomainBoostings | use domain boostings | boolean | optional | false | |
params.forbiddens | comma separated words | string | optional | ||
params.diarization | Speaker recognition (diarization) setting | object | optional | ||
params.diarization.enable | Whether to enable speaker recognition (diarization) | boolean | optional | true | |
params.sed | Event detection setting | object | optional | ||
params.sed.enable | Whether to enable event detection | boolean | optional | false |
- Keyword boosting
  - You can include lists of keywords in the API request body to improve the recognition rate.
  - This function uses the params.boostings and params.boostings.words fields in the request body.
  - You can enter up to 1000 keywords to boost.
  - Only Korean, English, Japanese and Chinese letters and numbers are supported for boosting.
  - By default, all English letters in the recognition results are changed to lowercase, but if a request is made to boost uppercase keywords, lowercase letters are replaced with uppercase ones.
  - Boosting ignores spacing. For example, you only need to request boosting for either CLOVASpeech or CLOVA Speech.
  - There is no limit on keyword length, but if a phrase consisting of multiple words is boosted, only the exact phrase benefits from the boosting. For example, if you boost the keyword "CLOVA Speech," all sentences including "CLOVA Speech" can benefit from the boosting. However, if you boost "Media voice recognition technology of CLOVA Speech," sentences that only include "CLOVA Speech" can hardly benefit from the boosting.
- Sensitive keyword detection
  - You can include in the API request body a list of keywords to hide in the recognition result.
  - This function uses the params.forbiddens field in the request body.
  - There is no limit on the number or length of sensitive keywords to be detected.
  - Both spacing and capitalization must match exactly for a keyword to be detected.
Example (cURL shell)
curl --location --request POST '${Invoke URL}/recognizer/upload' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--form 'media=@/video/sample.wav' \
--form 'params={"language":"ko-KR","completion":"sync","callback":"http://localhost:9010","forbiddens":"comma separated words","boostings":[{"words": "comma separated words"}]};type=application/json'
- Response: refer to Common Response
Responses
After a recognition request is made, the result is returned in one of the following two ways:
sync
If the request is made with sync, the response result (JSON) is returned when recognition is completed.
async
If the request is made with async, the recognition result is sent to the callback URL entered in the request, saved to Object Storage (resultToObs), or both.
Callback url | resultToObs(ObjectStorage) | result |
---|---|---|
URL exists (O) | True | Result is returned both to the Callback url and Object Storage |
URL exists (O) | False | Result is returned to the Callback url only |
URL does not exist (X) | True | Result is returned to Object Storage only |
URL does not exist (X) | False | Error is returned |
Callback
Request headers
Header Name | Description |
---|---|
Content-Type | application/application-json; charset=utf-8 |
Method
POST
Body
- Same as Common Response (sync)
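A callback receiver could look roughly like the sketch below, which uses only the Python standard library. It assumes the callback body is the JSON document described under Common Response (sync). The names `summarize_callback`, `CallbackHandler`, and `serve`, the port number, and the idea that a plain 200 response acknowledges the callback are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(raw_body: bytes) -> dict:
    """Pick the token, result code, and full text out of a callback body."""
    payload = json.loads(raw_body.decode("utf-8"))
    return {
        "token": payload.get("token"),
        "result": payload.get("result"),
        "text": payload.get("text", ""),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_callback(self.rfile.read(length)))
        self.send_response(200)  # assumption: a 2xx status acknowledges the callback
        self.end_headers()


def serve(port: int = 9010) -> None:
    """Block and handle callbacks on the given port."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener; the URL of that listener would then be passed as the callback value in the recognition request.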
4. Get job status
GET /recognizer/{token}
- Get the status of async request
Method | Request URI |
---|---|
GET | ${Invoke URL}/recognizer/{token} |
Request headers
Header Name | Description |
---|---|
Content-Type | application/json |
Request bodies
name | desc | type | requirement | value | default |
---|---|---|---|---|---|
token | Token returned in the response to the async request | string | required | ||
Example (cURL shell)
curl --location --request GET '${Invoke URL}/recognizer/ceb77af3dae44a6c8c4de3dce519140a' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}'
- Response
{
"token": "ceb77af3dae44a6c8c4de3dce519140a",
"result": "PROCESSING"
}
result:
- WAITING
- PROCESSING
- FAILED
- COMPLETED
- TIMEOUT
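Polling the status endpoint until the job leaves WAITING or PROCESSING might be sketched as follows. The HTTP call is injected as `fetch_status` (a hypothetical callable that takes the token and returns the parsed JSON response), so the loop can be tried without network access. The interval and attempt limits are arbitrary choices.

```python
import time

PENDING_STATES = {"WAITING", "PROCESSING"}
TERMINAL_STATES = {"COMPLETED", "FAILED", "TIMEOUT"}


def wait_for_job(fetch_status, token, interval_sec=5.0, max_attempts=120, sleep=time.sleep):
    """Poll fetch_status(token) until the job reaches a terminal state."""
    for _ in range(max_attempts):
        status = fetch_status(token)
        result = status.get("result")
        if result in TERMINAL_STATES:
            return status
        if result not in PENDING_STATES:
            # For example an ERROR_* code such as ERROR_TOKEN_INVALID.
            raise RuntimeError("unexpected job status: %r" % result)
        sleep(interval_sec)
    raise TimeoutError("job %s did not finish after %d polls" % (token, max_attempts))
```

In real use, `fetch_status` would issue the GET request shown above with the X-CLOVASPEECH-API-KEY header and return the decoded body.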
Common Response
Response(async)
{ "token": "a951af6a1015466bae2c926177f26310", "result": "SUCCEEDED", "message": "Succeeded" }
Response(sync)
{
  "result": "COMPLETED",
  "message": "Succeeded",
  "token": "d3bea166039e486abbb90e4a84c3b3a5",
  "version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
  "params": {
    "service": "ncp",
    "domain": "general",
    "lang": "enko",
    "completion": "sync",
    "callback": "",
    "diarization": { "enable": true, "speakerCountMin": -1, "speakerCountMax": -1 },
    "sed": { "enable": true },
    "boostings": [ { "words": "Hello, test" } ],
    "forbiddens": "",
    "wordAlignment": true,
    "fullText": true,
    "noiseFiltering": true,
    "resultToObs": false,
    "priority": 0,
    "userdata": {
      "_ncp_DomainCode": "NEST",
      "_ncp_DomainId": 1,
      "_ncp_TaskId": 55442,
      "_ncp_TraceId": "36a75ce98ec342d8a8c8fe9191cec343",
      "id": 1
    }
  },
  "progress": 100,
  "keywords": {},
  "segments": [
    {
      "start": 5870,
      "end": 8160,
      "text": "This is Seoul pool.",
      "confidence": 0.9626975,
      "diarization": { "label": "2" },
      "speaker": { "label": "2", "name": "B", "edited": false },
      "words": [
        [ 5871, 6730, "This is" ],
        [ 6860, 7530, "Seoul pool." ]
      ],
      "textEdited": "This is Seoul pool."
    },
    {
      "start": 8160,
      "end": 12950,
      "text": "How much is the entrance fee? It's 5000 won. Thank you.",
      "confidence": 0.8835926,
      "diarization": { "label": "1" },
      "speaker": { "label": "1", "name": "A", "edited": false },
      "words": [
        [ 8161, 9220, "How much is" ],
        [ 9390, 10020, "the entrance fee?" ],
        [ 10410, 10640, "It's" ],
        [ 10710, 11140, "5000 won." ],
        [ 11910, 12500, "Thank you." ]
      ],
      "textEdited": "How much is the entrance fee? It's 5000 won. Thank you."
    }
  ],
  "text": "This is Seoul pool. How much is the entrance fee? It's 5000 won. Thank you.",
  "confidence": 0.9071357,
  "speakers": [
    { "label": "1", "name": "A", "edited": false },
    { "label": "2", "name": "B", "edited": false }
  ],
  "events": [
    { "type": "music", "label": "music", "labelEdited": "music", "start": 1400, "end": 5000 }
  ],
  "eventTypes": [ "music" ]
}
Body
field | desc | type |
---|---|---|
result | Result code | string |
message | Result message | string |
token | Result token | string |
version | Engine version | string |
params | Parameter | object |
params: service | Service code | string |
params: domain | Domain | string |
params: lang | Recognition language | string |
params: completion | Request method | string |
params: diarization | Speaker separation data | object |
params: diarization.enable | Whether to use speaker separation | boolean |
params: diarization.speakerCountMin | Minimum number of speakers | number |
params: diarization.speakerCountMax | Maximum number of speakers | number |
params: boostings | Boosting data | array |
params: boostings: words | Boosting keyword | string |
params: forbiddens | Sensitive keyword | string |
params: fullText | Whether the entire recognition result is output in text | boolean |
params: noiseFiltering | Whether noise filtering is enabled | boolean |
params: resultToObs | Whether the result is saved in Object Storage | boolean |
params: segment | Segment | string |
params: morpheme | Morpheme | string |
params: completion | Synchronous or asynchronous | string |
params: userdata | User data | object |
segments | Segment data | array |
segments: start | Segment start time (ms) | number |
segments: end | Segment end time (ms) | number |
segments: text | Segment text | string |
segments: textEdited | Edits made | string |
segments: diarization | Recognized speaker | object |
segments: diarization.label | Recognized speaker number | string |
segments: speaker | Replaced speaker | object |
segments: speaker.label | Replaced speaker number | string |
segments: speaker.name | Replaced speaker name | string |
segments: confidence | Segment confidence (0.0-1.0) | number |
segments: words | Word segment | array |
segments: words: [0] | Word segment start time (ms) | number |
segments: words: [1] | Word segment end time (ms) | number |
segments: words: [2] | Word segment text | string |
text | Entire text | string |
confidence | Total confidence | number |
events | Event | array |
events.type | Event type | string |
events.label | Event name | string |
events.labelEdited | Changed event name | string |
events.start | Event start time | number |
events.end | Event end time | number |
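To show how the segments fields fit together, the sketch below turns a parsed response into speaker-labelled transcript lines. The function name `format_transcript` and the output format are illustrative choices. Falling back from speaker.name to diarization.label is an assumption about which field is more useful to display.

```python
def format_transcript(response: dict) -> list:
    """Build one "[start] speaker: text" line per segment."""
    lines = []
    for segment in response.get("segments", []):
        speaker = (segment.get("speaker", {}).get("name")
                   or segment.get("diarization", {}).get("label", "?"))
        start_sec = segment["start"] / 1000.0  # start and end are in milliseconds
        lines.append("[%.2fs] %s: %s" % (start_sec, speaker, segment["text"]))
    return lines


sample = {
    "segments": [
        {"start": 5870, "end": 8160, "text": "This is Seoul pool.",
         "diarization": {"label": "2"}, "speaker": {"label": "2", "name": "B"}},
        {"start": 8160, "end": 12950, "text": "How much is the entrance fee?",
         "diarization": {"label": "1"}},
    ]
}
for line in format_transcript(sample):
    print(line)
```

The sample values are taken from the sync response shown earlier, with the second segment shortened and its speaker object removed to exercise the fallback.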
Examples
Java
dependency
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.12</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpmime</artifactId>
<version>4.3.1</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.5</version>
</dependency>
ClovaSpeechClient
package org.example.clovaspeech.client;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.http.util.EntityUtils;
import com.google.gson.Gson;
public class ClovaSpeechClient {
// Clova Speech secret key
private static final String SECRET = "";
// Clova Speech invoke URL
private static final String INVOKE_URL = "";
private CloseableHttpClient httpClient = HttpClients.createDefault();
private Gson gson = new Gson();
private static final Header[] HEADERS = new Header[] {
new BasicHeader("Accept", "application/json"),
new BasicHeader("X-CLOVASPEECH-API-KEY", SECRET),
};
public static class Boosting {
private String words;
public String getWords() {
return words;
}
public void setWords(String words) {
this.words = words;
}
}
public static class Diarization {
private Boolean enable = Boolean.FALSE;
private Integer speakerCountMin;
private Integer speakerCountMax;
public Boolean getEnable() {
return enable;
}
public void setEnable(Boolean enable) {
this.enable = enable;
}
public Integer getSpeakerCountMin() {
return speakerCountMin;
}
public void setSpeakerCountMin(Integer speakerCountMin) {
this.speakerCountMin = speakerCountMin;
}
public Integer getSpeakerCountMax() {
return speakerCountMax;
}
public void setSpeakerCountMax(Integer speakerCountMax) {
this.speakerCountMax = speakerCountMax;
}
}
public static class Sed {
private Boolean enable = Boolean.FALSE;
public Boolean getEnable() {
return enable;
}
public void setEnable(Boolean enable) {
this.enable = enable;
}
}
public static class NestRequestEntity {
private String language = "ko-KR";
//completion optional, sync/async
private String completion = "sync";
//optional, used to receive the analyzed results
private String callback;
//optional, any data
private Map<String, Object> userdata;
private Boolean wordAlignment = Boolean.TRUE;
private Boolean fullText = Boolean.TRUE;
//boosting object array
private List<Boosting> boostings;
//comma separated words
private String forbiddens;
private Diarization diarization;
private Sed sed;
public Sed getSed() {
return sed;
}
public void setSed(Sed sed) {
this.sed = sed;
}
public String getLanguage() {
return language;
}
public void setLanguage(String language) {
this.language = language;
}
public String getCompletion() {
return completion;
}
public void setCompletion(String completion) {
this.completion = completion;
}
public String getCallback() {
return callback;
}
public Boolean getWordAlignment() {
return wordAlignment;
}
public void setWordAlignment(Boolean wordAlignment) {
this.wordAlignment = wordAlignment;
}
public Boolean getFullText() {
return fullText;
}
public void setFullText(Boolean fullText) {
this.fullText = fullText;
}
public void setCallback(String callback) {
this.callback = callback;
}
public Map<String, Object> getUserdata() {
return userdata;
}
public void setUserdata(Map<String, Object> userdata) {
this.userdata = userdata;
}
public String getForbiddens() {
return forbiddens;
}
public void setForbiddens(String forbiddens) {
this.forbiddens = forbiddens;
}
public List<Boosting> getBoostings() {
return boostings;
}
public void setBoostings(List<Boosting> boostings) {
this.boostings = boostings;
}
public Diarization getDiarization() {
return diarization;
}
public void setDiarization(Diarization diarization) {
this.diarization = diarization;
}
}
/**
* recognize media using URL
* @param url required, the media URL
* @param nestRequestEntity optional
* @return string
*/
public String url(String url, NestRequestEntity nestRequestEntity) {
HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/url");
httpPost.setHeaders(HEADERS);
Map<String, Object> body = new HashMap<>();
body.put("url", url);
body.put("language", nestRequestEntity.getLanguage());
body.put("completion", nestRequestEntity.getCompletion());
body.put("callback", nestRequestEntity.getCallback());
body.put("userdata", nestRequestEntity.getUserdata());
body.put("wordAlignment", nestRequestEntity.getWordAlignment());
body.put("fullText", nestRequestEntity.getFullText());
body.put("forbiddens", nestRequestEntity.getForbiddens());
body.put("boostings", nestRequestEntity.getBoostings());
body.put("diarization", nestRequestEntity.getDiarization());
body.put("sed", nestRequestEntity.getSed());
HttpEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
httpPost.setEntity(httpEntity);
return execute(httpPost);
}
/**
* recognize media using Object Storage
* @param dataKey required, the Object Storage key
* @param nestRequestEntity optional
* @return string
*/
public String objectStorage(String dataKey, NestRequestEntity nestRequestEntity) {
HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/object-storage");
httpPost.setHeaders(HEADERS);
Map<String, Object> body = new HashMap<>();
body.put("dataKey", dataKey);
body.put("language", nestRequestEntity.getLanguage());
body.put("completion", nestRequestEntity.getCompletion());
body.put("callback", nestRequestEntity.getCallback());
body.put("userdata", nestRequestEntity.getUserdata());
body.put("wordAlignment", nestRequestEntity.getWordAlignment());
body.put("fullText", nestRequestEntity.getFullText());
body.put("forbiddens", nestRequestEntity.getForbiddens());
body.put("boostings", nestRequestEntity.getBoostings());
body.put("diarization", nestRequestEntity.getDiarization());
body.put("sed", nestRequestEntity.getSed());
StringEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
httpPost.setEntity(httpEntity);
return execute(httpPost);
}
/**
*
* recognize media using a file
* @param file required, the media file
* @param nestRequestEntity optional
* @return string
*/
public String upload(File file, NestRequestEntity nestRequestEntity) {
HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/upload");
httpPost.setHeaders(HEADERS);
HttpEntity httpEntity = MultipartEntityBuilder.create()
.addTextBody("params", gson.toJson(nestRequestEntity), ContentType.APPLICATION_JSON)
.addBinaryBody("media", file, ContentType.MULTIPART_FORM_DATA, file.getName())
.build();
httpPost.setEntity(httpEntity);
return execute(httpPost);
}
private String execute(HttpPost httpPost) {
try (final CloseableHttpResponse httpResponse = httpClient.execute(httpPost)) {
final HttpEntity entity = httpResponse.getEntity();
return EntityUtils.toString(entity, StandardCharsets.UTF_8);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public static void main(String[] args) {
final ClovaSpeechClient clovaSpeechClient = new ClovaSpeechClient();
NestRequestEntity requestEntity = new NestRequestEntity();
final String result =
clovaSpeechClient.upload(new File("/data/sample.mp4"), requestEntity);
//final String result = clovaSpeechClient.url("file URL", requestEntity);
//final String result = clovaSpeechClient.objectStorage("Object Storage key", requestEntity);
System.out.println(result);
}
}
Python
import requests
import json
class ClovaSpeechClient:
    # Clova Speech invoke URL
    invoke_url = ''
    # Clova Speech secret key
    secret = ''
    def req_url(self, url, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                wordAlignment=True, fullText=True, diarization=None, sed=None):
        request_body = {
            'url': url,
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
            'sed': sed,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'Content-Type': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        return requests.post(headers=headers,
                             url=self.invoke_url + '/recognizer/url',
                             data=json.dumps(request_body).encode('UTF-8'))
    def req_object_storage(self, data_key, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                           wordAlignment=True, fullText=True, diarization=None, sed=None):
        request_body = {
            'dataKey': data_key,
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
            'sed': sed,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'Content-Type': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        return requests.post(headers=headers,
                             url=self.invoke_url + '/recognizer/object-storage',
                             data=json.dumps(request_body).encode('UTF-8'))
    def req_upload(self, file, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                   wordAlignment=True, fullText=True, diarization=None, sed=None):
        request_body = {
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
            'sed': sed,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        print(json.dumps(request_body, ensure_ascii=False).encode('UTF-8'))
        # Open the media file in a with-block so it is closed after the upload.
        with open(file, 'rb') as media:
            files = {
                'media': media,
                'params': (None, json.dumps(request_body, ensure_ascii=False).encode('UTF-8'), 'application/json')
            }
            response = requests.post(headers=headers, url=self.invoke_url + '/recognizer/upload', files=files)
        return response
if __name__ == '__main__':
    # res = ClovaSpeechClient().req_url(url='http://example.com/media.mp3', completion='sync')
    # res = ClovaSpeechClient().req_object_storage(data_key='data/media.mp3', completion='sync')
    res = ClovaSpeechClient().req_upload(file='/data/media.mp3', completion='sync')
    print(res.text)
PHP
<?php
$secret = '';
$invoke_url = '';
function req_url($url, $completion, $callback, $userdata, $forbiddens, $boostings,
$wordAlignment, $fullText, $diarization, $sed)
{
$object = (object)[
'language' => 'ko-KR',
'completion' => $completion,
'callback' => $callback,
'url' => $url,
'userdata' => $userdata,
'forbiddens' => $forbiddens,
'boostings' => $boostings,
'wordAlignment' => $wordAlignment,
'fullText' => $fullText,
'diarization' => $diarization,
'sed' => $sed,
];
return execute('/recognizer/url', json_encode($object), array('Content-Type: application/json'));
}
function req_object_storage($dataKey, $completion, $callback, $userdata, $forbiddens, $boostings,
$wordAlignment, $fullText, $diarization, $sed)
{
$object = (object)[
'language' => 'ko-KR',
'completion' => $completion,
'callback' => $callback,
'dataKey' => $dataKey,
'userdata' => $userdata,
'forbiddens' => $forbiddens,
'boostings' => $boostings,
'wordAlignment' => $wordAlignment,
'fullText' => $fullText,
'diarization' => $diarization,
'sed' => $sed,
];
return execute('/recognizer/object-storage', json_encode($object), array('Content-Type: application/json'));
}
function req_upload($filePath, $completion, $callback, $userdata, $forbiddens, $boostings,
$wordAlignment, $fullText, $diarization, $sed)
{
$object = (object)[
'language' => 'ko-KR',
'completion' => $completion,
'callback' => $callback,
'userdata' => $userdata,
'forbiddens' => $forbiddens,
'boostings' => $boostings,
'wordAlignment' => $wordAlignment,
'fullText' => $fullText,
'diarization' => $diarization,
'sed' => $sed,
];
$fields = array(
'media' => new CURLFile($filePath),
'params' => json_encode($object),
);
return execute('/recognizer/upload', $fields, null);
}
function execute($uri, $postFields, $customHeaders)
{
try {
$ch = curl_init($GLOBALS['invoke_url'] . $uri);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 600);
$headers = array();
$headers[] = 'X-CLOVASPEECH-API-KEY: ' . $GLOBALS['secret'];
if (!is_null($customHeaders)) {
$headers = array_merge($headers, $customHeaders);
}
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
$err = curl_error($ch);
curl_close($ch);
if ($err) {
echo 'cURL Error #:' . $err;
return $err;
}
return $response;
} catch (Exception $E) {
echo 'Error: ' . $E->getMessage() . "\n";
return null;
}
}
// Each function takes 10 parameters, so the optional ones are passed as null.
//$response = req_url('https://example.com/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
//$response = req_object_storage('data/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
$response = req_upload('/data/sample.mp4', 'sync', null, null, null, null, null, null, null, null);
echo $response;
?>
C#
using System;
using System.IO;
using System.Globalization;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.RegularExpressions;
using System.Threading.Channels;
using System.Threading.Tasks;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text;
using System.Diagnostics;
namespace HttpClientStatus
{
public class ClovaSpeechRequest
{
public string language { get; set; }
public string completion { get; set; }
// Other fields are omitted, please refer to: https://api.ncloud-docs.com/release-20230525/docs/en/ai-application-service-clovaspeech-clovaspeech for available fields
}
public class Program
{
private static readonly string secretKey = "";
private static readonly string invokeUrl = "";
public static async Task<string> Upload(ClovaSpeechRequest clovaSpeechRequest, string path)
{
using (var client = new HttpClient())
{
var multiForm = new MultipartFormDataContent();
multiForm.Headers.Add("X-CLOVASPEECH-API-KEY", secretKey);
multiForm.Add(new StringContent(JsonSerializer.Serialize(clovaSpeechRequest)), "params");
FileStream fs = File.OpenRead(path);
Console.WriteLine(Path.GetFileName(path));
multiForm.Add(new StreamContent(fs), "media", Path.GetFileName(path));
var message = await client.PostAsync(invokeUrl+ "/recognizer/upload", multiForm);
return await message.Content.ReadAsStringAsync();
}
}
static async Task Main(string[] args)
{
var clovaSpeechRequest = new ClovaSpeechRequest
{
language = "ko-KR",
completion = "sync"
};
var result = await Upload(clovaSpeechRequest, @"D:\media\video\sample.mp3");
Console.WriteLine(result);
}
}
}
Error codes
Error Response Body:
{
"result": "FAILED",
"message": "File format is not supported.",
"token": ""
}
Result | Message |
---|---|
SUCCEEDED | Succeeded |
PROCESSING | Processing |
ERROR_SERVER_BUSY | Server too busy |
ERROR_TOKEN_INVALID | Token does not exist |
ERROR_AUDIO_EMPTY | Audio is empty |
ERROR_AUDIO_CONVERSION | Audio conversion has been failed |
ERROR_PARAMS_FORMAT_INVALID | Params must be JSON format |
ERROR_REQUEST_PARAMETER | Invalid request parameters |
ERROR_REQUEST_PARAMETER | Speaker detect is off |
ERROR_INVALID_SECRET | Invalid secret |
ERROR_DATA_NOT_FOUND | Not found |
ERROR_DATA_CONFLICT | Data conflict |
ERROR_INTERNAL_ERROR | Internal Server Error |
ERROR_EXTERNAL_ERROR | Service Unavailable |
ERROR_TOO_MANY_JOBS | Too many jobs |
ERROR_GATEWAY_TIMEOUT | Gateway timeout |
FAILED | Other errors |
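A client could branch on the result codes above roughly as sketched below. The grouping is an assumption, not something the API defines: treating ERROR_SERVER_BUSY, ERROR_TOO_MANY_JOBS, and ERROR_GATEWAY_TIMEOUT as worth retrying is a guess based on their messages, and the name `classify_result` is hypothetical.

```python
# Codes that indicate the request was accepted or finished normally.
OK_RESULTS = {"SUCCEEDED", "PROCESSING", "WAITING", "COMPLETED"}
# Assumption: these look transient, so a later retry may succeed.
RETRYABLE_RESULTS = {"ERROR_SERVER_BUSY", "ERROR_TOO_MANY_JOBS", "ERROR_GATEWAY_TIMEOUT"}


def classify_result(response: dict) -> str:
    """Map a response body to "ok", "retry", or "error" based on its result code."""
    result = response.get("result", "")
    if result in OK_RESULTS:
        return "ok"
    if result in RETRYABLE_RESULTS:
        return "retry"
    return "error"
```

For the "error" case, the message field of the response body carries the human-readable reason, as in the error response example above.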