External file recognition
Available in Classic and VPC
Call the URL of an externally accessible audio/video file to recognize and convert it to text.
Request
The request format for the endpoint is as follows.
Method | URI |
---|---|
POST | /recognizer/url |
Request headers
For headers common to all CLOVA Speech APIs, see Common CLOVA Speech headers.
Request body
The following describes the request body.
Field | Type | Required | Description |
---|---|---|---|
url | String | Required | Audio/video file URL |
language | String | Required | Text recognition language |
completion | String | Optional | Response method after recognition request |
callback | String | Conditional | Callback URL |
userdata | Object | Optional | User data details |
wordAlignment | Boolean | Optional | Whether to output speech and text alignment of recognition results |
fullText | Boolean | Optional | Whether to output full recognition result text |
resultToObs | Boolean | Conditional | Whether to save results in Object Storage |
noiseFiltering | Boolean | Optional | Noise filtering |
boostings | Array | Optional | Keyword boosting details |
useDomainBoostings | Boolean | Optional | Whether to use domain boosting |
forbiddens | String | Optional | Sensitive keywords |
diarization | Object | Optional | Detailed settings for speaker recognition |
diarization.enable | Boolean | Optional | Whether to recognize speaker |
sed | Object | Optional | Event detection result details |
sed.enable | Boolean | Optional | Whether to detect events |
format | String | Optional | Response result return format |
boostings
The following describes boostings.
Field | Type | Required | Description |
---|---|---|---|
words | String | Optional | List of words to keyword boost |
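The sync response sample on this page shows a boosting entry whose `words` value is a single comma-separated string (`"Hello, test"`). Assuming the request body accepts the same shape, a minimal sketch of building the `boostings` array could look like this. The helper name `build_boostings` is hypothetical and not part of the API.

```python
def build_boostings(words):
    """Build a boostings array with one entry of comma-joined words.

    Assumption: the request accepts the shape seen in the response sample,
    i.e. [{"words": "Hello, test"}].
    """
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [w.strip() for w in words if w and w.strip()]
    if not cleaned:
        return []
    return [{"words": ", ".join(cleaned)}]


boostings = build_boostings(["Hello", " test "])
```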
When completion (the response method) is set to async, the recognition result is delivered as follows, depending on whether a callback URL is specified and whether resultToObs (Object Storage) is enabled.
Callback URL | resultToObs(ObjectStorage) | Result |
---|---|---|
URL address exists | True | Return results to both callback URL and Object Storage |
URL address exists | False | Return results only to the callback URL |
URL address doesn't exist | True | Return results only to Object Storage |
URL address doesn't exist | False | Return an error |
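The table above can be expressed as a small client-side check that runs before an async request is sent. This is only a sketch of the documented rules; the function name `async_result_destinations` and the destination labels are hypothetical.

```python
def async_result_destinations(callback, result_to_obs):
    """Return where an async result would be delivered, per the table above.

    Raises ValueError for the combination the table lists as an error
    (no callback URL and resultToObs set to false).
    """
    has_callback = bool(callback and callback.strip())
    if has_callback and result_to_obs:
        return ["callback", "object_storage"]
    if has_callback:
        return ["callback"]
    if result_to_obs:
        return ["object_storage"]
    raise ValueError("async requests need a callback URL or resultToObs=true")
```

Running this check locally lets a client reject the invalid combination without a round trip to the service.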
Request example
The following is a sample request.
curl --location --request POST 'https://clovaspeech-gw.ncloud.com/external/v1/8881/5f7e1b4c866f1c605946c9236f9aa8************/recognizer/url' \
--header 'Content-Type: application/json' \
--header 'X-CLOVASPEECH-API-KEY: {Secret key issued when registering the app}' \
--data '{
"language": "ko-KR",
"completion":"async",
"url": "{url}",
"resultToObs" : true
}'
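The same request can be prepared from Python with the standard library. The sketch below only builds the request object and does not send it. The function name `build_recognize_request` and its parameters are hypothetical; `invoke_url` stands for the part of the URL before `/recognizer/url`, and `secret_key` for the key issued when registering the app.

```python
import json
import urllib.request


def build_recognize_request(invoke_url, secret_key, media_url, language="ko-KR"):
    """Build (but do not send) a POST request mirroring the curl sample."""
    body = {
        "language": language,
        "completion": "async",
        "url": media_url,
        "resultToObs": True,
    }
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + "/recognizer/url",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-CLOVASPEECH-API-KEY": secret_key,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; error handling and timeouts are left out of this sketch.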
Response
The following describes the response format.
Response body
The following describes the response body.
Field | Type | Required | Description |
---|---|---|---|
result | String | - | Response code |
message | String | - | Response message |
token | String | - | Result token |
version | String | - | Engine version |
params | Object | - | Parameter details |
params.service | String | - | Service code |
params.domain | String | - | Domain type |
params.lang | String | - | Recognition language |
params.completion | String | - | Request format |
params.callback | String | - | Callback URL |
params.diarization | Object | - | Speaker recognition (separation) details |
params.diarization.enable | Boolean | - | Whether to recognize (separate) speaker |
params.diarization.speakerCountMin | Integer | - | Minimum number of speakers |
params.diarization.speakerCountMax | Integer | - | Maximum number of speakers |
params.sed | Object | - | Event detection result |
params.sed.enable | Boolean | - | Whether to detect events |
params.boostings | Array | - | Keyword boosting details |
params.forbiddens | String | - | Sensitive keywords |
params.wordAlignment | Boolean | Optional | Whether to output speech and text alignment of recognition results |
params.fullText | Boolean | - | Whether to output full recognition result text |
params.noiseFiltering | Boolean | - | Noise filtering |
params.resultToObs | Boolean | - | Whether to save results in Object Storage |
params.priority | Integer | - | Priority |
params.userdata | Object | - | User data details |
params.userdata._ncp_DomainCode | String | - | Domain code |
params.userdata._ncp_DomainId | Integer | - | Domain ID |
params.userdata._ncp_TaskId | Integer | - | Task ID |
params.userdata._ncp_TraceId | String | - | Trace ID |
progress | Integer | - | Recognition progress |
segments | Array | - | segments details |
text | String | - | Overall text |
confidence | Double | - | Overall accuracy |
speakers | Array | - | All speaker details |
events | Array | - | Event details |
eventTypes | Array | - | Details of all recognized events |
params.boostings
The following describes params.boostings.
Field | Type | Required | Description |
---|---|---|---|
words | String | - | List of words to keyword boost |
segments
The following describes segments.
Field | Type | Required | Description |
---|---|---|---|
start | Long | - | Analysis start time (ms) |
end | Long | - | Analysis end time (ms) |
text | String | - | Analyzed text |
confidence | Double | - | Analysis accuracy |
diarization | Object | - | Recognized speaker details |
diarization.label | String | - | Recognized speaker's number |
speaker | Object | - | Changed speaker's details |
speaker.label | String | - | Changed speaker's number |
speaker.name | String | - | Changed speaker's name |
speaker.edited | Boolean | - | Whether speaker is changed |
words | Array<Long, Long, String> | - | List of recognized words |
words.[0] | Long | - | Segment start time (ms) |
words.[1] | Long | - | Segment end time (ms) |
words.[2] | String | - | Segment text |
textEdited | String | - | Modification details |
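Each entry in `words` is a positional triple of start time, end time, and text, which can be awkward to work with directly. A minimal sketch of unpacking the triples into named fields follows; the function name `words_to_dicts` and the keys `start_ms`, `end_ms`, and `text` are hypothetical choices made for this example.

```python
def words_to_dicts(segment):
    """Convert a segment's [start, end, text] triples into dictionaries."""
    return [
        {"start_ms": start, "end_ms": end, "text": text}
        for start, end, text in segment.get("words", [])
    ]


# Values taken from the first segment of the sync JSON sample on this page.
segment = {
    "words": [
        [5871, 6730, "This is the Seoul"],
        [6860, 7530, "swimming pool."],
    ]
}
words = words_to_dicts(segment)
```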
speakers
The following describes speakers.
Field | Type | Required | Description |
---|---|---|---|
label | String | - | Numbers of all speakers |
name | String | - | Names of all speakers |
edited | Boolean | - | Whether speaker is changed |
events
The following describes events.
Field | Type | Required | Description |
---|---|---|---|
type | String | - | Event type |
label | String | - | Event name |
labelEdited | String | - | Event change name |
start | Long | - | Event start time |
end | Long | - | Event end time |
eventTypes
The following describes eventTypes.
Field | Type | Required | Description |
---|---|---|---|
label | String | - | Recognized event |
Response status codes
For response status codes common to all CLOVA Speech APIs, see Common CLOVA Speech response status codes.
Response example
The following are sample responses.
Request with async and return in JSON
The following is a sample response requested with async and returned in JSON format.
{
"token": "*****f6a1015466bae2c926177f26310",
"result": "SUCCEEDED",
"message": "Succeeded"
}
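A client will usually want to confirm that the request was accepted and keep the token. The sketch below treats the two `result` values that appear in the samples on this page (`SUCCEEDED` for async, `COMPLETED` for sync) as success. That is an assumption: the full list of codes is in the common response status codes, not here. The function name `extract_token` is hypothetical.

```python
import json

# Assumption: only the result values shown in this page's samples count as success.
SUCCESS_RESULTS = {"SUCCEEDED", "COMPLETED"}


def extract_token(response_text):
    """Parse a JSON response body and return its token, or raise on failure."""
    response = json.loads(response_text)
    if response.get("result") not in SUCCESS_RESULTS:
        raise RuntimeError(
            f"recognition request failed: {response.get('result')} "
            f"({response.get('message')})"
        )
    return response["token"]
```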
Request with sync and return in JSON
The following is a sample response requested with sync and returned in JSON format.
{
"result": "COMPLETED",
"message": "Succeeded",
"token": "*****166039e486abbb90e4a84c3b3a5",
"version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
"params": {
"service": "ncp",
"domain": "general",
"lang": "enko",
"completion": "sync",
"callback": "",
"diarization": {
"enable": true,
"speakerCountMin": -1,
"speakerCountMax": -1
},
"sed": {
"enable": true
},
"boostings": [
{
"words": "Hello, test"
}
],
"forbiddens": "",
"wordAlignment": true,
"fullText": true,
"noiseFiltering": true,
"resultToObs": false,
"priority": 0,
"userdata": {
"_ncp_DomainCode": "NEST",
"_ncp_DomainId": 1,
"_ncp_TaskId": **442,
"_ncp_TraceId": "*****ce98ec342d8a8c8fe9191cec343",
"id": 1
}
},
"progress": 100,
"keywords": {},
"segments": [
{
"start": 5870,
"end": 8160,
"text": "This is the Seoul swimming pool.",
"confidence": 0.9626975,
"diarization": {
"label": "2"
},
"speaker": {
"label": "2",
"name": "B",
"edited": false
},
"words": [
[
5871,
6730,
"This is the Seoul"
],
[
6860,
7530,
"swimming pool."
]
],
"textEdited": "This is the Seoul swimming pool."
},
{
"start": 8160,
"end": 12950,
"text": "How much is the entry fee? It's 5000 KRW. Thank you.",
"confidence": 0.8835926,
"diarization": {
"label": "1"
},
"speaker": {
"label": "1",
"name": "A",
"edited": false
},
"words": [
[
8161,
9220,
"How much is"
],
[
9390,
10020,
"the entry fee?"
],
[
10410,
10640,
"It's 5000"
],
[
10710,
11140,
"KRW."
],
[
11910,
12500,
"Thank you."
]
],
"textEdited": "How much is the entry fee? It's 5000 KRW. Thank you."
}
],
"text": "This is the Seoul swimming pool. How much is the entry fee? It's 5000 KRW. Thank you.",
"confidence": 0.9071357,
"speakers": [
{
"label": "1",
"name": "A",
"edited": false
},
{
"label": "2",
"name": "B",
"edited": false
}
],
"events": [
{
"type": "music",
"label": "music",
"labelEdited": "music",
"start": 1400,
"end": 5000
}
],
"eventTypes": [
"music"
]
}
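A sync JSON result like the one above can be reduced to a speaker-labelled transcript. The sketch below is one way to do it; the function name `speaker_transcript` is hypothetical, and preferring `textEdited` over `text` when both are present is an assumption made for this example.

```python
def speaker_transcript(result):
    """Return one 'name: text' line per segment of a parsed sync result."""
    lines = []
    for segment in result.get("segments", []):
        speaker = segment.get("speaker", {})
        # Fall back to the numeric label when no speaker name is set.
        name = speaker.get("name") or speaker.get("label") or "?"
        # Assumption: an edited text, when present, is the one to display.
        text = segment.get("textEdited") or segment.get("text", "")
        lines.append(f"{name}: {text}")
    return lines
```

With the sample above, the first line would read `B: This is the Seoul swimming pool.`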
Request with sync and return in SRT
The following is a sample response requested with sync and returned in SRT format.
1
00:00:00,000 --> 00:00:01,425
A: Not long ago,
2
00:00:02,533 --> 00:00:11,550
A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
3
00:00:11,550 --> 00:00:19,025
A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
4
00:00:19,025 --> 00:00:26,317
C: You thought of saccharin, a bit. You had it super sweet.
5
00:00:26,317 --> 00:00:28,240
A: Is it corn?
6
00:00:28,240 --> 00:00:35,318
B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
7
00:00:35,318 --> 00:00:42,800
A: No, Chodang corn meant super sweet. No one has understood right now.
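The SRT timestamps use the `HH:MM:SS,mmm` layout, while the JSON format reports segment times in milliseconds. How the service builds its SRT output is not described here; the sketch below only shows how a client could derive the same layout from a JSON segment. The function names `ms_to_srt` and `segment_to_srt_cue` are hypothetical.

```python
def ms_to_srt(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segment_to_srt_cue(index, segment):
    """Render one JSON segment as an SRT cue, prefixed by the speaker name."""
    name = segment.get("speaker", {}).get("name", "")
    prefix = f"{name}: " if name else ""
    timing = f"{ms_to_srt(segment['start'])} --> {ms_to_srt(segment['end'])}"
    return f"{index}\n{timing}\n{prefix}{segment['text']}"
```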
Request with sync and return in SMI
The following is a sample response requested with sync and returned in SMI format.
<SAMI>
<Body>
<SYNC Start=0>
<P>A: Not long ago,
<SYNC Start=2533>
<P>A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
<SYNC Start=11550>
<P>A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
<SYNC Start=19025>
<P>C: You thought of saccharin, a bit. You had it super sweet.
<SYNC Start=26317>
<P>A: Is it corn?
<SYNC Start=28240>
<P>B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
<SYNC Start=35318>
<P>A: No, Chodang corn meant super sweet. No one has understood right now.
</Body>
</SAMI>