Object Storage file recognition
Available in Classic and VPC
Recognize speech in an audio or video file uploaded to Object Storage on NAVER Cloud Platform and convert it to text by passing the file's Object Storage path in the request.
Request
The request format for the endpoint is as follows.
Method | URI |
---|---|
POST | /recognizer/object-storage |
Request headers
For headers common to all CLOVA Speech APIs, see Common CLOVA Speech headers.
Request body
The following describes the request body.
Field | Type | Required | Description |
---|---|---|---|
dataKey | String | Required | Path in Object Storage where the audio/video file is stored |
language | String | Required | Text recognition language |
completion | String | Optional | Response method after recognition request |
callback | String | Conditional | Callback URL |
userdata | Object | Optional | User data details |
wordAlignment | Boolean | Optional | Whether to output speech and text alignment of recognition results |
fullText | Boolean | Optional | Whether to output full recognition result text |
resultToObs | Boolean | Conditional | Whether to save results in Object Storage |
noiseFiltering | Boolean | Optional | Noise filtering |
boostings | Array | Optional | Keyword boosting details |
useDomainBoostings | Boolean | Optional | Whether to use domain boosting |
forbiddens | String | Optional | Sensitive keywords |
diarization | Object | Optional | Detailed settings for speaker recognition |
diarization.enable | Boolean | Optional | Whether to recognize speaker |
sed | Object | Optional | Event detection result details |
sed.enable | Boolean | Optional | Whether to detect events |
format | String | Optional | Response result return format |
boostings
The following describes boostings.
Field | Type | Required | Description |
---|---|---|---|
words | String | Optional | List of words to keyword boost |
When completion (the request-and-response method) is set to async, the recognition result is returned as follows, depending on whether a callback URL is provided and whether resultToObs (save to Object Storage) is enabled.
Callback URL | resultToObs(ObjectStorage) | Result |
---|---|---|
URL address exists | True | Return results to both callback URL and Object Storage |
URL address exists | False | Return results only to the callback URL |
URL address doesn't exist | True | Return results only to Object Storage |
URL address doesn't exist | False | Return an error |
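The delivery rules in the table above can be checked on the client side before an async request is sent. The following is a minimal sketch of that check; the function name `async_result_destinations` and the destination labels are hypothetical and are not part of the CLOVA Speech API.

```python
from typing import List


def async_result_destinations(callback: str, result_to_obs: bool) -> List[str]:
    """Return where an async result would be delivered, per the table above.

    Hypothetical client-side helper: it only mirrors the documented rules and
    does not call the API.
    """
    destinations = []
    if callback:  # a non-empty callback URL was supplied
        destinations.append("callback")
    if result_to_obs:  # resultToObs is true
        destinations.append("object_storage")
    if not destinations:
        # Neither destination is set: the table says the API returns an error.
        raise ValueError("async requests need a callback URL or resultToObs=true")
    return destinations
```

Raising early in this way avoids sending an async request that the table says would be rejected.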
Request example
The following is a sample request.
curl --location --request POST 'https://clovaspeech-gw.ncloud.com/external/v1/88**/5f7e1b4c866f1c60594****************/recognizer/object-storage' \
--header 'Content-Type: application/json' \
--header 'X-CLOVASPEECH-API-KEY: {Secret key issued when registering the app}' \
--data '{
"dataKey": "{file}.mp3",
"language": "ko-KR",
"completion":"sync",
"callback": "",
"fullText": true,
"boostings": [
{
"words": "comma separated words"
}
],
"forbiddens": "comma separated words"
}'
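The same request can be prepared in Python with the standard library. This is a sketch under assumptions: the helper name `build_recognition_request` and its parameters are hypothetical, and `invoke_url` stands for the masked base URL shown in the curl sample, which must be supplied by the caller. The function only builds the request object and does not send it.

```python
import json
import urllib.request

# Path of this endpoint, taken from the Method/URI table.
ENDPOINT_PATH = "/recognizer/object-storage"


def build_recognition_request(invoke_url, secret_key, data_key,
                              language="ko-KR", completion="sync", **options):
    """Build (but do not send) a POST request for Object Storage recognition.

    invoke_url: base URL of the app, as in the curl sample (caller-supplied).
    secret_key: value for the X-CLOVASPEECH-API-KEY header.
    options:    any optional body fields, e.g. fullText=True.
    """
    body = {"dataKey": data_key, "language": language, "completion": completion}
    body.update(options)
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-CLOVASPEECH-API-KEY": secret_key,
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. With `completion` set to `sync`, the response body would then contain the full recognition result.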
Response
The following describes the response format.
Response body
The following describes the response body.
Field | Type | Required | Description |
---|---|---|---|
result | String | - | Response code |
message | String | - | Response message |
token | String | - | Result token |
version | String | - | Engine version |
params | Object | - | Parameter details |
params.service | String | - | Service code |
params.domain | String | - | Domain type |
params.lang | String | - | Recognition language |
params.completion | String | - | Request format |
params.callback | String | - | Callback URL |
params.diarization | Object | - | Speaker recognition details |
params.diarization.enable | Boolean | - | Whether to recognize speaker |
params.diarization.speakerCountMin | Integer | - | Minimum number of speakers |
params.diarization.speakerCountMax | Integer | - | Maximum number of speakers |
params.sed | Object | - | Event detection result |
params.sed.enable | Boolean | - | Whether to detect events |
params.boostings | Array | - | Keyword boosting details |
params.forbiddens | String | - | Sensitive keywords |
params.wordAlignment | Boolean | - | Whether to output speech and text alignment of recognition results |
params.fullText | Boolean | - | Whether to output full recognition result text |
params.noiseFiltering | Boolean | - | Noise filtering |
params.resultToObs | Boolean | - | Whether to save results in Object Storage |
params.priority | Integer | - | Priority |
params.userdata | Object | - | User data details |
params.userdata._ncp_DomainCode | String | - | Domain code |
params.userdata._ncp_DomainId | Integer | - | Domain ID |
params.userdata._ncp_TaskId | Integer | - | Task ID |
params.userdata._ncp_TraceId | String | - | Trace ID |
progress | Integer | - | Recognition progress |
segments | Array | - | segments details |
text | String | - | Overall text |
confidence | Double | - | Overall accuracy |
speakers | Array | - | All speaker details |
events | Array | - | Event details |
eventTypes | Array | - | Details of all recognized events |
params.boostings
The following describes params.boostings.
Field | Type | Required | Description |
---|---|---|---|
words | String | - | List of words to keyword boost |
segments
The following describes segments.
Field | Type | Required | Description |
---|---|---|---|
start | Long | - | Analysis start time (ms) |
end | Long | - | Analysis end time (ms) |
text | String | - | Analyzed text |
confidence | Double | - | Analysis accuracy |
diarization | Object | - | Recognized speaker details |
diarization.label | String | - | Recognized speaker's number |
speaker | Object | - | Changed speaker's details |
speaker.label | String | - | Changed speaker's number |
speaker.name | String | - | Changed speaker's name |
speaker.edited | Boolean | - | Whether speaker is changed |
words | Array<Long, Long, String> | - | List of recognized words |
words.[0] | Long | - | Segment start time (ms) |
words.[1] | Long | - | Segment end time (ms) |
words.[2] | String | - | Segment text |
textEdited | String | - | Edited content |
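Each entry of `words` is a three-element array of start time, end time, and text, so it can be unpacked positionally. The following is a minimal sketch of that unpacking; the `Word` type and `parse_words` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """One entry of segments[].words: [start (ms), end (ms), text]."""
    start_ms: int
    end_ms: int
    text: str


def parse_words(segment: dict) -> List[Word]:
    """Convert the raw [start, end, text] arrays of a segment into Word tuples."""
    return [Word(start, end, text) for start, end, text in segment.get("words", [])]
```

Named fields make later arithmetic, such as `word.end_ms - word.start_ms` for a duration, easier to read than index access.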
speakers
The following describes speakers.
Field | Type | Required | Description |
---|---|---|---|
label | String | - | Numbers of all speakers |
name | String | - | Names of all speakers |
edited | Boolean | - | Whether speaker is changed |
events
The following describes events.
Field | Type | Required | Description |
---|---|---|---|
type | String | - | Event type |
label | String | - | Event name |
labelEdited | String | - | Event change name |
start | Long | - | Event start time |
end | Long | - | Event end time |
eventTypes
The following describes eventTypes.
Field | Type | Required | Description |
---|---|---|---|
label | String | - | Recognized event |
Response status codes
For response status codes common to all CLOVA Speech APIs, see Common CLOVA Speech response status codes.
Response example
The following are sample responses.
Request with async and return in JSON
The following is a sample response requested with async and returned in JSON format.
{
"token": "{token}",
"result": "SUCCEEDED",
"message": "Succeeded"
}
Request with sync and return in JSON
The following is a sample response requested with sync and returned in JSON format.
{
"result": "COMPLETED",
"message": "Succeeded",
"token": "{token}",
"version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
"params": {
"service": "ncp",
"domain": "general",
"lang": "enko",
"completion": "sync",
"callback": "",
"diarization": {
"enable": true,
"speakerCountMin": -1,
"speakerCountMax": -1
},
"sed": {
"enable": true
},
"boostings": [
{
"words": "Hello, test"
}
],
"forbiddens": "",
"wordAlignment": true,
"fullText": true,
"noiseFiltering": true,
"resultToObs": false,
"priority": 0,
"userdata": {
"_ncp_DomainCode": "NEST",
"_ncp_DomainId": 1,
"_ncp_TaskId": **442,
"_ncp_TraceId": "*****ce98ec342d8a8c8fe9191cec343",
"id": 1
}
},
"progress": 100,
"keywords": {},
"segments": [
{
"start": 5870,
"end": 8160,
"text": "This is the Seoul swimming pool.",
"confidence": 0.9626975,
"diarization": {
"label": "2"
},
"speaker": {
"label": "2",
"name": "B",
"edited": false
},
"words": [
[
5871,
6730,
"This is the Seoul"
],
[
6860,
7530,
"swimming pool."
]
],
"textEdited": "This is the Seoul swimming pool."
},
{
"start": 8160,
"end": 12950,
"text": "How much is the entry fee? It's 5000 KRW. Thank you.",
"confidence": 0.8835926,
"diarization": {
"label": "1"
},
"speaker": {
"label": "1",
"name": "A",
"edited": false
},
"words": [
[
8161,
9220,
"How much is"
],
[
9390,
10020,
"the entry fee?"
],
[
10410,
10640,
"It's 5000"
],
[
10710,
11140,
"KRW."
],
[
11910,
12500,
"Thank you."
]
],
"textEdited": "How much is the entry fee? It's 5000 KRW. Thank you."
}
],
"text": "This is the Seoul swimming pool. How much is the entry fee? It's 5000 KRW. Thank you.",
"confidence": 0.9071357,
"speakers": [
{
"label": "1",
"name": "A",
"edited": false
},
{
"label": "2",
"name": "B",
"edited": false
}
],
"events": [
{
"type": "music",
"label": "music",
"labelEdited": "music",
"start": 1400,
"end": 5000
}
],
"eventTypes": [
"music"
]
}
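A sync JSON response like the one above can be turned into a speaker-labeled transcript by walking `segments`. The following is a minimal sketch, assuming the response has already been parsed into a dictionary; `format_transcript` is a hypothetical helper name, and only the `start`, `text`, and `speaker.name` fields from the segments table are used.

```python
from typing import List


def format_transcript(response: dict) -> List[str]:
    """Return one line per segment in the form '[<start>s] <speaker>: <text>'."""
    lines = []
    for segment in response.get("segments", []):
        # speaker.name may be absent when speaker recognition is disabled.
        speaker = segment.get("speaker", {}).get("name", "?")
        start_s = segment["start"] / 1000  # start is given in milliseconds
        lines.append(f"[{start_s:.2f}s] {speaker}: {segment['text']}")
    return lines
```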
Request with sync and return in SRT
The following is a sample response requested with sync and returned in SRT format.
1
00:00:00,000 --> 00:00:01,425
A: Not long ago,
2
00:00:02,533 --> 00:00:11,550
A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
3
00:00:11,550 --> 00:00:19,025
A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
4
00:00:19,025 --> 00:00:26,317
C: You thought of saccharin, a bit. You had it super sweet.
5
00:00:26,317 --> 00:00:28,240
A: Is it corn?
6
00:00:28,240 --> 00:00:35,318
B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
7
00:00:35,318 --> 00:00:42,800
A: No, Chodang corn meant super sweet. No one has understood right now.
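The SRT timestamps above follow the `HH:MM:SS,mmm` layout, while JSON segments carry start and end times in milliseconds. The following sketch shows the conversion and builds similar SRT text from JSON segments on the client side. It is an illustration under assumptions, not a description of how the service produces its SRT output, and both function names are hypothetical.

```python
def ms_to_srt_timestamp(ms: int) -> str:
    """Convert milliseconds to an SRT timestamp of the form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render JSON segments as numbered SRT cues with a 'name: ' speaker prefix."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        name = seg.get("speaker", {}).get("name", "")
        prefix = f"{name}: " if name else ""
        blocks.append(
            f"{index}\n"
            f"{ms_to_srt_timestamp(seg['start'])} --> {ms_to_srt_timestamp(seg['end'])}\n"
            f"{prefix}{seg['text']}"
        )
    # Cues are separated by a blank line, as in standard SRT files.
    return "\n\n".join(blocks) + "\n"
```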
Request with sync and return in SMI
The following is a sample response requested with sync and returned in SMI format.
<SAMI>
<Body>
<SYNC Start=0>
<P>A: Not long ago,
<SYNC Start=2533>
<P>A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
<SYNC Start=11550>
<P>A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
<SYNC Start=19025>
<P>C: You thought of saccharin, a bit. You had it super sweet.
<SYNC Start=26317>
<P>A: Is it corn?
<SYNC Start=28240>
<P>B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
<SYNC Start=35318>
<P>A: No, Chodang corn meant super sweet. No one has understood right now.
</Body>
</SAMI>
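An SMI response shaped like the sample above can be read back into start times and caption lines with a regular expression. The following is a minimal sketch that assumes each `<SYNC Start=...>` tag is followed by a single-line `<P>` entry, as in the sample. It is not a general SAMI parser, and `parse_smi` is a hypothetical helper name.

```python
import re

# Matches '<SYNC Start=N>' followed by '<P>' and the caption text on that line.
SYNC_PATTERN = re.compile(r"<SYNC Start=(\d+)>\s*<P>([^\n<]*)", re.IGNORECASE)


def parse_smi(smi_text: str):
    """Return a list of (start_ms, caption) tuples from simple SMI text."""
    return [(int(start), text.strip()) for start, text in SYNC_PATTERN.findall(smi_text)]
```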