External file recognition

Available in Classic and VPC

Recognize speech in an audio/video file hosted at an externally accessible URL and convert it to text.

Request

The request format for the endpoint is as follows:

Method	URI
POST	/recognizer/url

Request headers

For headers common to all CLOVA Speech APIs, see Common CLOVA Speech headers.

Request body

The following describes the request body.

Field	Type	Required	Description
`url`	String	Required	Audio/video file URL
`language`	String	Required	Text recognition language `ko-KR` (default) \| `en-US` \| `enko` \| `ja` \| `zh-cn` \| `zh-tw` `ko-KR`: Korean `en-US`: English `enko`: Korean/English simultaneous recognition `ja`: Japanese `zh-cn`: Chinese (Simplified) `zh-tw`: Chinese (Traditional)
`completion`	String	Optional	Response method after recognition request `sync` \| `async` (default) `sync`: Return results in JSON format `async`: Return results through the callback URL or Object Storage (`resultToObs`)
`callback`	String	Conditional	Callback URL If `completion` is `async`, either `callback` or `resultToObs` must be entered
`userdata`	Object	Optional	User data details
`wordAlignment`	Boolean	Optional	Whether to output speech and text alignment of recognition results `true` (default) \| `false` `true`: output `false`: no output
`fullText`	Boolean	Optional	Whether to output full recognition result text `true` (default) \| `false` `true`: output `false`: no output
`resultToObs`	Boolean	Conditional	Whether to save results in Object Storage `true` \| `false` (default) `true`: results saved `false`: results not saved If `completion` is `async`, either `callback` or `resultToObs` must be entered
`noiseFiltering`	Boolean	Optional	Noise filtering `true` (default) \| `false` `true`: filtered `false`: not filtered
`boostings`	Array	Optional	Keyword boosting details. List of keywords to boost speech recognition for. Can't be used concurrently with `useDomainBoostings`. Up to 1000 entries allowed. Only available in Korean and English. English: lowercase conversion by default, capitalize keywords requested for boosting. No boosting for single-syllable words due to risk of misidentification (e.g., `yes`, `yeah`, `no`). Boosting is applied regardless of spacing (e.g., request boosting for only one keyword between "CLOVA Speech" and "CLOVASpeech"). There is no restriction on keyword length, but if the phrase to be boosted is a combination of multiple words, it will not be affected by boosting unless it is that exact phrase. For example, if you boost the keyword "CLOVA Speech," all sentences containing "CLOVA Speech" will be affected by boosting. If you boost a long keyword such as "CLOVA Speech's media speech recognition technology," sentences that contain only "CLOVA Speech" are unlikely to be affected by boosting.
`useDomainBoostings`	Boolean	Optional	Whether to use domain boosting `true` \| `false` (default) `true`: boosting used `false`: boosting not used Can't be used concurrently with `boostings`
`forbiddens`	String	Optional	Sensitive keywords List of keywords to reduce the speech recognition rate (if you don't want them to appear in the recognition results) No limit on the number and length of keywords Spaces and capitalization are required to be matched exactly
`diarization`	Object	Optional	Detailed settings for speaker recognition
`diarization.enable`	Boolean	Optional	Whether to recognize speaker `true` (default) \| `false` `true`: speaker recognized `false`: speaker not recognized
`sed`	Object	Optional	Event detection result details
`sed.enable`	Boolean	Optional	Whether to detect events `true` \| `false` (default) `true`: event detected `false`: event not detected
`format`	String	Optional	Response result return format `JSON` (default) \| `SRT` \| `SMI`

`boostings`

The following describes boostings.

Field	Type	Required	Description
`words`	String	Optional	List of words to keyword boost
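
The constraints in the request body table can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_request_body` (it is not part of any CLOVA Speech SDK). It encodes only rules stated above: `url` and `language` are required, `boostings` and `useDomainBoostings` can't be combined, and an `async` request needs `callback` or `resultToObs`. Reading "up to 1000 entries" as the length of the `boostings` array is an assumption.

```python
from __future__ import annotations

from typing import Any, Optional

ALLOWED_LANGUAGES = {"ko-KR", "en-US", "enko", "ja", "zh-cn", "zh-tw"}
MAX_BOOSTINGS = 1000  # assumption: the limit applies to the array length


def build_request_body(
    url: str,
    language: str = "ko-KR",
    completion: str = "async",
    callback: Optional[str] = None,
    result_to_obs: bool = False,
    boostings: Optional[list[dict[str, str]]] = None,
    use_domain_boostings: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: validate the documented rules and build the JSON body."""
    if not url:
        raise ValueError("url is required")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if completion not in ("sync", "async"):
        raise ValueError("completion must be 'sync' or 'async'")
    # With async, at least one result destination must be given.
    if completion == "async" and not callback and not result_to_obs:
        raise ValueError("async requires callback or resultToObs")
    # Keyword boosting and domain boosting are mutually exclusive.
    if boostings and use_domain_boostings:
        raise ValueError("boostings can't be used with useDomainBoostings")
    if boostings and len(boostings) > MAX_BOOSTINGS:
        raise ValueError("too many boostings entries")

    body: dict[str, Any] = {
        "url": url,
        "language": language,
        "completion": completion,
    }
    # Optional fields are added only when set, so server defaults apply otherwise.
    if callback:
        body["callback"] = callback
    if result_to_obs:
        body["resultToObs"] = True
    if boostings:
        body["boostings"] = boostings
    if use_domain_boostings:
        body["useDomainBoostings"] = True
    return body
```

For example, `build_request_body("https://example.com/audio.mp3", result_to_obs=True)` produces a body equivalent to the request example in this document, with a placeholder URL.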

Note

When `completion` is set to `async`, the recognition result is delivered as follows, depending on whether a callback URL is provided and whether `resultToObs` (Object Storage) is enabled.

Callback URL	resultToObs(ObjectStorage)	Result
URL address exists	True	Return results to both callback URL and Object Storage
URL address exists	False	Return results only to the callback URL
URL address doesn't exist	True	Return results only to Object Storage
URL address doesn't exist	False	Return an error
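
The table above can be expressed as a small decision function. This is an illustrative sketch only: `result_destinations` and the destination labels `"callback"` and `"object_storage"` are hypothetical names chosen for this example, not values used by the API.

```python
from typing import List, Optional


def result_destinations(callback: Optional[str], result_to_obs: bool) -> List[str]:
    """Return where an async result is delivered, mirroring the table above."""
    destinations = []
    if callback:  # a callback URL was supplied
        destinations.append("callback")
    if result_to_obs:  # saving to Object Storage was requested
        destinations.append("object_storage")
    if not destinations:
        # Last table row: neither destination is set, so the request is an error.
        raise ValueError("async requires callback or resultToObs")
    return destinations
```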

Request example

The following is a sample request.

curl --location --request POST 'https://clovaspeech-gw.ncloud.com/external/v1/8881/5f7e1b4c866f1c605946c9236f9aa8************/recognizer/url' \
--header 'Content-Type: application/json' \
--header 'X-CLOVASPEECH-API-KEY: {Secret key issued when registering the app}' \
--data '{
  "language": "ko-KR",
  "completion":"async",
  "url": "{url}",
  "resultToObs" : true
}'
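
The same request can be prepared in Python. The sketch below uses only the standard library and builds the request object without sending it. The function name `make_recognize_request` and its parameters are assumptions made for illustration; `invoke_url` stands for the app-specific URL prefix that precedes `/recognizer/url` in the curl example.

```python
import json
import urllib.request


def make_recognize_request(
    invoke_url: str, secret_key: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /recognizer/url."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + "/recognizer/url",
        data=data,
        headers={
            "Content-Type": "application/json",
            "X-CLOVASPEECH-API-KEY": secret_key,
        },
        method="POST",
    )


# Placeholder values; replace them with the invoke URL and secret key of your app.
request = make_recognize_request(
    "https://example.invalid/external/v1/app-id/app-key",
    "my-secret-key",
    {"language": "ko-KR", "completion": "async", "url": "{url}", "resultToObs": True},
)
```

Passing the returned object to `urllib.request.urlopen` would send it; error handling and timeouts are left out of this sketch.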

Response

The following describes the response format.

Response body

The following describes the response body.

Field	Type	Required	Description
`result`	String	-	Response code
`message`	String	-	Response message
`token`	String	-	Result token
`version`	String	-	Engine version
`params`	Object	-	Parameter details
`params.service`	String	-	Service code
`params.domain`	String	-	Domain type Use when calling the engine `general`
`params.lang`	String	-	Recognition language `ko` \| `en` \| `enko` \| `ja` \| `zh-cn` \| `zh-tw` `ko`: Korean `en`: English `enko`: Korean/English simultaneous recognition `ja`: Japanese `zh-cn`: Chinese (Simplified) `zh-tw`: Chinese (Traditional)
`params.completion`	String	-	Request format `sync`: Return results in JSON format `async`: Return results through the callback URL or Object Storage (`resultToObs`)
`params.callback`	String	-	Callback URL
`params.diarization`	Object	-	Speaker recognition (separation) details
`params.diarization.enable`	Boolean	-	Whether to recognize (separate) speaker `true` \| `false` `true`: speaker recognized `false`: speaker not recognized
`params.diarization.speakerCountMin`	Integer	-	Minimum number of speakers
`params.diarization.speakerCountMax`	Integer	-	Maximum number of speakers
`params.sed`	Object	-	Event detection result
`params.sed.enable`	Boolean	-	Whether to detect events `true` \| `false` (default) `true`: event detected `false`: event not detected
`params.boostings`	Array	-	Keyword boosting details For more information, see `boostings` of Request body
`params.forbiddens`	String	-	Sensitive keywords For more information, see `forbiddens` of Request body
`params.wordAlignment`	Boolean	-	Whether to output speech and text alignment of recognition results `true` (default) \| `false` `true`: output `false`: no output
`params.fullText`	Boolean	-	Whether to output full recognition result text `true` (default) \| `false` `true`: output `false`: no output
`params.noiseFiltering`	Boolean	-	Noise filtering `true` (default) \| `false` `true`: filtered `false`: not filtered
`params.resultToObs`	Boolean	-	Whether to save results in Object Storage Operate only if `completion` is `async` `true` \| `false` (default) `true`: results saved `false`: results not saved
`params.priority`	Integer	-	Priority 0 - 4 The lower the number, the higher the priority
`params.userdata`	Object	-	User data details
`params.userdata._ncp_DomainCode`	String	-	Domain code `long-speech` \| `short-speech` `long-speech`: long sentence recognition `short-speech`: short sentence recognition
`params.userdata._ncp_DomainId`	Integer	-	Domain ID
`params.userdata._ncp_TaskId`	Integer	-	Task ID Use to track specific recognition tasks
`params.userdata._ncp_TraceId`	String	-	Trace ID Use to track logs
`progress`	Integer	-	Recognition progress
`segments`	Array	-	segments details
`text`	String	-	Overall text
`confidence`	Double	-	Overall accuracy
`speakers`	Array	-	All speaker details
`events`	Array	-	Event details
`eventTypes`	Array	-	Details of all recognized events

params.boostings

The following describes params.boostings.

Field	Type	Required	Description
`words`	String	-	List of words to keyword boost

segments

The following describes segments.

Field	Type	Required	Description
`start`	Long	-	Analysis start time (ms)
`end`	Long	-	Analysis end time (ms)
`text`	String	-	Analyzed text
`confidence`	Double	-	Analysis accuracy 0.0 - 1.0
`diarization`	Object	-	Recognized speaker details
`diarization.label`	String	-	Recognized speaker's number
`speaker`	Object	-	Changed speaker's details
`speaker.label`	String	-	Changed speaker's number
`speaker.name`	String	-	Changed speaker's name
`speaker.edited`	Boolean	-	Whether speaker is changed `true` \| `false` (default) `true`: speaker changed `false`: speaker same
`words`	Array<Long, Long, String>	-	List of recognized words
`words.[0]`	Long	-	Segment start time (ms)
`words.[1]`	Long	-	Segment end time (ms)
`words.[2]`	String	-	Segment text
`textEdited`	String	-	Modification details
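
Each entry of `words` is a three-element array of start time, end time, and text. A minimal sketch of unpacking those entries into named fields follows; the `Word` type and `iter_words` function are hypothetical names introduced for this example.

```python
from typing import Iterator, List, NamedTuple


class Word(NamedTuple):
    start_ms: int  # words.[0]
    end_ms: int    # words.[1]
    text: str      # words.[2]


def iter_words(segments: List[dict]) -> Iterator[Word]:
    """Yield every word entry from every segment, in order."""
    for segment in segments:
        # A segment may lack "words", for instance if wordAlignment is false.
        for start, end, text in segment.get("words", []):
            yield Word(start, end, text)
```

For instance, `[w.text for w in iter_words(response["segments"])]` collects the word-level texts of a parsed response held in a variable named `response`.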

speakers

The following describes speakers.

Field	Type	Required	Description
`label`	String	-	Numbers of all speakers
`name`	String	-	Names of all speakers
`edited`	Boolean	-	Whether speaker is changed `true` \| `false` (default) `true`: speaker changed `false`: speaker same

events

The following describes events.

Field	Type	Required	Description
`type`	String	-	Event type
`label`	String	-	Event name
`labelEdited`	String	-	Event change name
`start`	Long	-	Event start time
`end`	Long	-	Event end time

eventTypes

The following describes eventTypes.

Field	Type	Required	Description
`label`	String	-	Recognized event

Response status codes

For response status codes common to all CLOVA Speech APIs, see Common CLOVA Speech response status codes.

Response example

The following are sample responses.

Request with `async` and return in JSON

The following is a sample response requested with async and returned in JSON format.

{
    "token": "*****f6a1015466bae2c926177f26310",
    "result": "SUCCEEDED",
    "message": "Succeeded"
}

Request with `sync` and return in JSON

The following is a sample response requested with sync and returned in JSON format.

{
    "result": "COMPLETED",
    "message": "Succeeded",
    "token": "*****166039e486abbb90e4a84c3b3a5",
    "version": "ncp_v2_v2.3.0-aa6cd8d-20231205_231211-3cf30bfc_v0.0.0_",
    "params": {
        "service": "ncp",
        "domain": "general",
        "lang": "enko",
        "completion": "sync",
        "callback": "",
        "diarization": {
            "enable": true,
            "speakerCountMin": -1,
            "speakerCountMax": -1
        },
        "sed": {
            "enable": true
        },
        "boostings": [
            {
                "words": "Hello, test"
            }
        ],
        "forbiddens": "",
        "wordAlignment": true,
        "fullText": true,
        "noiseFiltering": true,
        "resultToObs": false,
        "priority": 0,
        "userdata": {
            "_ncp_DomainCode": "NEST",
            "_ncp_DomainId": 1,
            "_ncp_TaskId": **442,
            "_ncp_TraceId": "*****ce98ec342d8a8c8fe9191cec343",
            "id": 1
        }
    },
    "progress": 100,
    "keywords": {},
    "segments": [
        {
            "start": 5870,
            "end": 8160,
            "text": "This is the Seoul swimming pool.",
            "confidence": 0.9626975,
            "diarization": {
                "label": "2"
            },
            "speaker": {
                "label": "2",
                "name": "B",
                "edited": false
            },
            "words": [
                [
                    5871,
                    6730,
                    "This is the Seoul"
                ],
                [
                    6860,
                    7530,
                    "swimming pool."
                ]
            ],
            "textEdited": "This is the Seoul swimming pool."
        },
        {
            "start": 8160,
            "end": 12950,
            "text": "How much is the entry fee? It's 5000 KRW. Thank you.",
            "confidence": 0.8835926,
            "diarization": {
                "label": "1"
            },
            "speaker": {
                "label": "1",
                "name": "A",
                "edited": false
            },
            "words": [
                [
                    8161,
                    9220,
                    "How much is"
                ],
                [
                    9390,
                    10020,
                    "the entry fee?"
                ],
                [
                    10410,
                    10640,
                    "It's 5000"
                ],
                [
                    10710,
                    11140,
                    "KRW."
                ],
                [
                    11910,
                    12500,
                    "Thank you."
                ]
            ],
            "textEdited": "How much is the entry fee? It's 5000 KRW. Thank you."
        }
    ],
    "text": "This is the Seoul swimming pool. How much is the entry fee? It's 5000 KRW. Thank you.",
    "confidence": 0.9071357,
    "speakers": [
        {
            "label": "1",
            "name": "A",
            "edited": false
        },
        {
            "label": "2",
            "name": "B",
            "edited": false
        }
    ],
    "events": [
        {
            "type": "music",
            "label": "music",
            "labelEdited": "music",
            "start": 1400,
            "end": 5000
        }
    ],
    "eventTypes": [
        "music"
    ]
}
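
A response like the one above can be turned into a speaker-labeled transcript. The sketch below assumes the response has already been parsed into a Python dictionary; `format_transcript` is a hypothetical helper, and the fallback to `diarization.label` when no speaker name is present is a design choice of this example.

```python
from typing import List


def format_transcript(response: dict) -> List[str]:
    """Return one line per segment: "[start-end ms] speaker: text"."""
    lines = []
    for segment in response.get("segments", []):
        # Prefer the speaker name; fall back to the diarization label.
        speaker = (
            segment.get("speaker", {}).get("name")
            or segment.get("diarization", {}).get("label", "?")
        )
        lines.append(
            f"[{segment['start']}-{segment['end']} ms] {speaker}: {segment['text']}"
        )
    return lines
```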

Request with `sync` and return in SRT

The following is a sample response requested with sync and returned in SRT format.

1
00:00:00,000 --> 00:00:01,425
A: Not long ago,

2
00:00:02,533 --> 00:00:11,550
A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.

3
00:00:11,550 --> 00:00:19,025
A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.

4
00:00:19,025 --> 00:00:26,317
C: You thought of saccharin, a bit. You had it super sweet.

5
00:00:26,317 --> 00:00:28,240
A: Is it corn?

6
00:00:28,240 --> 00:00:35,318
B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?

7
00:00:35,318 --> 00:00:42,800
A: No, Chodang corn meant super sweet. No one has understood right now.
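
The SRT timestamps use the `HH:MM:SS,mmm` form, while `segments` in the JSON response report times in milliseconds. The sketch below shows one way to produce comparable SRT text from JSON segments on the client. It is not the service's own conversion, and `ms_to_srt_time` and `segments_to_srt` are hypothetical names.

```python
from typing import List


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: List[dict]) -> str:
    """Build SRT cues from JSON segments, prefixing the speaker name if any."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        name = segment.get("speaker", {}).get("name", "")
        prefix = f"{name}: " if name else ""
        cues.append(
            f"{index}\n"
            f"{ms_to_srt_time(segment['start'])} --> {ms_to_srt_time(segment['end'])}\n"
            f"{prefix}{segment['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```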

Request with `sync` and return in SMI

The following is a sample response requested with sync and returned in SMI format.

<SAMI>
<Body>
  <SYNC Start=0>
    <P>A: Not long ago,
  <SYNC Start=2533>
    <P>A: I had some corn. It was really sweet and delicious, but I thought it was the name of a neighborhood.
  <SYNC Start=11550>
    <P>A: I didn't know it was "cho" from "Chosaier" and "dang" which meant sweet. I didn't know. I thought chodang was the same word used for Chodang tofu.
  <SYNC Start=19025>
    <P>C: You thought of saccharin, a bit. You had it super sweet.
  <SYNC Start=26317>
    <P>A: Is it corn?
  <SYNC Start=28240>
    <P>B: Where can you find sweet tofu? This do doesn't understand. Isn't Sangdo in the Chodang area?
  <SYNC Start=35318>
    <P>A: No, Chodang corn meant super sweet. No one has understood right now.
</Body>
</SAMI>
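
An SMI response can be read back into start times and captions. The following sketch matches only the layout shown in the example above (an unquoted `Start` value followed by a single-line `<P>` caption); other SMI files may use quoted attributes or additional tags, which this pattern doesn't handle. `parse_smi` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches "<SYNC Start=N>" followed by "<P>caption" on the next line.
SYNC_PATTERN = re.compile(r"<SYNC Start=(\d+)>\s*<P>([^<\n]*)")


def parse_smi(smi_text: str) -> List[Tuple[int, str]]:
    """Return (start_ms, caption) pairs in document order."""
    return [
        (int(start), caption.strip())
        for start, caption in SYNC_PATTERN.findall(smi_text)
    ]
```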
