Available in Classic and VPC

Recognize and convert real-time speech data in PCM format (headerless WAV files) at 16 kHz, 1 channel, 16 bits per sample into text. Access is available only via the gRPC protocol.
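
As a worked example of the audio format above: 16,000 samples per second × 2 bytes per sample × 1 channel is 32,000 bytes per second, or 32 bytes per millisecond. The following minimal sketch splits headerless PCM data into fixed-duration chunks for streaming. The 100 ms chunk length and the name `iter_pcm_chunks` are assumptions for illustration, not values required by the API.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the format stated above
BYTES_PER_SAMPLE = 2      # 16 bits per sample
CHANNELS = 1              # mono

# 16000 * 2 * 1 / 1000 = 32 bytes of audio per millisecond
BYTES_PER_MS = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS // 1000


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100):
    """Yield consecutive slices of headerless PCM data, each covering chunk_ms of audio.

    chunk_ms=100 is an illustrative choice, not an API requirement.
    The final chunk may be shorter than the others.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    size = BYTES_PER_MS * chunk_ms
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```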

Request

This section describes the request format. The host and port are as follows:

Host	Port
clovaspeech-gw.ncloud.com	50051

Request order

The following describes how to request recognition via gRPC.

Note

This guide is based on the Rocky Linux environment.

1. Install and prepare protoc compiler

In preparation for using the API, install the protoc compiler by referring to the gRPC site. After installation, select your preferred language (Python or Java), then run the compiler on the 'nest.proto' file, which defines the API interface.
The following describes how to install the protoc compiler and generate gRPC code.

Connect remotely to the server where you want to install the protoc compiler.

Install the packages and plugins for using gRPC.

Rocky Linux: Python

# Update installed packages to the latest versions
sudo dnf update

# Install Python: Install Python on the Linux server.
sudo dnf install python3

# Install and upgrade pip: pip is a package installer for Python.
sudo dnf install python3-pip
pip3 install --upgrade pip

# Install grpcio-tools: Install "grpcio-tools" using pip.
pip3 install grpcio-tools

# Create nest.proto file, then paste in the interface definition shown later in this step before compiling
touch nest.proto

# Compile nest.proto file with protoc compiler
python3 -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. nest.proto

Rocky Linux: Java

# Download protoc-gen-grpc-java plugin (check https://github.com/grpc/grpc-java/releases for version number)
curl -OL https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.66.0/protoc-gen-grpc-java-1.66.0-linux-x86_64.exe

# Move the plugin to a directory on the PATH
sudo mv protoc-gen-grpc-java-1.66.0-linux-x86_64.exe /usr/local/bin/protoc-gen-grpc-java

# Grant execute permission
sudo chmod +x /usr/local/bin/protoc-gen-grpc-java

# Confirm installation
protoc-gen-grpc-java --version

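# Create nest.proto file, then paste in the interface definition shown later in this step before compiling
touch nest.proto

# Compile nest.proto file with protoc compiler (the output directory must exist first)
mkdir -p output/directory
protoc --proto_path=. --java_out=output/directory --grpc-java_out=output/directory nest.proto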

Open the 'nest.proto' file, enter the following code, and then run the compile command above to generate the gRPC code.

syntax = "proto3";
option java_multiple_files = true;
package com.nbp.cdncp.nest.grpc.proto.v1;

enum RequestType {
  CONFIG = 0;
  DATA = 1;
}

message NestConfig {
  string config = 1;
}

message NestData {
  bytes chunk = 1;
  string extra_contents = 2;
}
message NestRequest {
  RequestType type = 1;
  oneof part {
    NestConfig config = 2;
    NestData data = 3;
  }
}

message NestResponse {
  string contents = 1;
}
service NestService {
  rpc recognize(stream NestRequest) returns (stream NestResponse){};
}

2. Authorization

After generating the gRPC code with the protoc compiler, proceed with authorization. Authorization means including a bearer token in the Authorization header of each API call so that the server can verify the client. Note the following when performing authorization.

- Set up the gRPC channel and create the stub, which is the client-side proxy defined in nest_pb2_grpc.
- After creating the stub, call the recognize method with metadata that contains the authentication key.
- The real-time streaming recognition API is not supported on the Free plan; it is only available on the Basic long sentence recognition plan.

Authentication header

The authorization header is as follows:

Header name	Description
`Authorization`	`Bearer ${secretKey}`
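
A minimal sketch of building this header as gRPC call metadata in Python. The helper name `build_auth_metadata` is hypothetical; the lowercase key follows the requirement noted in the Python example in this guide, and the tuple-of-pairs shape is the form grpc-python accepts for its `metadata` argument.

```python
def build_auth_metadata(secret_key: str):
    """Return call metadata carrying the bearer token.

    secret_key is the value shown as ${secretKey} in the table above.
    """
    if not secret_key:
        raise ValueError("secret_key must not be empty")
    # The header name is written in lowercase for gRPC metadata.
    return (("authorization", f"Bearer {secret_key}"),)
```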

Authorization order

The authorization method is as follows:

Python

The following describes how to authorize using Python in a Rocky Linux environment.

Create a Python file. Here, the file name is specified as main.py.
```
touch main.py
```

Add the following content to main.py.

import grpc
import json

import nest_pb2
import nest_pb2_grpc

# Placeholder: check the secret key in the long sentence recognition domain
secretKey = "YOUR_SECRET_KEY"

channel = grpc.secure_channel(
        'clovaspeech-gw.ncloud.com:50051',
        grpc.ssl_channel_credentials()
)
client = nest_pb2_grpc.NestServiceStub(channel)
metadata = (("authorization", f"Bearer {secretKey}"),) # The key "authorization" must be lowercase
# Placeholder call: replace YourMethod and YourRequest with the stub method and request you use (for example, recognize)
call = client.YourMethod(YourRequest(), metadata=metadata)

Java

The following describes how to authorize using Java in a Rocky Linux environment.

Create a Java file. Here, the file name is specified as main.java.
```
touch main.java
```

Add the following content to main.java.

ManagedChannel channel = NettyChannelBuilder
            .forTarget("clovaspeech-gw.ncloud.com:50051")
            .useTransportSecurity()
            .build();
NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
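Metadata metadata = new Metadata();
// Replace ${secretKey} with the secret key of your long sentence recognition domain
metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
             "Bearer ${secretKey}");
// Attach the metadata to every call made through this stub
client = client.withInterceptors(MetadataUtils.newAttachHeadersInterceptor(metadata));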

3. Config JSON

This section describes the config JSON, which is sent to the streaming endpoint in a NestRequest object (defined in the nest_pb2 code generated by protoc). The config JSON must be sent in the first call to the real-time streaming recognition API.
The config JSON provides the following fields.

transcription: Set speech recognition language
keywordBoosting: Set to boost the recognition rate for entered words
forbidden: Set banned words
semanticEpd: Set criteria for generating speech recognition results
translationEpd: Set target language, response reception method, etc. during translation

Request body

The following describes the request body for the config JSON.

Transcription

The following describes Transcription.

Field	Type	Required	Description
`language`	String	Required	Language code for speech recognition target `ko` \| `en` \| `ja` `ko`: Korean `en`: English `ja`: Japanese

Note

Transcription is not a required input field, but we recommend setting it up for clear speech recognition.

Keyword Boosting

The following describes the Keyword Boosting field.

Field	Type	Required	Description
`keywordBoosting`	Object	Optional	Keyword boosting information Increase the recognition rate for pre-registered keywords.
`keywordBoosting.boostings`	Array	Optional	Keyword boosting word details

boostings

The following describes keywordBoosting.boostings.

Field	Type	Required	Description
`words`	String	Optional	Keyword boosting word list When multiple entries are provided, separate them with commas (,). Example: `"words": "test,test1,test2"` Include spaces before and after the word.
`weight`	Float	Optional	Keyword boosting weight `0`-`5.0` When the weight is `0`, boosting is not applied. All keywords have the same weight.
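
A minimal sketch of building one `boostings` entry in Python, joining the keywords with commas and checking the documented `0`-`5.0` weight range before sending. The helper name `make_boosting` is hypothetical, and rejecting out-of-range weights on the client side is a choice of this sketch rather than documented server behavior.

```python
def make_boosting(words, weight: float) -> dict:
    """Build one entry for keywordBoosting.boostings.

    words: iterable of keywords, joined with commas as the table describes.
    weight: must be within the documented 0-5.0 range (0 disables boosting).
    """
    if not 0 <= weight <= 5.0:
        raise ValueError("weight must be between 0 and 5.0")
    return {"words": ",".join(words), "weight": weight}
```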

Forbidden

The following describes the Forbidden field.

Field	Type	Required	Description
`forbidden`	Object	Optional	Banned word information Decrease the recognition rate for pre-registered keywords.
`forbidden.forbiddens`	String	Optional	Banned word list When multiple entries are provided, separate them with commas (,). Example: `"forbiddens": "Banned word 1, banned word 2"` Include spaces before and after the word. Banned word tag: `<forbidden>Banned word</forbidden>` Add only to the value of the `text` key in the recognition result. Added tags have no effect on the recognition results' `position`, `periodPosition`, and `alignInfo`.
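
Because the banned word tags are added only to the `text` value, a client that needs plain text can strip them after receiving a result. The following is a minimal sketch using Python's `re` module; both function names are hypothetical.

```python
import re

_FORBIDDEN_TAG = re.compile(r"</?forbidden>")
_FORBIDDEN_SPAN = re.compile(r"<forbidden>(.*?)</forbidden>")


def strip_forbidden_tags(text: str) -> str:
    """Remove the <forbidden> markers but keep the enclosed words."""
    return _FORBIDDEN_TAG.sub("", text)


def find_forbidden_words(text: str) -> list:
    """Return the words that were wrapped in <forbidden> tags."""
    return _FORBIDDEN_SPAN.findall(text)
```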

SemanticEPD

The following describes the Semantic EPD field.

Field	Type	Required	Description
`semanticEpd`	Object	Optional	Generation criteria settings information for the speech recognition results
`semanticEpd.skipEmptyText`	Boolean	Optional	Whether to transmit results with no recognition output. If this setting is set to true, results with no recognized syllables will not be transmitted. `true` \| `false` (default) `true`: Do not transmit `false`: Transmit
`semanticEpd.useWordEpd`	Boolean	Optional	Whether to generate recognition results ending with a word. Setting this to true generates recognition results that end with a word. `true` \| `false` (default) `true`: Generate `false`: Do not generate
`semanticEpd.usePeriodEpd`	Boolean	Optional	Whether to generate recognition results ending with punctuation. Setting this to true generates recognition results ending with punctuation. `true` \| `false` (default) `true`: Generate `false`: Do not generate To improve punctuation recognition accuracy, when `usePeriodEpd` is `true`, also set `useWordEpd` to `true`.
`semanticEpd.gapThreshold`	Integer	Optional	Silence duration threshold (ms, milliseconds) for generating recognition results. Recognition results are generated when silence exceeding `gapThreshold` occurs. The default value is `0`. It is unused if the user does not set it or sets a value less than or equal to `0`. It can be set in milliseconds.
`semanticEpd.durationThreshold`	Integer	Optional	Duration threshold (ms, milliseconds) for generating recognition results. Generate recognition results so that the duration is less than the `durationThreshold` value. The default value is `0`. If the user does not set it separately or sets a value less than or equal to `0`, the default value is used. We recommend setting it directly in milliseconds to generate recognition results of an appropriate length.
`semanticEpd.syllableThreshold`	Integer	Optional	Number of syllables used to generate recognition results. Generate recognition results such that the number of syllables composing them is less than the `syllableThreshold` value. Spaces (`" "`) and periods (`"."`) are also treated as one syllable. The default value is `0`. This setting is unused if the user does not set it or sets a value of `0` or lower.

Translation EPD

The following describes the Translation EPD field.

Field	Type	Required	Description
`translationEpd.targets`	string	Required	Enter the language code for the language you want to translate. For information on supported language codes, see Papago Translation's supported languages.
`translationEpd.mergedResult`	Boolean	Optional	Setting to receive recognition results and translation results as a single response `true` \| `false` (default) When set to true, translation results are generated using the EPD settings within the semanticEPD that produces the recognition results. In this case, certain settings in the Translation EPD (usePeriodEpd, gapThreshold, durationThreshold, syllableThreshold) are ignored and do not affect the generation of translation results.
`translationEpd.gapThreshold`	Integer	Optional	Silence duration threshold (ms, milliseconds) for generating recognition results. Recognition results are generated when silence exceeding `gapThreshold` occurs. The default value is `2000`. It is unused if the user does not set it or sets a value less than or equal to `0`. It can be set in milliseconds.
`translationEpd.durationThreshold`	Integer	Optional	Duration threshold (ms, milliseconds) for generating recognition results. Generate recognition results so that the duration is less than the `durationThreshold` value. The default value is `20000`. If the user sets a value less than or equal to `0`, the default value is used. We recommend setting it directly in milliseconds to generate recognition results of an appropriate length.
`translationEpd.honorific`	Boolean	Optional	Whether to apply honorifics `true` \| `false` (default) `true`: Apply honorifics `false`: No honorifics English ⇒ Korean, Japanese ⇒ Korean, Chinese (Simplified/Traditional) ⇒ Korean, Korean ⇒ Japanese, English ⇒ Japanese, Chinese (Simplified/Traditional) ⇒ Japanese
`translationEpd.glossaryKey`	String	Optional	Glossary ID Apply substitution translation based on glossary data. Korean ⇔ English, Japanese, Chinese (Simplified/Traditional), French \| English ⇔ Japanese, Chinese (Simplified/Traditional), Vietnamese, Thai, Indonesian, French \| Japanese ⇔ Chinese (Simplified/Traditional) `honorific` is not applied to terms within the glossary.

Request example

The following is a sample request for the config JSON.

#Semantic EPD
{
  "semanticEpd": {
    "skipEmptyText": false,
    "useWordEpd": false,
    "usePeriodEpd": true,
    "gapThreshold": 2000,
    "durationThreshold": 20000,
    "syllableThreshold": 0
  }
}
#Translation Info / EPD
{
  "translation": {
    "targets": ["en"],
    "mergedResult": false,
    "gapThreshold": 2000,
    "durationThreshold": 20000,
    "honorific": false,
    "glossaryKey": "{glossaryKey}"
  }
}
#KeywordBoosting and Forbidden
{
  "keywordBoosting": {
    "boostings": [
      {
        "words": "test,test1,test2",
        "weight": 1
      },
      {
        "words": "Test, test 1, test 2",
        "weight": 0.5
      }
    ]
  },
  "forbidden": {
    "forbiddens": "Banned word 1, banned word 2"
  }
}
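
A minimal sketch of assembling the config JSON in Python rather than writing it by hand, which avoids syntax slips such as trailing commas. The function name and its parameters are hypothetical, and nesting `language` under `transcription` is an assumption based on the field list in this section.

```python
import json


def build_config_json(language="ko", boostings=None, forbiddens=None, semantic_epd=None) -> str:
    """Serialize the config JSON; optional sections are omitted when not given.

    boostings: list of {"words": ..., "weight": ...} dicts.
    forbiddens: comma-separated banned word string.
    semantic_epd: dict of semanticEpd settings.
    """
    config = {"transcription": {"language": language}}
    if boostings:
        config["keywordBoosting"] = {"boostings": boostings}
    if forbiddens:
        config["forbidden"] = {"forbiddens": forbiddens}
    if semantic_epd:
        config["semanticEpd"] = semantic_epd
    return json.dumps(config, ensure_ascii=False)
```

The returned string is the value that would go into the `config` field of the `NestConfig` message.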

Response body

The following describes the response body for the config JSON.

Field	Type	Required	Description
`uid`	String	-	UID
`responseType`	Array<String>	-	Response type `transcription` \| `keywordBoosting` \| `Forbidden` \| `semanticEpd`
`config`	Object	-	Config JSON information
`config.status`	String	-	Config JSON request status `Success` \| `Failure` \| `${message}` `Success`: Request successful (gRPC configuration saved successfully) `Failure`: Request failure (See Error message.) `${message}`: `top_level_key` can be omitted.
`config.keywordBoosting`	Object	-	Keyword boosting information
`config.keywordBoosting.status`	String	-	Keyword boosting request status `Success` \| `Failure` \| `${message}` `Success`: Request successful (gRPC configuration saved successfully) `Failure`: Request failure (See Error message.) `${message}`: `top_level_key` can be omitted.
`config.forbidden`	Object	-	Banned word information
`config.forbidden.status`	String	-	Banned word request status `Success` \| `Failure` \| `${message}` `Success`: Request successful (gRPC configuration saved successfully) `Failure`: Request failure (See Error message.) `${message}`: `top_level_key` can be omitted.
`config.semanticEpd`	Object	-	Semantic EPD information
`config.semanticEpd.status`	String	-	Semantic EPD request status `Success` \| `Failure` \| `${message}` `Success`: Request successful (gRPC configuration saved successfully) `Failure`: Request failure (See Error message.) `${message}`: `top_level_key` can be omitted.

Error message

The following describes the error messages displayed when a request fails.

Error message	Related field	Description
`Unknown key: ${top_level_key}-${unknown_key}`	Common	Unsupported sub-level key
`Invalid type: ${top_level_key}-${invalid_type_key}`	Common	Unsupported sub-level value type
`Invalid language code: ${invalid_language_code}`	`transcription`	`language` not predefined
`Not Authorized`	`transcription`	`language` not authorized
`Targets are empty`	`translation`	When targets are not set in the config request JSON
`Invalid language code: ${source}:${targets}`	`translation`	When the language code is unsupported or the source and target are identical
`Internal system error`	`keywordBoosting`	Internal server system error
`Invalid request json format`	-	Abnormal JSON format
`Required key is not provided`	-	Mandatory key value defined by the server missing
`No more slot`	-	No available resources on the current server
`ConfigRequest did not complete`	-	Config JSON request processing incomplete when server recognition request was made
`Lifespan expired`	-	gRPC service usage time expired Threshold: 100 hours
`Failed to received request msg`	-	Server failed to properly receive the request message
`Model server is not working`	-	Internal server errors
`Internal server error`	-	Internal server error
`RESOURCE_EXHAUSTED`	-	No available gRPC connection resources Threshold: Exceeding 15 per domain

Response example

The following is a sample response for the config JSON.

Succeeded

The following is a sample response upon a successful call.

{
  "uid": "{uid}",
  "responseType": [ "config" ],
  "config": {
    "status": "Success",
    "keywordBoosting": {
      "status": "Success"
    },
    "forbidden": {
      "status": "Success"
    }
  }
}

Failure

The following is a sample response when the call fails because the request contains the unknown key hobidden (a misspelling of forbidden).

{
  "uid": "{uid}",
  "responseType": [ "config" ],
  "config": {
    "status": "Unknown key: hobidden"
  }
}
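
A minimal sketch of checking a config response in Python before sending audio. It collects every status under `config` that is not `Success`. The function name is hypothetical, and using the key `"config"` for the top-level status is a convention of this sketch only.

```python
import json


def failed_config_statuses(contents: str) -> dict:
    """Map each config section to its status when that status is not "Success".

    The key "config" stands for the top-level config.status value.
    An empty dict means every reported status was "Success".
    """
    config = json.loads(contents).get("config", {})
    failures = {}
    if config.get("status") != "Success":
        failures["config"] = config.get("status")
    for key, value in config.items():
        # Sub-sections such as keywordBoosting carry their own status.
        if isinstance(value, dict) and value.get("status") != "Success":
            failures[key] = value.get("status")
    return failures
```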

4. Recognize

After configuring the desired settings via the config JSON, call the speech recognition API using recognize to process and recognize speech data in real time. Pass the NestRequest messages defined in the protoc-generated code, together with the authorization metadata, to the recognize method of the stub.
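
The request order can be sketched as a Python generator that yields one CONFIG request followed by one DATA request per audio chunk. This is a sketch under stated assumptions: the function name `iter_requests` is hypothetical, the generated `nest_pb2` module is passed in as the `pb2` argument so that the block does not import it directly, and the `extra_contents` field is assumed to carry the `epFlag` and `seqId` JSON, as the Recognize error messages suggest.

```python
import json


def iter_requests(pb2, config: dict, chunks, extra_contents: dict):
    """Yield the CONFIG request first, then one DATA request per audio chunk.

    pb2: the generated nest_pb2 module.
    config: the config JSON as a dict.
    chunks: iterable of raw PCM byte strings.
    extra_contents: dict serialized into NestData.extra_contents.
    """
    yield pb2.NestRequest(
        type=pb2.RequestType.CONFIG,
        config=pb2.NestConfig(config=json.dumps(config)),
    )
    for chunk in chunks:
        yield pb2.NestRequest(
            type=pb2.RequestType.DATA,
            data=pb2.NestData(chunk=chunk, extra_contents=json.dumps(extra_contents)),
        )
```

With the real generated modules, the generator would be passed to the stub together with the metadata, for example `responses = client.recognize(iter_requests(nest_pb2, config, chunks, extra), metadata=metadata)`, and each item in `responses` would expose the result JSON in its `contents` field.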

Request body

The following describes the request body for Recognize.

Field	Type	Required	Description
`epFlag`	Boolean	Optional	Buffer and result return timing upon pause or last recognition request `true` \| `false` (default) `true`: Immediately return buffer after recognition request, then return result. `false`: Automatically return buffer after 10 seconds with no additional requests, then return result.
`seqId`	Integer	Conditional	Recognition request ID. It is used to match results to requests when `epFlag` is set to `true`. If `seqId` is not set and transmitted, the result value is `0`. We recommend setting and transmitting it with a value other than `0`.
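
A minimal sketch of serializing these two fields in Python. The function name is hypothetical, and turning the recommendation about a non-zero `seqId` into a client-side check is a choice of this sketch, not behavior required by the API.

```python
import json


def build_extra_contents(ep_flag: bool = False, seq_id: int = 0) -> str:
    """Serialize the epFlag and seqId fields described in the table above.

    When ep_flag is True, a non-zero seq_id is recommended so that the
    matching result can be identified; this sketch enforces that as a check.
    """
    if ep_flag and seq_id == 0:
        raise ValueError("use a non-zero seq_id when ep_flag is True")
    return json.dumps({"epFlag": ep_flag, "seqId": seq_id})
```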

Response body

The following describes the response body for Recognize.

Field	Type	Required	Description
`uid`	String	-	UID
`responseType`	Array	-	Response type `transcription` \| `keywordBoosting` \| `Forbidden` \| `semanticEpd` \| `recognize`
`config`	Object	-	Config JSON field information
`config.text`	String	-	Recognition result text
`config.position`	Integer	-	The position of the text received as `text` in the entire text
`config.periodPositions`	Array<Integer>	-	The position of `.` (punctuation) in the entire text. Empty if there is no `.` in `text`.
`config.periodAlignIndices`	Array<Integer>	-	The index of `.` in `alignInfos`. Empty if there is no `.` in `text`.
`config.epFlag`	Boolean	Optional	Whether to include recognition results for the audio sent with `epFlag` set to `true` in the request `true` \| `false` `true`: Include `false`: Do not include
`config.seqId`	Integer	-	Whether it is the last recognition request `true` \| `false` `true`: Return the `seqId` of the last recognition request. `false`: Return `0`.
`config.epdType`	String	-	EPD criteria for generating recognition results `gap` \| `endPoint` \| `durationThreshold` \| `period` \| `syllableThreshold` \| `unvoice` `gap`: Silent `endPoint`: Last speech data segment `durationThreshold`: Playback duration `period`: Punctuation `syllableThreshold`: Number of syllables `unvoice`: Run `unvoiceTime` (server setting).
`config.startTimestamp`	Integer	-	Recognition result start time (ms)
`config.endTimestamp`	Integer	-	Recognition result end time (ms)
`config.confidence`	Float	-	Recognition result confidence Geometric mean of all syllable confidence values (`alignInfos.confidence`) in the recognition result
`config.alignInfos`	Array	-	Align information for syllables in the recognition result
`recognize`	Object	-	Recognize information
`recognize.status`	String	-	Recognize status See Error message for response failures.
`recognize.epFlag`	Object	-	`epFlag` information
`recognize.epFlag.status`	String	-	`epFlag` status If the `extraContents` in the recognize request JSON is an invalid format, the failure details are displayed in `epFlag.status` or `seqId.status`. See Error message for response failures.
`recognize.seqId`	Object	-	`seqId` information
`recognize.seqId.status`	String	-	`seqId` status If the `extraContents` in the recognize request JSON is an invalid format, the failure details are displayed in `epFlag.status` or `seqId.status`. See Error message for response failures.

Note

An example of constructing full text using text and position is as follows:

Received order	Recognition result	full text
1	{text: "ABC", position: 0, ...}	"ABC"
2	{text: "DEFG", position: 3, ...}	"ABCDEFG"
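
The table above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes results arrive in order, so that everything before `position` is kept and the new `text` is placed there.

```python
def assemble_full_text(results) -> str:
    """Rebuild the full text from results that carry `text` and `position`.

    Everything before `position` is kept and the new text is placed there,
    replacing whatever followed that offset.
    """
    full = ""
    for result in results:
        full = full[:result["position"]] + result["text"]
    return full
```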

`alignInfos`

The following describes config.alignInfos.

Field	Type	Required	Description
`word`	String	-	Composition syllables
`start`	Integer	-	Composition syllable start time (ms)
`end`	Integer	-	Composition syllable end time (ms)
`confidence`	Float	-	Composition syllable confidence 0-1.0
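
The overall `confidence` of a result is described above as the geometric mean of these per-syllable values. A minimal sketch of that calculation in Python follows; the function name is hypothetical, and the sketch assumes every confidence value is greater than 0.

```python
import math


def geometric_mean_confidence(align_infos) -> float:
    """Geometric mean of the per-syllable confidence values in alignInfos."""
    values = [info["confidence"] for info in align_infos]
    if not values:
        raise ValueError("align_infos must not be empty")
    # exp(mean(log(x))) equals the n-th root of the product and avoids
    # underflow when many values below 1 are multiplied together.
    return math.exp(sum(math.log(v) for v in values) / len(values))
```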

Error message

The following describes the error messages displayed when a Recognize request fails.

Error message	Related field	Description
`Invalid Type`	`recognize.status`	`epFlag` or `seqId` type does not match predefined type.
`Required key is not provided`	`recognize.status`	`epFlag` value in `extraContents` not provided
`Invalid request json format`	`recognize.status`	`extraContents` is not in JSON format.
`Unknown key`	`recognize.status`	A key not defined in the protocol written in `extraContents`
`ConfigRequest is already called`	`recognize.status`	Duplicate config request to server
`Lifespan expired`	`recognize.status`	gRPC service usage time expired Threshold: 100 hours
`Failed to received request msg`	`recognize.status`	Server failed to properly receive the request message
`Model server is not working`	`recognize.status`	Internal server system error
`Internal server error`	`recognize.status`	Internal server system error
`Failed to translation: ${message}`	-	Translation feature-related error (See Text translation error message.)
`Invalid format`	`recognize.status`	The transmitted audio format is invalid.
`Not found`	`epFlag.status`	`epFlag` value not entered
`Invalid type`	`epFlag.status`, `seqId.status`	Predefined type mismatch
`Invalid format`	`audio.status`	Predefined type in the `audio` field mismatch

Response example

The response example is as follows:

Succeeded

The following is a sample response upon a successful call.

When the response is successful and "responseType": [ "transcription" ]

{
  "uid": "{uid}",
  "responseType": [ "transcription" ],
  "transcription": {
    "text": "This is text.",
    "position": 0,
    "periodPositions": [3],
    "periodAlignIndices": [3],
    "epFlag": false,
    "seqId": 0,
    "epdType": "durationThreshold",
    "startTimestamp": 190,
    "endTimestamp": 840,
    "confidence": 0.997389124199423,
    "alignInfos": [
      {"word":"This","start":190,"end":340,"confidence":0.9988637124943075},
      {"word":"is","start":341,"end":447,"confidence":0.9990018488549978},
      {"word":"text","start":448,"end":580,"confidence":0.9912501264550316},
      {"word":".","start":581,"end":700,"confidence":0.9994397226648595},
      {"word":" ","start":701,"end":840,"confidence":0.9984142043105126}
    ]
  }
}

Failure

The following is a sample response upon a failed call.

{
  "uid": string,                     # required
  "responseType": [ "recognize" ],    # required
  "recognize": {                     # required
    "status": string,                # required
    "epFlag": {                      # optional
      "status": string
    },
    "seqId": {                       # optional
      "status": string
    },
    "audio": {                       # optional
      "status": string
    }
  }
}
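
A minimal sketch of handling both response shapes in Python. It assumes that a response whose `responseType` contains `recognize` reports a failure, as in the example above, and that recognized text arrives under `transcription`, as in the success example. `RecognizeError` and `extract_text` are hypothetical names.

```python
import json


class RecognizeError(Exception):
    """Raised when the service reports a failed Recognize request."""


def extract_text(contents: str):
    """Return the recognized text, or None for other response types.

    Raises RecognizeError with recognize.status as its message when
    responseType contains "recognize".
    """
    response = json.loads(contents)
    types = response.get("responseType", [])
    if "recognize" in types:
        raise RecognizeError(response.get("recognize", {}).get("status"))
    if "transcription" in types:
        return response["transcription"]["text"]
    return None
```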

5. Other API

Get the number of currently active stub calls and the maximum allowed stub count for the domain.

curl --location 'https://clovaspeech-gw.ncloud.com:50051/api/v1/1064/active-calls' \
--header 'Authorization: Bearer ${API_KEY}'

Response

Response parameters

Parameter name	Required	Type	Restrictions	Description
activeCalls	-	Integer	0~999	Number of currently active stub calls
maxCalls	-	Integer	0~999	Maximum number of stubs allowed within the domain
timestamp	-	String	yyyyMMddHHmmss	Data creation time

Response example

{
    "activeCalls": 3,
    "maxCalls": 15,
    "timestamp": "2025-04-25T12:09:27.382+09:00"
}
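
A client could use this response to check the remaining headroom before opening a new stream. The following is a minimal sketch in Python that only parses the response body; the function name is hypothetical.

```python
import json


def remaining_calls(body: str) -> int:
    """Return how many more stub calls fit under the domain limit."""
    data = json.loads(body)
    # Clamp at zero in case activeCalls momentarily exceeds maxCalls.
    return max(data["maxCalls"] - data["activeCalls"], 0)
```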