Available in Classic and VPC
Recognize and convert real-time speech data in PCM format (headerless WAV files) at 16 kHz, 1 channel, 16 bits per sample into text. Access is available only via the gRPC protocol.
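As a rough illustration of the audio format above, 16 kHz, 1 channel, 16 bits per sample works out to 32 bytes per millisecond. The sketch below splits raw PCM bytes into fixed-duration chunks for streaming. The helper names and the 100 ms default are assumptions for illustration and are not values defined by the API.

```python
SAMPLE_RATE_HZ = 16000   # 16 kHz, from the format described above
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # 16 bits per sample


def bytes_per_ms() -> int:
    """Number of PCM bytes that represent one millisecond of audio."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE // 1000


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100):
    """Yield consecutive slices of headerless PCM data, chunk_ms long each.

    The 100 ms default is a hypothetical choice; the last chunk may be shorter.
    """
    size = bytes_per_ms() * chunk_ms
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

Each yielded slice could then be used as the chunk of one data request.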
Request
This section describes the request format. The host and port are as follows:
| Host | Port |
|---|---|
| clovaspeech-gw.ncloud.com | 50051 |
Request order
To request recognition via gRPC:
This guide is based on the Rocky Linux environment.
1. Install and prepare protoc compiler
In preparation for using the API, install the protoc compiler by referring to the gRPC site. After installation, select your preferred language (Python or Java), then call the compiler using the 'nest.proto' file where the API interface is defined.
To install the protoc compiler and generate gRPC code:
- Connect remotely to the server where you want to install the protoc compiler.
- Install the packages and plugins for using gRPC.
  - Rocky Linux: Python
        # Check the latest status
        sudo dnf update
        # Install Python: Install Python on the Linux server.
        sudo dnf install python3
        # Install and upgrade pip: pip is a package installer for Python.
        sudo dnf install python3-pip
        pip3 install --upgrade pip
        # Install grpcio-tools: Install "grpcio-tools" using pip.
        pip3 install grpcio-tools
        # Create nest.proto file
        touch nest.proto
        # Compile nest.proto file with protoc compiler
        python3 -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. nest.proto
  - Rocky Linux: Java
        # Download protoc-gen-grpc-java plugin (check https://github.com/grpc/grpc-java/releases for version number)
        curl -OL https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.66.0/protoc-gen-grpc-java-1.66.0-linux-x86_64.exe
        # Add to path
        mv protoc-gen-grpc-java-1.66.0-linux-x86_64.exe /usr/local/bin/protoc-gen-grpc-java
        # Change to execute permission
        chmod +x /usr/local/bin/protoc-gen-grpc-java
        # Confirm installation
        protoc-gen-grpc-java --version
        # Create nest.proto file
        touch nest.proto
        # Compile nest.proto file with protoc compiler
        protoc --proto_path=. --java_out=output/directory --grpc-java_out=output/directory nest.proto
- Open the 'nest.proto' file, enter the following code, and generate the gRPC code.
        syntax = "proto3";
        option java_multiple_files = true;
        package com.nbp.cdncp.nest.grpc.proto.v1;
        enum RequestType {
          CONFIG = 0;
          DATA = 1;
        }
        message NestConfig {
          string config = 1;
        }
        message NestData {
          bytes chunk = 1;
          string extra_contents = 2;
        }
        message NestRequest {
          RequestType type = 1;
          oneof part {
            NestConfig config = 2;
            NestData data = 3;
          }
        }
        message NestResponse {
          string contents = 1;
        }
        service NestService {
          rpc recognize(stream NestRequest) returns (stream NestResponse){};
        }
2. Authorization
After completing gRPC code generation via the protoc compiler, proceed with authorization. Authorization involves including a bearer token in the Authorization header during API calls to verify the client's integrity with the server. Note the following when performing authorization.
- Set up the gRPC channel and generate the stub, which is the client-side proxy for nest_pb2_grpc.
- After generating the stub, execute the desired function by including metadata containing the authentication key in the recognize method.
- The real-time streaming recognition API is not supported on the Free plan; it is only available on the Basic long sentence recognition plan.
Authentication header
The authorization header is as follows:
| Header name | Description |
|---|---|
| Authorization | Bearer ${secretKey} |
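The header above can be expressed as gRPC call metadata. The following is a minimal sketch, assuming the secret key is already available as a string. The function name `build_auth_metadata` is hypothetical and is not part of the generated code.

```python
def build_auth_metadata(secret_key: str):
    """Return call metadata carrying the bearer token.

    gRPC metadata keys must be lowercase in Python, hence "authorization".
    """
    if not secret_key:
        raise ValueError("secret_key must not be empty")
    return (("authorization", f"Bearer {secret_key}"),)
```

The returned tuple can be passed as the `metadata` argument when calling a stub method.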
Authorization order
The authorization method is as follows:
Python
To authorize using Python in a Rocky Linux environment:
- Create a Python file. Here, the file name is specified as main.py.
        touch main.py
- Add the following content to main.py.
        import grpc
        import json
        import nest_pb2
        import nest_pb2_grpc
        # secretKey is verified in the long sentence recognition domain
        secretKey = "YOUR_SECRET_KEY"
        channel = grpc.secure_channel(
            'clovaspeech-gw.ncloud.com:50051',
            grpc.ssl_channel_credentials()
        )
        client = nest_pb2_grpc.NestServiceStub(channel)
        # Lowercase authorization required
        metadata = (("authorization", f"Bearer {secretKey}"),)
        # recognize takes an iterator of nest_pb2.NestRequest messages (empty here as a placeholder)
        requests = iter(())
        call = client.recognize(requests, metadata=metadata)
Java
To authorize using Java in a Rocky Linux environment:
- Create a Java file. Here, the file name is specified as main.java.
        touch main.java
- Add the following content to main.java.
        ManagedChannel channel = NettyChannelBuilder
            .forTarget("clovaspeech-gw.ncloud.com:50051")
            .useTransportSecurity()
            .build();
        NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
        Metadata metadata = new Metadata();
        metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER), "Bearer ${secretKey}");
        // Attach the authorization metadata to every call made through the stub
        client = client.withInterceptors(MetadataUtils.newAttachHeadersInterceptor(metadata));
3. Config JSON
This section describes the config JSON sent to the streaming endpoint via the NestRequest object from the nest_pb2 module generated by protoc. The config JSON must be sent during the first call to the real-time streaming recognition API.
The config JSON provides the following fields.
- transcription: Set speech recognition language
- keywordBoosting: Set to boost the recognition rate for entered words
- forbidden: Set banned words
- semanticEpd: Set criteria for generating speech recognition results
- translation: Set target language, response reception method, etc. during translation
Request body
The following describes the request body for the config JSON.
Transcription
The following describes Transcription.
| Field | Type | Required | Description |
|---|---|---|---|
| language | String | Required | Language code for speech recognition target |
Transcription is not a required input field, but we recommend setting it up for clear speech recognition.
Keyword Boosting
The following describes the Keyword Boosting field.
| Field | Type | Required | Description |
|---|---|---|---|
| keywordBoosting | Object | Optional | Keyword boosting information |
| keywordBoosting.boostings | Array | Optional | Keyword boosting word details: boostings |
boostings
The following describes boostings.
| Field | Type | Required | Description |
|---|---|---|---|
| words | String | Optional | Keyword boosting word list |
| weight | Float | Optional | Keyword boosting weight |
Forbidden
The following describes the Forbidden field.
| Field | Type | Required | Description |
|---|---|---|---|
| forbidden | Object | Optional | Banned word information |
| forbidden.forbiddens | String | Optional | Banned word list |
SemanticEPD
The following describes the Semantic EPD field.
| Field | Type | Required | Description |
|---|---|---|---|
| semanticEpd | Object | Optional | Generation criteria settings information for the speech recognition results |
| semanticEpd.skipEmptyText | Boolean | Optional | Whether to transmit results with no recognition output. If this setting is set to true, results with no recognized syllables will not be transmitted. |
| semanticEpd.useWordEpd | Boolean | Optional | Whether to generate recognition results ending with a word. Setting this to true generates recognition results that end with a word. |
| semanticEpd.usePeriodEpd | Boolean | Optional | Whether to generate recognition results ending with punctuation. Setting this to true generates recognition results ending with punctuation. |
| semanticEpd.gapThreshold | Integer | Optional | Silence duration threshold (ms, milliseconds) for generating recognition results. Recognition results are generated when silence exceeding gapThreshold occurs. |
| semanticEpd.durationThreshold | Integer | Optional | Duration threshold (ms, milliseconds) for generating recognition results. Generate recognition results so that the duration is less than the durationThreshold value. |
| semanticEpd.syllableThreshold | Integer | Optional | Number of syllables used to generate recognition results. Generate recognition results such that the number of syllables composing them is less than the syllableThreshold value. |
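Because the server rejects unsupported keys and value types, it can help to check the semanticEpd object locally before sending it. The following is a minimal sketch based only on the field names and types in the table above. The function name and the problem strings are hypothetical and do not reproduce the server's error messages.

```python
BOOLEAN_FIELDS = ("skipEmptyText", "useWordEpd", "usePeriodEpd")
INTEGER_FIELDS = ("gapThreshold", "durationThreshold", "syllableThreshold")


def check_semantic_epd(semantic_epd: dict) -> list:
    """Return a list of locally detected problems (empty if none were found)."""
    problems = []
    for key, value in semantic_epd.items():
        if key in BOOLEAN_FIELDS:
            if not isinstance(value, bool):
                problems.append(f"{key}: expected Boolean")
        elif key in INTEGER_FIELDS:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{key}: expected Integer")
        else:
            problems.append(f"{key}: unknown field")
    return problems
```

An empty list only means the types match the table; the server remains the authority on whether a config is accepted.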
Translation
The following describes the Translation field.
| Field | Type | Required | Description |
|---|---|---|---|
| translation.targets | String | Required | Enter the language code for the language you want to translate. |
| translation.mergedResult | Boolean | Optional | Setting to receive recognition results and translation results as a single response |
| translation.gapThreshold | Integer | Optional | Silence duration threshold (ms, milliseconds) for generating recognition results. Recognition results are generated when silence exceeding gapThreshold occurs. |
| translation.durationThreshold | Integer | Optional | Duration threshold (ms, milliseconds) for generating recognition results. Generate recognition results so that the duration is less than the durationThreshold value. |
| translation.honorific | Boolean | Optional | Whether to apply honorifics |
| translation.glossaryKey | String | Optional | Glossary ID |
Request example
The following is a sample request for the config JSON.
#Semantic EPD
{
"semanticEpd": {
"skipEmptyText": false,
"useWordEpd": false,
"usePeriodEpd": true,
"gapThreshold": 2000,
"durationThreshold": 20000,
"syllableThreshold": 0
}
}
#Translation Info / EPD
{
"translation": {
"targets": ["en"],
"mergedResult": false,
"gapThreshold": 2000,
"durationThreshold": 20000,
"honorific": false,
"glossaryKey": "{glossaryKey}"
}
}
#KeywordBoosting / Forbidden
{
"keywordBoosting": {
"boostings": [
{
"words": "test,test1,test2",
"weight": 1
},
{
"words": "Test, test 1, test 2",
"weight": 0.5
}
]
},
"forbidden": {
"forbiddens": "Banned word 1, banned word 2"
}
}
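Building the config with a JSON serializer avoids hand-written mistakes such as Python-style booleans or trailing commas. The following is a minimal sketch, assuming the language field is nested under a transcription key and that "ko" is a supported language code; both are assumptions, and the function name `build_config_json` is hypothetical.

```python
import json


def build_config_json(language=None, semantic_epd=None,
                      keyword_boostings=None, forbiddens=None) -> str:
    """Serialize the optional config sections into one JSON string."""
    config = {}
    if language is not None:
        # Assumption: the language field sits under a "transcription" key.
        config["transcription"] = {"language": language}
    if semantic_epd is not None:
        config["semanticEpd"] = semantic_epd
    if keyword_boostings is not None:
        config["keywordBoosting"] = {"boostings": keyword_boostings}
    if forbiddens is not None:
        config["forbidden"] = {"forbiddens": forbiddens}
    # json.dumps writes Python True/False as JSON true/false.
    return json.dumps(config, ensure_ascii=False)


config_json = build_config_json(
    language="ko",  # assumed language code; check the supported list
    semantic_epd={"usePeriodEpd": True, "gapThreshold": 2000},
    keyword_boostings=[{"words": "test,test1,test2", "weight": 1}],
    forbiddens="Banned word 1, banned word 2",
)
```

The resulting string is what would be placed in the config field of the first request.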
Response body
The following describes the response body for the config JSON.
| Field | Type | Required | Description |
|---|---|---|---|
| uid | String | - | UID |
| responseType | Array<String> | - | Response type |
| config | Object | - | Config JSON information |
| config.status | String | - | Config JSON request status |
| config.keywordBoosting | Object | - | Keyword boosting information |
| config.keywordBoosting.status | String | - | Keyword boosting request status |
| config.forbidden | Object | - | Banned word information |
| config.forbidden.status | String | - | Banned word request status |
| config.semanticEpd | Object | - | Semantic EPD information |
| config.semanticEpd.status | String | - | Semantic EPD request status |
Error message
The following describes the error messages displayed when a request fails.
| Error message | Related field | Description |
|---|---|---|
| Unknown key: ${top_level_key}-${unknown_key} | Common | Unsupported sub-level key |
| Invalid type: ${top_level_key}-${invalid_type_key} | Common | Unsupported sub-level value type |
| Invalid language code: ${invalid_language_code} | transcription | language not predefined |
| Not Authorized | transcription | language not authorized |
| Targets are empty | translation | When targets are not set in the config request JSON |
| Invalid language code: ${source}:${targets} | translation | When the language code is unsupported or the source and target are identical |
| Internal system error | keywordBoosting | Internal server system error |
| Invalid request json format | - | Abnormal JSON format |
| Required key is not provided | - | Mandatory key value defined by the server missing |
| No more slot | - | No available resources on the current server |
| ConfigRequest did not complete | - | Config JSON request processing incomplete when server recognition request was made |
| Lifespan expired | - | gRPC service usage time expired |
| Failed to received request msg | - | Server failed to properly receive the request message |
| Model server is not working | - | Internal server error |
| Internal server error | - | Internal server error |
| RESOURCE_EXHAUSTED | - | No available gRPC connection resources |
Response example
The following is a sample response for the config JSON.
Succeeded
The following is a sample response upon a successful call.
{
"uid": "{uid}",
"responseType": [ "config" ],
"config": {
"status": "Success",
"keywordBoosting": {
"status": "Success"
},
"forbidden" : {
"status": "Success"
}
}
}
Failure
The following is a sample response when the call fails because the request contains the unknown key hobidden.
{
"uid": "{uid}",
"responseType": [ "config" ],
"config": {
"status": "Unknown key: hobidden"
}
}
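A client would typically inspect the status fields of the config response before sending audio. The following is a minimal sketch, assuming the response contents arrive as a JSON string shaped like the samples above and that "Success" marks a successful section. The function name `failed_sections` is hypothetical.

```python
import json


def failed_sections(response_json: str) -> dict:
    """Map each config section whose status is not "Success" to its status text."""
    response = json.loads(response_json)
    config = response.get("config", {})
    failures = {}
    # Overall status of the config request.
    status = config.get("status")
    if status is not None and status != "Success":
        failures["config"] = status
    # Per-section statuses such as keywordBoosting or forbidden.
    for name, section in config.items():
        if isinstance(section, dict):
            section_status = section.get("status")
            if section_status is not None and section_status != "Success":
                failures[name] = section_status
    return failures
```

An empty dictionary means every reported status was "Success".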
4. Recognize
After configuring the desired settings via the config JSON, call the speech recognition API using recognize to process and recognize speech data in real time. Pass the NestRequest messages defined in the code generated by protoc, together with the authorization metadata, to the recognize method of the stub to call the speech recognition API.
Request body
The following describes the request body for Recognize.
| Field | Type | Required | Description |
|---|---|---|---|
| epFlag | Boolean | Optional | Buffer and result return timing upon pause or last recognition request |
| seqId | Integer | Conditional | Recognition request ID |
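The request stream can be sketched as a generator that sends the config first and then the audio chunks. This sketch assumes that epFlag and seqId are sent as a JSON string in the extra_contents field of NestData, that seqId counts up from 0, and that epFlag is true only on the last chunk; none of these choices is confirmed by this guide. The generated nest_pb2 module is passed in as the `pb2` argument, and the function name `generate_requests` is hypothetical.

```python
import json


def generate_requests(pb2, config_json, audio_chunks):
    """Yield one CONFIG request followed by one DATA request per audio chunk.

    pb2 is the nest_pb2 module generated by protoc.
    """
    yield pb2.NestRequest(
        type=pb2.CONFIG,
        config=pb2.NestConfig(config=config_json),
    )
    iterator = iter(audio_chunks)
    current = next(iterator, None)
    seq_id = 0
    while current is not None:
        upcoming = next(iterator, None)
        # Assumption: epFlag marks the final chunk, seqId numbers the requests.
        extra = {"epFlag": upcoming is None, "seqId": seq_id}
        yield pb2.NestRequest(
            type=pb2.DATA,
            data=pb2.NestData(chunk=current, extra_contents=json.dumps(extra)),
        )
        current = upcoming
        seq_id += 1
```

With a stub and metadata in place, the generator would be passed as the first argument of the recognize method, for example `client.recognize(generate_requests(nest_pb2, config_json, chunks), metadata=metadata)`.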
Response body
The following describes the response body for Recognize.
| Field | Type | Required | Description |
|---|---|---|---|
| uid | String | - | UID |
| responseType | Array | - | Response type |
| config | Object | - | Config JSON field information |
| config.text | String | - | Recognition result text |
| config.position | Integer | - | The position of the text received as text in the entire text |
| config.periodPositions | Array<Integer> | - | The position of . (punctuation) in the entire text |
| config.periodAlignIndices | Array<Integer> | - | The index of . (punctuation) in alignInfos |
| config.epFlag | Boolean | Optional | Whether to include recognition results for the audio sent with epFlag set to true in the request |
| config.seqId | Integer | - | Whether it is the last recognition request |
| config.epdType | String | - | EPD criteria for generating recognition results |
| config.startTimestamp | Integer | - | Recognition result start time (ms) |
| config.endTimestamp | Integer | - | Recognition result end time (ms) |
| config.confidence | Float | - | Recognition result confidence |
| config.alignInfos | Array | - | Align information for syllables in the recognition result: alignInfos |
| recognize | Object | - | Recognize information |
| recognize.status | String | - | Recognize status |
| recognize.epFlag | Object | - | epFlag information |
| recognize.epFlag.status | String | - | epFlag status |
| recognize.seqId | Object | - | seqId information |
| recognize.seqId.status | String | - | seqId status |
An example of constructing full text using text and position is as follows:
| Received order | Recognition result | full text |
|---|---|---|
| 1 | {text: "ABC", position: 0, ...} | "ABC" |
| 2 | {text: "DEFG", position: 3, ...} | "ABCDEFG" |
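The table above can be reproduced with a small helper. This is a minimal sketch, assuming position is a character offset into the full text and that a new result replaces anything from that offset onward. The function name `merge_result` is hypothetical.

```python
def merge_result(full_text: str, text: str, position: int) -> str:
    """Place text at the given character offset of the accumulated full text."""
    # Keep everything before the offset (padding with spaces if it is short),
    # then append the newly received text.
    return full_text[:position].ljust(position) + text


full_text = ""
for result in [{"text": "ABC", "position": 0}, {"text": "DEFG", "position": 3}]:
    full_text = merge_result(full_text, result["text"], result["position"])
```

After both results are applied, `full_text` holds the concatenated text from the second row of the table.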
alignInfos
The following describes alignInfos.
| Field | Type | Required | Description |
|---|---|---|---|
| word | String | - | Composition syllables |
| start | Integer | - | Composition syllable start time (ms) |
| end | Integer | - | Composition syllable end time (ms) |
| confidence | Float | - | Composition syllable confidence |
Error message
The following describes the error messages displayed when a Recognize request fails.
| Error message | Related field | Description |
|---|---|---|
| Invalid Type | recognize.status | epFlag or seqId type does not match predefined type. |
| Required key is not provided | recognize.status | epFlag value in extraContents not provided |
| Invalid request json format | recognize.status | extraContents is not in JSON format. |
| Unknown key | recognize.status | A key not defined in the protocol written in extraContents |
| ConfigRequest is already called | recognize.status | Duplicate config request to server |
| Lifespan expired | recognize.status | gRPC service usage time expired |
| Failed to received request msg | recognize.status | Server failed to properly receive the request message |
| Model server is not working | recognize.status | Internal server system error |
| Internal server error | recognize.status | Internal server system error |
| Failed to translation: ${message} | - | Translation feature-related error |
| Invalid format | recognize.status | The transmitted audio format is invalid. |
| Not found | epFlag.status | epFlag value not entered |
| Invalid type | epFlag.status, seqId.status | Predefined type mismatch |
| Invalid format | audio.status | Predefined type in the audio field mismatch |
Response example
The response example is as follows:
Succeeded
The following is a sample response upon a successful call.
- When the response is successful and "responseType": [ "transcription" ]
{
"uid": "{uid}",
"responseType": [ "transcription" ],
"transcription": {
"text": "This is text.",
"position": 0,
"periodPositions": [3],
"periodAlignIndices": [3],
"epFlag": false,
"seqId": 0,
"epdType": "durationThreshold",
"startTimestamp": 190,
"endTimestamp": 840,
"confidence": 0.997389124199423,
"alignInfos": [
{"word":"This","start":190,"end":340,"confidence":0.9988637124943075},
{"word":"is","start":341,"end":447,"confidence":0.9990018488549978},
{"word":"text","start":448,"end":580,"confidence":0.9912501264550316},
{"word":".","start":581,"end":700,"confidence":0.9994397226648595},
{"word":" ","start":701,"end":840,"confidence":0.9984142043105126}
]
}
}
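The alignInfos entries in a response like the one above can be turned into per-word durations. The following is a minimal sketch that takes the transcription object as a Python dictionary and skips whitespace-only entries; the function name `word_timings` is hypothetical.

```python
def word_timings(transcription: dict) -> list:
    """Return (word, duration in ms) pairs from alignInfos, skipping blank entries."""
    timings = []
    for info in transcription.get("alignInfos", []):
        if info["word"].strip():
            timings.append((info["word"], info["end"] - info["start"]))
    return timings


sample = {
    "text": "This is text.",
    "alignInfos": [
        {"word": "This", "start": 190, "end": 340, "confidence": 0.9988637124943075},
        {"word": "is", "start": 341, "end": 447, "confidence": 0.9990018488549978},
        {"word": "text", "start": 448, "end": 580, "confidence": 0.9912501264550316},
        {"word": ".", "start": 581, "end": 700, "confidence": 0.9994397226648595},
        {"word": " ", "start": 701, "end": 840, "confidence": 0.9984142043105126},
    ],
}
timings = word_timings(sample)
```

Here `timings` contains one pair per non-blank entry, with each duration computed as end minus start.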
Failure
The following is a sample response upon a failed call.
{
"uid": string, # required
"responseType": [ "recognize" ], # required
"recognize": { # required
"status": string, # required
"epFlag": { # optional
"status": string
},
"seqId": { # optional
"status": string
},
"audio": { # optional
"status": string
}
}
}
5. Other API
Get the number of currently active stub calls and the maximum allowed stub count for the domain.
curl --location 'https://clovaspeech-gw.ncloud.com:50051/api/v1/${domainId}/active-calls' \
--header 'Authorization: Bearer ${API_KEY}'
Response
This section describes the response format.
Response body
The response body includes the following data:
| Field | Type | Required | Description |
|---|---|---|---|
| activeCalls | Integer | - | Number of currently active stub calls |
| maxCalls | Integer | - | Maximum number of stubs allowed within the domain |
| timestamp | String | - | Data creation time |
Response example
{
"activeCalls": 3,
"maxCalls": 15,
"timestamp": "2025-04-25T12:09:27.382+09:00"
}
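A client could use this response to decide whether another stub call can be opened. The following is a minimal sketch, assuming the remaining capacity is simply maxCalls minus activeCalls and that the timestamp is an ISO 8601 string as in the sample. The function name `summarize_active_calls` is hypothetical.

```python
import json
from datetime import datetime


def summarize_active_calls(body: str) -> dict:
    """Parse the response body and compute how many more stub calls fit."""
    data = json.loads(body)
    return {
        # Assumption: free capacity is the difference between the two counters.
        "remaining": data["maxCalls"] - data["activeCalls"],
        "created_at": datetime.fromisoformat(data["timestamp"]),
    }


summary = summarize_active_calls(
    '{"activeCalls": 3, "maxCalls": 15, "timestamp": "2025-04-25T12:09:27.382+09:00"}'
)
```

The `created_at` value keeps the UTC offset from the response, so it can be compared with other timezone-aware datetimes.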