CLOVA Speech real-time streaming API

Available in Classic and VPC

Learn how to recognize and transcribe speech to text in real-time with the CLOVA Speech live streaming API.

Version

Version	Date	Changes
v1.0.0	12.2023	Initial creation
v1.1.0	07.2024	Guide reorganization

API URL

Host	Port
clovaspeech-gw.ncloud.com	`50051`

How to use CLOVA Speech gRPC

The CLOVA Speech live streaming API is only accessible via gRPC.
All feature configuration requests and responses provided by the system are in JSON format.
Currently, we only support PCM (headerless raw wave) format at 16 kHz, 1 channel, 16 bits per sample.
The following is a guide to installing the protoc compiler on Rocky Linux and the overall initial setup for using the API.

1. Install and prepare protoc compiler

Install protoc compiler.
Then run the compiler for your target language against the "nest.proto" file, which defines the interface of the CLOVA Speech live streaming API.

Connect remotely to the server where you want to install the protoc compiler.
Install the packages and plugins for using gRPC.

Rocky Linux: Python

  # Check the latest status
  sudo dnf update
  
  # Install Python: Install Python on the Linux server.
  sudo dnf install python3

  # Install and upgrade pip: pip is a package installer for Python.
  sudo dnf install python3-pip
  pip3 install --upgrade pip
  
  # Install grpcio-tools: Install "grpcio-tools" using pip.
  pip3 install grpcio-tools

  # Create nest.proto file and save the interface definition shown below in it
  touch nest.proto

  # Compile nest.proto file with protoc compiler (run this after saving the definition)
  python3 -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. nest.proto

Rocky Linux: Java

# Download protoc-gen-grpc-java plugin (check https://github.com/grpc/grpc-java/releases for version number)
curl -OL https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.36.0/protoc-gen-grpc-java-1.36.0-linux-x86_64.exe

# Add to path
mv protoc-gen-grpc-java-1.36.0-linux-x86_64.exe /usr/local/bin/protoc-gen-grpc-java

# Change to execute permission
chmod +x /usr/local/bin/protoc-gen-grpc-java

# Confirm installation
protoc-gen-grpc-java --version

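# Create nest.proto file and save the interface definition shown below in it
touch nest.proto

# Compile nest.proto file with protoc compiler (run this after saving the definition; the output directory must already exist)
protoc --proto_path=. --java_out=output/directory --grpc-java_out=output/directory nest.proto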

Open the nest.proto file, enter the following code, and save it. The file must contain this definition before you run the compile command.

syntax = "proto3";
option java_multiple_files = true;
package com.nbp.cdncp.nest.grpc.proto.v1;

enum RequestType {
  CONFIG = 0;
  DATA = 1;
}

message NestConfig {
  string config = 1;
}

message NestData {
  bytes chunk = 1;
  string extra_contents = 2;
}
message NestRequest {
  RequestType type = 1;
  oneof part {
    NestConfig config = 2;
    NestData data = 3;
  }
}

message NestResponse {
  string contents = 1;
}
service NestService {
  rpc recognize(stream NestRequest) returns (stream NestResponse){};
}

2. Authorization

Header name	Description
`Authorization`	`Bearer ${secretKey}`

Once you're done generating gRPC code with the protoc compiler, it's time to proceed with authentication.
Set up a gRPC channel and create a stub, which is a client-side proxy defined in nest_pb2_grpc.
After creating the stub, you need to include metadata with the authentication key to execute the desired function through the recognize method.
- The secretKey of the live streaming API uses the secretKey of the long sentence recognition API, which can be found in Long sentence recognition domain > Run builder > Settings in the NAVER Cloud Platform console.
- The live streaming API is only supported by the Basic long sentence recognition plan (not available on the Free plan).
Rocky Linux: Python

Create a Python file. (Name it "main" for the demo.)

touch main.py

Add the following in the created Python file.

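import grpc
import json

import nest_pb2
import nest_pb2_grpc

SECRET_KEY = "Long sentence recognition secretKey"  # Found in the long sentence recognition domain settings

def generate_requests():
    # The first request on the stream must carry the config JSON
    yield nest_pb2.NestRequest(
        type=nest_pb2.RequestType.CONFIG,
        config=nest_pb2.NestConfig(config=json.dumps({"transcription": {"language": "ko"}}))
    )

channel = grpc.secure_channel(
    "clovaspeech-gw.ncloud.com:50051",
    grpc.ssl_channel_credentials()
)
client = nest_pb2_grpc.NestServiceStub(channel)
metadata = (("authorization", f"Bearer {SECRET_KEY}"),)  # The metadata key must be lowercase
# recognize is a bidirectional streaming method, so it takes an iterator of NestRequest messages
responses = client.recognize(generate_requests(), metadata=metadata)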
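
To keep the secret key out of source code, the metadata could be built by a small helper that reads the key from the environment. The following is a minimal sketch: the helper name `build_auth_metadata` and the environment variable `CLOVA_SPEECH_SECRET_KEY` are assumptions made for illustration and are not part of the API.

```python
import os


def build_auth_metadata(secret_key=None):
    """Return gRPC call metadata carrying the Bearer token.

    CLOVA_SPEECH_SECRET_KEY is a hypothetical environment variable name.
    """
    key = secret_key or os.environ.get("CLOVA_SPEECH_SECRET_KEY")
    if not key:
        raise ValueError("secret key is not set")
    # The metadata key is written in lowercase, as the guide requires.
    return (("authorization", f"Bearer {key}"),)
```

The returned tuple can be passed as the `metadata` argument when calling the recognize method of the stub.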

Rocky Linux: Java

Create a Java file. (Name it "main" for the demo.)

touch main.java

Add the following in the created Java file.

ManagedChannel channel = NettyChannelBuilder
			.forTarget("clovaspeech-gw.ncloud.com:50051")
			.useTransportSecurity()
			.build();
NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
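// Replace with the secretKey of your long sentence recognition domain
String secretKey = "Long sentence recognition secretKey";
Metadata metadata = new Metadata();
metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
             "Bearer " + secretKey);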
client = MetadataUtils.attachHeaders(client, metadata);

3. Request config JSON

The first request you send to the streaming API must contain the config JSON shown below.
Use the NestRequest object of nest_pb2 generated by protoc to send the config JSON to the streaming endpoint.
The config JSON has four fields, none of which is required. However, for clear speech recognition, we recommend setting transcription to the language you want recognized.
- transcription: Set speech recognition language
- keywordBoosting: Set to boost the recognition rate for entered words
- forbidden: Set banned words
- semanticEpd: Set criteria for generating speech recognition results
The example code is shown below.

{
  # Transcription configuration information
  "transcription": {      # optional, top level key
    "language": string
  },
  # Keyword boosting configuration information
  "keywordBoosting": {    # optional, top level key
    "boostings": [
      {
        "words": string,
        "weight": float64
      }
    ]
  },
  # Forbidden configuration information
  "forbidden": {          # optional, top level key
    "forbiddens": string
  },
  # semanticEpd configuration information
  "semanticEpd": {        # optional, top level key
    "skipEmptyText": bool,
    "useWordEpd": bool,
    "usePeriodEpd": bool,
    "gapThreshold": int,
    "durationThreshold": int,
    "syllableThreshold": int
  }
}
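
Because every top-level key is optional, the config string can be assembled from only the sections you need. The following is a minimal sketch of that idea. The function name `build_config` and its parameter names are assumptions for illustration, while the JSON keys come from the format above.

```python
import json


def build_config(language=None, boostings=None, forbiddens=None, semantic_epd=None):
    """Serialize only the config sections that were actually provided."""
    config = {}
    if language is not None:
        config["transcription"] = {"language": language}
    if boostings:
        config["keywordBoosting"] = {"boostings": boostings}
    if forbiddens:
        config["forbidden"] = {"forbiddens": forbiddens}
    if semantic_epd:
        config["semanticEpd"] = semantic_epd
    return json.dumps(config, ensure_ascii=False)


config_json = build_config(
    language="ko",
    semantic_epd={"usePeriodEpd": True, "gapThreshold": 500},
)
print(config_json)
```

The resulting string is what would go into the config field of NestConfig in the first request.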

1) Transcription

language (required): The language code of the speech to recognize. For clear speech recognition, we recommend setting it to the language of the audio you send.
- Korean (ko)
- English (en)
- Japanese (ja)

Transcription JSON Format

# Transcription configuration request JSON format
{
  "transcription": {    
    "language": string        # required key
  }
}

Configuration example

# Transcription configuration request example
{
  "transcription": {
    "language": "ko"
  }
}

2) Keyword Boosting

You can increase the recognition rate for pre-registered keywords.
- The weight of a keyword is a real number between 0 and 5.0.
  - If the weight is 0, then no boosting is performed.
  - All keywords in the same words string share one weight.
    - To give several words the same weight, concatenate them with commas (,) inside the words key string.
Spaces before and after words are also taken into account when performing keyword boosting.
To learn how to configure the feature, see Keyword boosting configuration JSON format.

Keyword Boosting JSON Format

# Keyword boosting configuration request JSON format
{
  "keywordBoosting": {
    "boostings": [
      {
        "words": string,
        "weight": float64
      }
    ]
  }
}

Configuration example

# Keyword boosting configuration request example
{
  "keywordBoosting": {
    "boostings": [
      {
        "words": "test,test1,test2",
        "weight": 1 
      },
      {
        "words": "Test, test 1, test 2",
        "weight": 0.5
      }
    ]
  }
}
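
Since surrounding spaces count during boosting and the weight has a fixed range, it may help to build each boostings entry through a small validating helper. This is a sketch only: `make_boosting` is a hypothetical name, and stripping whitespace is a design choice made here, not a requirement of the API.

```python
def make_boosting(words, weight):
    """Build one entry for keywordBoosting.boostings from a list of words."""
    if not 0 <= weight <= 5.0:
        raise ValueError("weight must be between 0 and 5.0")
    # Spaces around words are significant to the service, so strip them here
    # unless you intend them to be part of the keyword.
    cleaned = [word.strip() for word in words if word.strip()]
    if any("," in word for word in cleaned):
        raise ValueError("a keyword cannot contain a comma")
    return {"words": ",".join(cleaned), "weight": weight}


boostings = [
    make_boosting(["test", " test1", "test2 "], 1),
    make_boosting(["Test"], 0.5),
]
print(boostings)
```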

3) Forbidden

This feature adds banned word tags around pre-registered banned words in the recognition results.
- Banned word tag: <forbidden>banned word</forbidden>
The banned word tags are added only to the value of the text key in the recognition result.
The added banned word tags do not affect the position, periodPositions, or alignInfos values of the recognition result.
When registering two or more banned words, you can send them by concatenating them with commas (,) inside the forbiddens key string.
Spaces before and after banned words are also taken into account when processing banned words.
To learn how to configure the feature, see Banned word configuration JSON format.

Forbidden JSON Format

# Forbidden configuration request JSON format
{
  "forbidden": {
    "forbiddens":  string
  }
}

Configuration example

{
  "forbidden": {
    "forbiddens":  "Banned word 1, banned word 2" 
  }
}
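
On the client side, the banned word tags in text can be handled with a regular expression. The following sketch assumes the tags appear exactly as `<forbidden>...</forbidden>` and are not nested. The function names are hypothetical.

```python
import re

FORBIDDEN_TAG = re.compile(r"<forbidden>(.*?)</forbidden>")


def extract_forbidden(text):
    """Return the text without tags and the list of tagged banned words."""
    words = FORBIDDEN_TAG.findall(text)
    plain = FORBIDDEN_TAG.sub(lambda match: match.group(1), text)
    return plain, words


def mask_forbidden(text, mask="***"):
    """Replace each tagged banned word, including its tags, with a mask."""
    return FORBIDDEN_TAG.sub(lambda _: mask, text)


sample = "a <forbidden>bad</forbidden> day"
print(extract_forbidden(sample))
print(mask_forbidden(sample))
```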

4) SemanticEpd

These options allow you to set criteria for generating speech recognition results.
Recognition results are generated based on the criteria for each option, and you can set the options to suit the type of utterance.
To learn how to configure the feature, see SemanticEpd JSON format.
The criteria for each option are as follows.
- skipEmptyText
  - This is an option to select whether to send results with no recognition results.
  - The default setting is false, and if set to true, it will not be sent if there are no recognized results.
- useWordEpd
  - This is an option to generate recognition results on a word-by-word basis.
  - The default setting is false.
- usePeriodEpd
  - This is an option to generate recognition results based on punctuation.
  - The default setting is false.
- gapThreshold
  - This is an option to generate recognition results when silences equal to or greater than gapThreshold occur.
  - The default setting is false and the unit is milliseconds.
- durationThreshold
  - This is an option to generate recognition results based on the duration of the recognition result relative to the durationThreshold.
  - The default setting is false and the unit is milliseconds.
- syllableThreshold
  - This is an option to generate recognition results based on the number of syllables.
  - Spaces (" ") and periods (".") are also treated as one syllable.

SemanticEpd JSON Format

# semanticEpd configuration request JSON format
{
  "semanticEpd": {
    "skipEmptyText": bool,
    "useWordEpd": bool,
    "usePeriodEpd": bool,
    "gapThreshold": int,
    "durationThreshold": int,
    "syllableThreshold": int
  }
}

Configuration example

{
  "semanticEpd": {
    "skipEmptyText": false,
    "useWordEpd": true,
    "usePeriodEpd": true,
    "gapThreshold": 500,
    "durationThreshold": 5000,
    "syllableThreshold": 20
  }
}

4. Config JSON response format

{
  "uid": string,                  # required
  "responseType": [ "config" ],   # required
  "config": {                     # required
    "status": string,             # required
    # Depending on the config settings, the following fields may not exist.
    # E.g., If banned words are not set, the "forbidden" key does not exist.
    "keywordBoosting": {          # optional, top level key
      "status": string
    },
    "forbidden": {                # optional, top level key
      "status": string
    },
    "semanticEpd": {              # optional, top level key
      "status": string
    }
  }
}

1) Supplementary explanation

The config JSON response is organized as follows.
- The value of the config.status key in the config JSON response can have the following values.
  - Success
    - If the config JSON request was successful and the desired settings were successfully saved to the gRPC service
  - Failure
    - If the features included in the config JSON request were recognized by the server, but the detailed configuration failed
  - top_level_key:
    - transcription
    - keywordBoosting
    - forbidden
- The config.${top_level_key}.status in the config JSON response can have the following values.
  - Common
    - Unknown key: ${top_level_key}-${unknown_key}
      - If the config JSON request has a sub-level key that the server doesn't support
    - Invalid type: ${top_level_key}-${invalid_type_key}
      - If the config JSON request has a sub-level value type that the server doesn't support
  - transcription
    - Invalid language code: ${invalid_language_code}
      - If language is not a predefined language code in the config JSON request
  - keywordBoosting
    - Internal system error
      - If an issue occurred inside the server
- "${message}"
  - The config JSON request was not recognized correctly, or the config JSON request could not be processed correctly
  - The following are what ${message} can be.
    - Invalid request json format
      - If the config JSON request is not in a normal JSON format
    - Unknown key: ${unknown_key}
      - If the config JSON request has a top-level key that the server doesn't support
    - Invalid type: ${invalid_type_key}
      - If the config JSON request has a top-level value type that the server doesn't support
    - Required key is not provided
      - If the config JSON request doesn't include the required key defined by the server
    - No more slot
      - If there are no resources available on the current server
    - ConfigRequest did not complete
      - If the server receives a recognition request while processing of the config JSON request is incomplete
    - Lifespan expired
      - If the gRPC service usage time has expired
      - The usage time for the gRPC service is set to 100 hours.
    - Failed to received request msg
      - If the server didn't receive the request message successfully
    - Model server is not working
      - If an issue occurred inside the server
    - Internal server error
      - If an issue occurred inside the server

2) Config JSON example

# Config JSON request example
# When requesting to set up keyword boosting && banned words features
{
  "keywordBoosting": {                  
    "boostings": [
      {
        "words": "test,test1,test2",
        "weight": 1
      },
      {
        "words": "Test, test 1, test 2",
        "weight": 0.5
      }
    ]
  },
  "forbidden": {
    "forbiddens":  "Banned word 1, banned word 2"
  }
}

# Config JSON response example
# When requesting to set up keyword boosting && banned words feature and it is successful
{
  "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
  "responseType": [ "config" ],
  "config": {
    "status": "Success",
    "keywordBoosting": {
      "status": "Success"
    },
    "forbidden": {
      "status": "Success"
    }
  }
}

# Failure example 1: the request to set up the keyword boosting && banned words features fails
# The request uses "forbidden" as a top-level key on a server that does not support it
{
  "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
  "responseType": [ "config" ],
  "config": {
    "status": "Unknown key: forbidden"
  }
}
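
A client will usually want to confirm that the configuration was accepted before streaming audio. The following is a minimal sketch of such a check, based only on the response format described above. `check_config_response` is a hypothetical helper, and its argument is the contents string of a NestResponse.

```python
import json


def check_config_response(contents):
    """Return per-feature statuses, or raise if the config was not accepted."""
    response = json.loads(contents)
    if "config" not in response.get("responseType", []):
        raise ValueError("not a config response")
    config = response["config"]
    # Per-feature statuses live under optional top-level keys.
    feature_status = {
        key: value["status"]
        for key, value in config.items()
        if isinstance(value, dict) and "status" in value
    }
    if config["status"] != "Success":
        raise RuntimeError(f"config failed: {config['status']} {feature_status}")
    return feature_status
```

With the successful response example above, the helper returns the statuses of keywordBoosting and forbidden. With the failure example, it raises an error that includes "Unknown key: forbidden".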

5. Recognize request

Once you've specified the desired configuration values via the config JSON, it's time to proceed with speech recognition.
Call the speech recognition API via recognize, a method in the stub, using the NestRequest in the code generated by protoc and the authentication metadata set above.
You can also pass the following optional settings in the extra_contents field of NestData when sending a NestRequest. The format is shown below.

Recognize Request JSON Format (ExtraContents)

{
  "epFlag": bool,   # optional
  "seqId": int      # optional
}

The following are the JSON formats used in the recognize request.

epFlag
- This flag marks a pause in, or the end of, a series of recognize requests.
- For a pause or the last request, set epFlag to true so that the engine immediately processes the buffer of recognize requests it has accumulated, ends recognition, and returns the result without delay. (Connection is kept open.)
- If epFlag is not set, it defaults to false. In that case, if no recognition request arrives until unvoiceTime (10 seconds) has passed, the engine processes the accumulated buffer, ends recognition, and returns the result. (Connection is kept open.)
- This is an optional field.
seqId
- This is a unique ID for each recognize request you send after connecting to recognize.
- It can be used when making a request with epFlag set to true to check if the recognize results you receive afterward are for this request.
- If it is sent without seqId set, the seqId in the recognition results will be set to 0.
- When using seqId, it is recommended to set and send a non-zero value.
- This is an optional field.
Remarks
- If you want to use neither epFlag nor seqId, you can set the request JSON to "" (empty string).
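
One way to apply these fields is to send epFlag as false for every chunk except the last one, and to attach a non-zero seqId to the last chunk so that its result can be matched in the response. The following sketch only builds the extra_contents strings. The function name and the choice to omit seqId on intermediate chunks are assumptions for illustration.

```python
import json


def extra_contents_for_chunks(chunk_count, seq_id=1):
    """Yield one extra_contents JSON string per audio chunk.

    Only the final chunk sets epFlag to true, together with a non-zero seqId.
    """
    for index in range(chunk_count):
        is_last = index == chunk_count - 1
        if is_last:
            yield json.dumps({"epFlag": True, "seqId": seq_id})
        else:
            yield json.dumps({"epFlag": False})


for extra in extra_contents_for_chunks(3, seq_id=7):
    print(extra)
```

Each string would be passed as the extra_contents value of the NestData message that carries the corresponding chunk.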

6. Recognize response

When you call the streaming API, you may receive the following response from the server.

Transcription JSON Format

{
  "uid": string,
  "responseType": [ "transcription" ],
  "transcription": {
    "text": string,
    "position": int,
    "periodPositions": [ int ],
    "periodAlignIndices": [ int ],
    "epFlag": bool,
    "seqId": int,
    "epdType": string,         // epd criteria for which results were generated
    "startTimestamp": int,
    "endTimestamp": int,
    "confidence": float64,
    "alignInfos": [
      {
        "word": string,        // syllable
        "start": int,          // StartTimestamp in ms 
        "end": int,            // EndTimestamp in ms
        "confidence": float64  // Recognition confidence
      }
    ]
  }
}

Configuration example

{
  "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
  "responseType": [ "transcription" ],
  "transcription": {
    "text": "This is text.",
    "position": 0,
    "periodPositions": [3],
    "periodAlignIndices": [3],
    "epFlag": false,
    "seqId": 0,
    "epdType": "durationThreshold",
    "startTimestamp": 190,
    "endTimestamp": 840,
    "confidence": 0.997389124199423,
    "alignInfos": [
      {"word":"This","start":190,"end":340,"confidence":0.9988637124943075}, 
      {"word":"is","start":341,"end":447,"confidence":0.9990018488549978},
      {"word":"text","start":448,"end":580,"confidence":0.9912501264550316},
      {"word":".","start":581,"end":700,"confidence":0.9994397226648595},
      {"word":" ","start":701,"end":840,"confidence":0.9984142043105126}
    ]
  }
}
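
The top-level confidence is described in the analysis section as the geometric mean of the alignInfos confidence values. As a sanity check, the following sketch recomputes it from the five values in the example above. The result should come out close to the confidence value of the example. The function name is hypothetical, and the computation assumes every confidence value is greater than 0.

```python
import math


def geometric_mean(values):
    """Geometric mean computed in log space; all values must be positive."""
    return math.exp(sum(math.log(value) for value in values) / len(values))


# alignInfos confidence values copied from the example response
confidences = [
    0.9988637124943075,
    0.9990018488549978,
    0.9912501264550316,
    0.9994397226648595,
    0.9984142043105126,
]
overall = geometric_mean(confidences)
print(round(overall, 6))
```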

Failed Recognize Response JSON Format

{
  "uid": string,                     # required
  "responseType": [ "recognize" ],    # required
  "recognize": {                     # required
    "status": string,                # required
    "epFlag": {                      # optional
      "status": string
    },
    "seqId": {                       # optional
      "status": string
    },
    "audio": {                       # optional
      "status": string
    }
  }
}

This is the JSON format you receive in response if the recognition request fails, or if the recognition request cannot be processed normally.

The recognize.status is a key that indicates why the recognize request failed or could not be processed, and has the following values.
- Invalid Type
  - If the epFlag or seqId value type does not match the predefined type
- Required key is not provided
  - If the value of epFlag, a required key of extraContents, is not passed
- Invalid request json format
  - If extraContents is not in JSON format
- Unknown key
  - If a key that does not exist in the protocol specification is written in extraContents
  - If an invalid key is entered, the invalid key information is appended to the status for user convenience.
```
# invalid extraContents
{
  "test1": "test-val1",
  "test2": "test-val2"
}
# response msg
{
  "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
  "responseType": ["recognize"],
  "recognize": {
    "status": "Unknown key: test1, test2"
  }
}
```
- ConfigRequest is already called
  - If the server receives a config request again while it has finished processing the config request
- Lifespan expired
  - If the gRPC service usage time has expired
  - The usage time for the gRPC service is set to 100 hours.
- Failed to received request msg
  - If the server didn't receive the request message successfully
- Model server is not working
  - If an issue occurred inside the server
- Internal server error
  - If an issue occurred inside the server
- Invalid format
  - If the audio format sent is an invalid format
- Failure
  - If extraContents in the recognize request JSON is an invalid format
  - A detailed failure reason is displayed in either epFlag.status or seqId.status.
The epFlag.status displays the reason why the epFlag input failed, and has the following values.
- Not found
  - If the required key, epFlag, has not been created
- Invalid type
  - If the value type does not match a predefined type
The seqId.status displays the reason why the seqId input failed, and has the following values.
- Invalid type
  - If the value type does not match a predefined type
The audio.status displays the reason why the audio data processing failed, and has the following values.
- Invalid format
  - If the audio format does not match a predefined format
Remarks
- The epFlag, seqId, and audio keys may be omitted depending on the value of recognize.status.
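
Because transcription results and failures arrive on the same stream, a client can branch on responseType. The following is a minimal sketch under the response formats described above. `parse_response` is a hypothetical helper that takes the contents string of a NestResponse.

```python
import json


def parse_response(contents):
    """Return the transcription dict, or None for other response types.

    Raises RuntimeError when the server reports a failed recognize request.
    """
    response = json.loads(contents)
    response_type = response.get("responseType", [])
    if "recognize" in response_type:
        failure = response["recognize"]
        # epFlag, seqId, and audio details are optional and may be absent.
        details = {
            key: value["status"]
            for key, value in failure.items()
            if isinstance(value, dict) and "status" in value
        }
        raise RuntimeError(f"recognize failed: {failure['status']} {details}")
    if "transcription" in response_type:
        return response["transcription"]
    return None
```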

Analysis method

text
- The recognized text.
position
- The offset of text within the full text recognized so far.
- How to construct the full text using text and position

  Received order	Recognition result	Full text
  1	{text: "ABC", position: 0, ...}	ABC
  2	{text: "DEFG", position: 3, ...}	ABCDEFG
periodPositions
- The offsets, within the full text, of the periods (.) contained in text.
- If text contains no punctuation, it will be passed as an empty list.
periodAlignIndices
- The indices, within alignInfos, of the periods (.) contained in text.
- If text contains no punctuation, it will be passed as an empty list.
epFlag
- Indicates whether this result includes the recognition result for audio sent in a request with epFlag set to true.
seqId
- This is a key indicating the seqId of the last request that contains this recognition result.
- If the value of the epFlag key is false, 0 is returned.
- If the value of the epFlag key is true, the seqId of the last recognition request processed is returned.
epdType
- The epd criterion that was used to generate the recognition result.
- Values of epdType based on the epd criteria
  - gap: The recognition result was generated with the silence criterion.
  - endPoint: The recognition result was generated with the last audio chunk included.
  - durationThreshold: The recognition result was generated with the time criterion.
  - period: The recognition result was generated with the punctuation criterion.
  - syllableThreshold: The recognition result was generated with the syllable count criterion.
  - unvoice: The recognition result was generated because unvoiceTime (server setting) elapsed.
startTimestamp, endTimestamp
- Timestamp information of the recognition result.
- The unit is ms (milliseconds).
confidence
- This is a key indicating the confidence in the recognition result (text).
- It is calculated as the geometric mean of all syllable confidence (alignInfos.confidence) values contained in the recognition result.
alignInfos
- The alignment information for each syllable that makes up text.
- If text contains no recognized syllables, the alignment information for "" (empty text) is passed in addition.
- Align information description
  - word
    - The syllable.
  - start
    - The start timestamp of the syllable.
    - The unit is ms.
  - end
    - The end timestamp of the syllable.
    - The unit is ms.
  - confidence
    - The confidence with which the syllable was recognized.
    - It has a value between 0 and 1.
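
The way text and position combine into the full text can be sketched as follows. This assumes that position is a character offset into the full text, that results are processed in the order received, and that text contains no banned word tags (which do not count toward position). The function name is hypothetical.

```python
def assemble_full_text(results):
    """Rebuild the full text from a sequence of transcription results."""
    full_text = ""
    for result in results:
        position = result["position"]
        # Keep everything before this offset and append the new text.
        full_text = full_text[:position] + result["text"]
    return full_text


results = [
    {"text": "ABC", "position": 0},
    {"text": "DEFG", "position": 3},
]
print(assemble_full_text(results))  # ABCDEFG
```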

Demo [Python]

import grpc
import json

import nest_pb2
import nest_pb2_grpc

AUDIO_PATH = "path/to/audio/file"          #Enter the path to the audio file to be recognized. (PCM (headerless raw wave) format at 16 kHz, 1 channel, 16 bits per sample)
CLIENT_SECRET = "Long sentence recognition secretKey"

def generate_requests(audio_path):
    # Initial setup request: set up speech recognition
    yield nest_pb2.NestRequest(
        type=nest_pb2.RequestType.CONFIG,
        config=nest_pb2.NestConfig(
            config=json.dumps({"transcription": {"language": "ko"}})
        )
    )

    # Open an audio file and read 32,000 bytes at a time
    with open(audio_path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(32000)  # Read chunks of an audio file
            if not chunk:
                break  # Exit the loop when there is no more data
            yield nest_pb2.NestRequest(
                type=nest_pb2.RequestType.DATA,
                data=nest_pb2.NestData(
                    chunk=chunk,
                    extra_contents=json.dumps({"seqId": 0, "epFlag": False})
                )
            )

def main():
    # Set up a secure gRPC channel to the CLOVA Speech server
    channel = grpc.secure_channel(
        "clovaspeech-gw.ncloud.com:50051", 
        grpc.ssl_channel_credentials()
    )
    stub = nest_pb2_grpc.NestServiceStub(channel)  # Create a stub for NestService
    metadata = (("authorization", f"Bearer {CLIENT_SECRET}"),)  # Set up metadata with authentication tokens
    responses = stub.recognize(generate_requests(AUDIO_PATH), metadata=metadata)  # Call the recognize method with the generated request

    try:
        # Process responses from the server repeatedly
        for response in responses:
            print("Received response: " + response.contents)
    except grpc.RpcError as e:
        # Handle gRPC errors
        print(f"Error: {e.details()}")
    finally:
        channel.close()  # Close the channel when finished

if __name__ == "__main__":
    main()
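
The demo above expects headerless PCM input. If your recording is a WAV file, one possible way to check its format and strip the header is the standard-library wave module, as in the following sketch. The function name and file paths are placeholders, and the sketch rejects files that are not 16 kHz, 1 channel, 16 bits per sample rather than converting them.

```python
import wave


def wav_to_raw_pcm(wav_path, pcm_path):
    """Write the sample data of a WAV file to a headerless PCM file.

    Returns the number of bytes written.
    """
    with wave.open(wav_path, "rb") as wav:
        actual = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        # 16 kHz, 1 channel, 2 bytes (16 bits) per sample
        if actual != (16000, 1, 2):
            raise ValueError(f"unsupported audio format: {actual}")
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    return len(frames)
```

The resulting file could then be used as AUDIO_PATH in the demo.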

Demo [Java]

Project Structure

├───pom.xml
└───src
    └───main
        ├───java
        │   └───com
        │       └───example
        │           └───grpc
        │                   GRpcClient.java
        └───proto
                nest.proto

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>clova-speech-grpc</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <java.version>1.8</java.version>
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <netty.version>4.1.52.Final</netty.version>
        <grpc.version>1.35.0</grpc.version>
        <protoc.version>3.14.0</protoc.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-netty</artifactId>
            <version>${grpc.version}</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-netty-shaded</artifactId>
            <version>${grpc.version}</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-protobuf</artifactId>
            <version>${grpc.version}</version>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-stub</artifactId>
            <version>${grpc.version}</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
            <version>1.18.12</version>
        </dependency>
    </dependencies>

    <build>
        <extensions>
            <extension>
                <groupId>kr.motd.maven</groupId>
                <artifactId>os-maven-plugin</artifactId>
                <version>1.6.1</version>
            </extension>
        </extensions>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <executions>
                    <execution>
                        <id>compile</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>testCompile</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <showDeprecation>true</showDeprecation>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.xolstice.maven.plugins</groupId>
                <artifactId>protobuf-maven-plugin</artifactId>
                <version>0.6.1</version>
                <configuration>
                    <protocArtifact>
                        com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}
                    </protocArtifact>
                    <pluginId>grpc-java</pluginId>
                    <pluginArtifact>
                        io.grpc:protoc-gen-grpc-java:${grpc.version}:exe:${os.detected.classifier}
                    </pluginArtifact>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>compile-custom</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

GRpcClient.java

package com.example.grpc;

import java.io.FileInputStream;
import java.util.concurrent.CountDownLatch;

import com.google.protobuf.ByteString;
import com.nbp.cdncp.nest.grpc.proto.v1.NestConfig;
import com.nbp.cdncp.nest.grpc.proto.v1.NestData;
import com.nbp.cdncp.nest.grpc.proto.v1.NestRequest;
import com.nbp.cdncp.nest.grpc.proto.v1.NestResponse;
import com.nbp.cdncp.nest.grpc.proto.v1.NestServiceGrpc;
import com.nbp.cdncp.nest.grpc.proto.v1.RequestType;
import io.grpc.ManagedChannel;
import io.grpc.Metadata;
import io.grpc.StatusRuntimeException;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.MetadataUtils;
import io.grpc.stub.StreamObserver;

public class GRpcClient {
	public static void main(String[] args) throws Exception {

		CountDownLatch latch = new CountDownLatch(1);
		ManagedChannel channel = NettyChannelBuilder
			.forTarget("clovaspeech-gw.ncloud.com:50051")
			.useTransportSecurity()
			.build();
		NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
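		// Replace with the secretKey of your long sentence recognition domain
		String secretKey = "Long sentence recognition secretKey";
		Metadata metadata = new Metadata();
		metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
			"Bearer " + secretKey);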
		client = MetadataUtils.attachHeaders(client, metadata);

		StreamObserver<NestResponse> responseObserver = new StreamObserver<NestResponse>() {
			@Override
			public void onNext(NestResponse response) {
				System.out.println("Received response: " + response.getContents());
			}

			@Override
			public void onError(Throwable t) {
				if(t instanceof StatusRuntimeException) {
					StatusRuntimeException error = (StatusRuntimeException)t;
					System.out.println(error.getStatus().getDescription());
				}
				latch.countDown();
			}

			@Override
			public void onCompleted() {
				System.out.println("completed");
				latch.countDown();
			}
		};

		StreamObserver<NestRequest> requestObserver = client.recognize(responseObserver);

		requestObserver.onNext(NestRequest.newBuilder()
			.setType(RequestType.CONFIG)
			.setConfig(NestConfig.newBuilder()
				.setConfig("{\"transcription\":{\"language\":\"ko\"}}")
				.build())
			.build());

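		// Path to a PCM (headerless raw wave) file: 16 kHz, 1 channel, 16 bits per sample.
		// Java does not expand "~", so use an absolute or relative path.
		java.io.File file = new java.io.File("path/to/audio/file");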
		byte[] buffer = new byte[32000];
		int bytesRead;
		FileInputStream inputStream = new FileInputStream(file);
		while ((bytesRead = inputStream.read(buffer)) != -1) {
			requestObserver.onNext(NestRequest.newBuilder()
				.setType(RequestType.DATA)
				.setData(NestData.newBuilder()
					.setChunk(ByteString.copyFrom(buffer, 0, bytesRead))
					.setExtraContents("{ \"seqId\": 0, \"epFlag\": false}")
					.build())
				.build());
		}
		requestObserver.onCompleted();
		latch.await();
		channel.shutdown();
	}

}

FAQs

How can I utilize the epFlag, seqId items in the extraContents field of the Recognize API?
- You can use them for pausing purposes, or to check if you have received a complete response to a request you sent.
Does the gRPC service support pausing?
- The gRPC service does not support pausing, but it can be implemented in the Recognize API by setting the epFlag entry in the extraContents field to true, sending a request, and then not making a Recognize request for a period of time. See the description of the epFlag entry.
- If you request Recognize without setting epFlag to true and do not re-request for a certain period of time, the server processes the buffered Recognize request based on the unvoiceTime (10 seconds) set internally and displays the response result.
What sound data formats are supported by the Recognize API?
- Currently, we only support PCM (headerless raw wave) format at 16 kHz, 1 channel, 16 bits per sample.
Is it mandatory to set the epFlag item to true in the Recognize API's extraContents before calling the Close API?
- It is not mandatory to set the epFlag item to true. However, it is recommended to set the epFlag item to true if you want to receive a fast response result for the last Recognize request. See the description of the epFlag entry.
How do I know if I have received all the responses to the requests I sent?
- When calling the Recognize API, you can leverage the epFlag and seqId items in the extraContents field. The result of a Recognize request with epFlag set to true and seqId set to any non-zero value can be verified by comparing epFlag and seqId in the Recognize response. See the response JSON format of Recognize response.
Is there a connection lifetime limit for the gRPC service?
- The gRPC service has a connection lifetime limit of 100 hours. However, the connection may drop earlier due to network problems, so we recommend implementing retry logic for stable service use.
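
The retry logic mentioned above could take many forms. The following is one minimal sketch using exponential backoff. The function name, the default attempt count, and the delays are assumptions for illustration, not values recommended by the service.

```python
import time


def call_with_retry(operation, retryable=(Exception,), attempts=3,
                    base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on retryable errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)
```

With gRPC, `retryable` would typically be `(grpc.RpcError,)`, and `operation` would start a new recognize call, beginning again with the config request, since the config JSON must be the first request on each stream.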
