CLOVA Speech real-time streaming API
Available in Classic and VPC
Learn how to recognize and transcribe speech to text in real time with the CLOVA Speech live streaming API.
Version
Version | Date | Changes |
---|---|---|
v1.0.0 | 12.2023 | Initial creation |
v1.1.0 | 07.2024 | Guide reorganization |
API URL
Host | Port |
---|---|
clovaspeech-gw.ncloud.com | 50051 |
How to use CLOVA Speech gRPC
- The CLOVA Speech live streaming API is only accessible via gRPC.
- All feature configuration requests and responses provided by the system are in JSON format.
- Currently, we only support PCM (headerless raw wave) format at 16 kHz, 1 channel, 16 bits per sample.
- The following is a guide to installing the protoc compiler on Rocky Linux and the overall initial setup for using the API.
1. Install and prepare protoc compiler
- Install protoc compiler.
- Run the protoc compiler for your target language against the "nest.proto" file, which defines the interface of the CLOVA Speech live streaming API.
- Connect remotely to the server where you want to install the protoc compiler.
- Install the packages and plugins for using gRPC.
- Rocky Linux: Python
# Check the latest status
sudo dnf update
# Install Python: Install Python on the Linux server.
sudo dnf install python3
# Install and upgrade pip: pip is a package installer for Python.
sudo dnf install python3-pip
pip3 install --upgrade pip
# Install grpcio-tools: Install "grpcio-tools" using pip.
pip3 install grpcio-tools
# Create nest.proto file
touch nest.proto
# Compile nest.proto file with protoc compiler
python3 -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. nest.proto
- Rocky Linux: Java
# Download protoc-gen-grpc-java plugin (check https://github.com/grpc/grpc-java/releases for version number)
curl -OL https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.36.0/protoc-gen-grpc-java-1.36.0-linux-x86_64.exe
# Add to path
mv protoc-gen-grpc-java-1.36.0-linux-x86_64.exe /usr/local/bin/protoc-gen-grpc-java
# Change to execute permission
chmod +x /usr/local/bin/protoc-gen-grpc-java
# Confirm installation
protoc-gen-grpc-java --version
# Create nest.proto file
touch nest.proto
# Compile nest.proto file with protoc compiler
protoc --proto_path=. --java_out=output/directory --grpc-java_out=output/directory nest.proto
- Open the nest.proto file, enter the following code, and save it.
syntax = "proto3";

option java_multiple_files = true;
package com.nbp.cdncp.nest.grpc.proto.v1;

enum RequestType {
    CONFIG = 0;
    DATA = 1;
}

message NestConfig {
    string config = 1;
}

message NestData {
    bytes chunk = 1;
    string extra_contents = 2;
}

message NestRequest {
    RequestType type = 1;
    oneof part {
        NestConfig config = 2;
        NestData data = 3;
    }
}

message NestResponse {
    string contents = 1;
}

service NestService {
    rpc recognize(stream NestRequest) returns (stream NestResponse) {}
}
2. Authorization
Header name | Description |
---|---|
Authorization | Bearer ${secretKey} |
Once you have generated the gRPC code with the protoc compiler, proceed with authentication.
Set up a gRPC channel and create a stub, the client-side proxy defined in the generated nest_pb2_grpc module.
After creating the stub, include metadata containing the authentication key when executing the desired function through the recognize method.
- The secretKey of the live streaming API uses the secretKey of the long sentence recognition API, which can be found in Long sentence recognition domain > Run builder > Settings in the NAVER Cloud Platform console.
- The live streaming API is only supported by the Basic long sentence recognition plan (not available on the Free plan).
- Rocky Linux: Python
- Create a Python file. (Name it "main" for the demo.)
touch main.py
- Add the following in the created Python file.
import grpc
import nest_pb2
import nest_pb2_grpc

secret_key = "${secretKey}"  # secretKey from the long sentence recognition domain

channel = grpc.secure_channel(
    "clovaspeech-gw.ncloud.com:50051",
    grpc.ssl_channel_credentials()
)
client = nest_pb2_grpc.NestServiceStub(channel)
metadata = (("authorization", f"Bearer {secret_key}"),)  # The metadata key must be lowercase "authorization"
call = client.recognize(request_iterator, metadata=metadata)  # request_iterator yields NestRequest messages
- Rocky Linux: Java
- Create a Java file. (Name it "main" for the demo.)
touch main.java
- Add the following in the created Java file.
ManagedChannel channel = NettyChannelBuilder
.forTarget("clovaspeech-gw.ncloud.com:50051")
.useTransportSecurity()
.build();
NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
Metadata metadata = new Metadata();
metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
"Bearer ${secretKey}");
client = MetadataUtils.attachHeaders(client, metadata);
3. Request config JSON
- For the first call to the streaming API, you must first send the config JSON as shown below.
- Use the NestRequest object of nest_pb2 created in protoc to send the config JSON to the streaming endpoint.
- There are four top-level fields in the config JSON, none of which is required; however, for accurate speech recognition, we recommend setting transcription to the language you want recognized.
- transcription: Set speech recognition language
- keywordBoosting: Set to boost the recognition rate for entered words
- forbidden: Set banned words
- semanticEpd: Set criteria for generating speech recognition results
- The example code is shown below.
{
    # Transcription configuration information
    "transcription": {    # optional, top level key
        "language": string
    },
    # Keyword boosting configuration information
    "keywordBoosting": {  # optional, top level key
        "boostings": [
            {
                "words": string,
                "weight": float64
            }
        ]
    },
    # Forbidden configuration information
    "forbidden": {        # optional, top level key
        "forbiddens": string
    },
    # semanticEpd configuration information
    "semanticEpd": {      # optional, top level key
        "skipEmptyText": bool,
        "useWordEpd": bool,
        "usePeriodEpd": bool,
        "gapThreshold": int,
        "durationThreshold": int,
        "syllableThreshold": int
    }
}
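The config JSON above is serialized to a string and sent as the first message on the stream, wrapped in a NestRequest of type CONFIG. A minimal sketch of assembling it (build_config_json is a hypothetical helper name; the commented part assumes the nest_pb2 module generated from nest.proto earlier):

```python
import json

def build_config_json(language="ko", boostings=None, forbiddens=None):
    """Assemble the config JSON sent in the first (CONFIG) request.

    Only set top-level keys for features actually requested; all four
    top-level keys are optional.
    """
    config = {"transcription": {"language": language}}
    if boostings:
        config["keywordBoosting"] = {"boostings": boostings}
    if forbiddens:
        config["forbidden"] = {"forbiddens": forbiddens}
    return json.dumps(config)

# The resulting string goes into NestConfig (assuming the generated nest_pb2):
# request = nest_pb2.NestRequest(
#     type=nest_pb2.RequestType.CONFIG,
#     config=nest_pb2.NestConfig(config=build_config_json()),
# )
```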
1) Transcription
language
- (Required) The language code of the speech recognition target. For accurate speech recognition, we recommend setting the language you want to request.
  - Korean (ko)
  - English (en)
  - Japanese (ja)
Transcription JSON Format
# Transcription configuration request JSON format
{
    "transcription": {
        "language": string # required key
    }
}
- Configuration example
# Transcription configuration request example
{
    "transcription": {
        "language": "ko"
    }
}
2) Keyword Boosting
- You can increase the recognition rate for pre-registered keywords.
- The weight of a keyword is a real number between 0 and 5.0.
  - If the weight is 0, no boosting is performed.
  - All keywords in one entry must have the same weight.
  - A collection of words to be given the same weight can be sent by concatenating them with commas (,) inside the words key string.
- Spaces before and after words are also taken into account when performing keyword boosting.
- To learn how to configure the feature, see Keyword boosting configuration JSON format.
Keyword Boosting JSON Format
# Keyword boosting configuration request JSON format
{
    "keywordBoosting": {
        "boostings": [
            {
                "words": string,
                "weight": float64
            }
        ]
    }
}
- Configuration example
# Keyword boosting configuration request example
{
    "keywordBoosting": {
        "boostings": [
            {
                "words": "test,test1,test2",
                "weight": 1
            },
            {
                "words": "Test, test 1, test 2",
                "weight": 0.5
            }
        ]
    }
}
3) Forbidden
- This feature adds banned word tags to recognition results for pre-registered keywords.
  - Banned word tag: <forbidden>banned word</forbidden>
- The banned word tags are added only to the value of the text key in the recognition result.
- The added banned word tags do not affect the position, periodPositions, or alignInfos of the recognition result.
- When registering two or more banned words, send them by concatenating them with commas (,) inside the forbiddens key string.
- Spaces before and after banned words are also taken into account when processing banned words.
- To learn how to configure the feature, see Banned word configuration JSON format.
Forbidden JSON Format
# Forbidden configuration request JSON format
{
    "forbidden": {
        "forbiddens": string
    }
}
- Configuration example
{
    "forbidden": {
        "forbiddens": "Banned word 1, banned word 2"
    }
}
4) SemanticEpd
- These options set the criteria for generating speech recognition results.
- Recognition results are generated based on the criteria for each option; set the options to suit the type of utterance.
- To learn how to configure the feature, see SemanticEpd JSON format.
- The criteria for each option are as follows.
skipEmptyText
- An option to skip sending results when nothing was recognized.
- The default is false; if set to true, empty recognition results are not sent.
useWordEpd
- An option to generate recognition results on a word-by-word basis.
- The default is false.
usePeriodEpd
- An option to generate recognition results based on punctuation.
- The default is false.
gapThreshold
- An option to generate a recognition result when a silence of gapThreshold or longer occurs.
- Disabled by default; the unit is milliseconds.
durationThreshold
- An option to generate a recognition result when its duration reaches durationThreshold.
- Disabled by default; the unit is milliseconds.
syllableThreshold
- An option to generate recognition results based on the number of syllables.
- Spaces (" ") and periods (".") are each counted as one syllable.
SemanticEpd JSON Format
# semanticEpd configuration request JSON format
{
    "semanticEpd": {
        "skipEmptyText": bool,
        "useWordEpd": bool,
        "usePeriodEpd": bool,
        "gapThreshold": int,
        "durationThreshold": int,
        "syllableThreshold": int
    }
}
- Configuration example
{
    "semanticEpd": {
        "skipEmptyText": false,
        "useWordEpd": true,
        "usePeriodEpd": true,
        "gapThreshold": 500,
        "durationThreshold": 5000,
        "syllableThreshold": 20
    }
}
4. Config JSON response format
{
    "uid": string,                # required
    "responseType": [ "config" ], # required
    "config": {                   # required
        "status": string,         # required
        # Depending on the config settings, the following fields may not exist.
        # E.g., if banned words are not set, the "forbidden" key does not exist.
        "keywordBoosting": {      # optional, top level key
            "status": string
        },
        "forbidden": {            # optional, top level key
            "status": string
        },
        "semanticEpd": {          # optional, top level key
            "status": string
        }
    }
}
1) Supplementary explanation
- The config JSON response is organized as follows.
- The value of the config.status key in the config JSON response can have the following values.
  - Success
    - If the config JSON request was successful and the desired settings were successfully saved to the gRPC service
  - Failure
    - If the features included in the config JSON request were recognized by the server, but the detailed configuration failed
  - "${message}"
    - If the config JSON request was not recognized correctly, or the config JSON request could not be processed correctly
    - The following are the possible values of ${message}.
      - Invalid request json format: if the config JSON request is not in a normal JSON format
      - Unknown key: ${unknown_key}: if the config JSON request has a top-level key that the server doesn't support
      - Invalid type: ${invalid_type_key}: if the config JSON request has a top-level value type that the server doesn't support
      - Required key is not provided: if the config JSON request doesn't include the required key defined by the server
      - No more slot: if there are no resources available on the current server
      - ConfigRequest did not complete: if the server receives a recognition request while processing of the config JSON request is incomplete
      - Lifespan expired: if the gRPC service usage time has expired (the usage time for the gRPC service is set to 100 hours)
      - Failed to received request msg: if the server didn't receive the request message successfully
      - Model server is not working: if an issue occurred inside the server
      - Internal server error: if an issue occurred inside the server
- top_level_key: transcription, keywordBoosting, forbidden
- The config.${top_level_key}.status in the config JSON response can have the following values.
  - Common
    - Unknown key: ${top_level_key}-${unknown_key}: if the config JSON request has a sub-level key that the server doesn't support
    - Invalid type: ${top_level_key}-${invalid_type_key}: if the config JSON request has a sub-level value type that the server doesn't support
  - transcription
    - Invalid language code: ${invalid_language_code}: if language is not a predefined language code in the config JSON request
  - keywordBoosting
    - Internal system error: if an issue occurred inside the server
2) Config JSON example
# Config JSON request example
# Requesting the keyword boosting and banned words features
{
    "keywordBoosting": {
        "boostings": [
            {
                "words": "test,test1,test2",
                "weight": 1
            },
            {
                "words": "Test, test 1, test 2",
                "weight": 0.5
            }
        ]
    },
    "forbidden": {
        "forbiddens": "Banned word 1, banned word 2"
    }
}
# Config JSON response example
# Requesting the keyword boosting and banned words features: success
{
    "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
    "responseType": [ "config" ],
    "config": {
        "status": "Success",
        "keywordBoosting": {
            "status": "Success"
        },
        "forbidden": {
            "status": "Success"
        }
    }
}
# Requesting the keyword boosting and banned words features: failure
# When the banned word request JSON is sent with a top-level key the server does not support
{
    "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
    "responseType": [ "config" ],
    "config": {
        "status": "Unknown key: forbidden"
    }
}
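A client can check this first response before streaming audio. A minimal sketch (config_succeeded is a hypothetical helper name) that treats anything other than Success, at the top level or per feature, as a failure:

```python
import json

def config_succeeded(response_contents: str) -> bool:
    """Return True only if the config response reports full success.

    response_contents is the contents string of the first NestResponse.
    """
    msg = json.loads(response_contents)
    if "config" not in msg.get("responseType", []):
        return False  # not a config response at all
    config = msg["config"]
    # Per-feature status objects (keywordBoosting, forbidden, ...) are optional
    for value in config.values():
        if isinstance(value, dict) and value.get("status") != "Success":
            return False  # e.g. "Unknown key: ..." for an unsupported sub-key
    return config.get("status") == "Success"
```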
5. Recognize request
- Once you've specified the desired configuration values via the config JSON, it's time to proceed with speech recognition.
- Call the speech recognition API via recognize, a method in the stub, using the NestRequest in the code generated by protoc and the authentication metadata set above.
- When sending a NestRequest, you can also include the following optional settings in extraContents.
Recognize Request JSON Format (ExtraContents)
{
    "epFlag": bool, # optional
    "seqId": int    # optional
}
The following are the JSON fields used in the recognize request.
epFlag
- A flag used in a pause or in the last recognize request.
- For a pause or the last request, set epFlag to true so the engine immediately returns the recognize request buffer it has accumulated, ending the recognition and returning the result without delay. (The connection is kept open.)
- If sent without this field, it defaults to false; if no recognition requests arrive until unvoiceTime (10 seconds) has passed, the engine immediately returns the recognize request buffer it has accumulated, ending the recognition and returning the result without delay. (The connection is kept open.)
- This is an optional field.
seqId
- A unique ID for each recognize request you send after connecting.
- It can be used together with epFlag set to true to check whether the recognize results you receive afterward correspond to this request.
- If sent without seqId set, the seqId in the recognition results will be 0.
- When using seqId, it is recommended to set and send a non-zero value.
- This is an optional field.
Remarks
- If you want to use neither epFlag nor seqId, you can set the request JSON to "" (empty string).
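The epFlag/seqId fields described above go into the extra_contents string of a DATA request. A minimal sketch (build_extra_contents is a hypothetical helper name; the commented NestRequest part assumes the nest_pb2 module generated from nest.proto):

```python
import json

def build_extra_contents(seq_id=0, ep_flag=False):
    """Build the extraContents JSON string for a DATA request.

    Returns "" (empty string) when neither field is used, as the
    guide permits; a non-zero seq_id is recommended when using seqId.
    """
    if seq_id == 0 and not ep_flag:
        return ""
    return json.dumps({"seqId": seq_id, "epFlag": ep_flag})

# Attached to an audio chunk (assuming the generated nest_pb2 module):
# nest_pb2.NestRequest(
#     type=nest_pb2.RequestType.DATA,
#     data=nest_pb2.NestData(chunk=chunk,
#                            extra_contents=build_extra_contents(7, True)),
# )
```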
6. Recognize response
- When you call the streaming API, you may receive the following response from the server.
Transcription JSON Format
{
    "uid": string,
    "responseType": [ "transcription" ],
    "transcription": {
        "text": string,
        "position": int,
        "periodPositions": [ int ],
        "periodAlignIndices": [ int ],
        "epFlag": bool,
        "seqId": int,
        "epdType": string,        // epd criterion for which the result was generated
        "startTimestamp": int,
        "endTimestamp": int,
        "confidence": float64,
        "alignInfos": [
            {
                "word": string,         // syllable
                "start": int,           // start timestamp in ms
                "end": int,             // end timestamp in ms
                "confidence": float64   // recognition confidence
            }
        ]
    }
}
- Configuration example
{
    "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
    "responseType": [ "transcription" ],
    "transcription": {
        "text": "This is text.",
        "position": 0,
        "periodPositions": [3],
        "periodAlignIndices": [3],
        "epFlag": false,
        "seqId": 0,
        "epdType": "durationThreshold",
        "startTimestamp": 190,
        "endTimestamp": 840,
        "confidence": 0.997389124199423,
        "alignInfos": [
            {"word": "This", "start": 190, "end": 340, "confidence": 0.9988637124943075},
            {"word": "is", "start": 341, "end": 447, "confidence": 0.9990018488549978},
            {"word": "text", "start": 448, "end": 580, "confidence": 0.9912501264550316},
            {"word": ".", "start": 581, "end": 700, "confidence": 0.9994397226648595},
            {"word": " ", "start": 701, "end": 840, "confidence": 0.9984142043105126}
        ]
    }
}
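On the client side, each NestResponse.contents string can be parsed and dispatched by responseType. A minimal sketch (extract_text is a hypothetical helper name):

```python
import json

def extract_text(response_contents: str):
    """Return (text, position) from a transcription response, else None.

    Non-transcription responses (e.g. "config" or a "recognize" failure)
    return None so the caller can handle them separately.
    """
    msg = json.loads(response_contents)
    if "transcription" not in msg.get("responseType", []):
        return None
    t = msg["transcription"]
    return t["text"], t["position"]
```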
Failed Recognize Response JSON Format
{
    "uid": string,                   # required
    "responseType": [ "recognize" ], # required
    "recognize": {                   # required
        "status": string,            # required
        "epFlag": {                  # optional
            "status": string
        },
        "seqId": {                   # optional
            "status": string
        },
        "audio": {                   # optional
            "status": string
        }
    }
}
This is the JSON format you receive in response if the recognition request fails, or if the recognition request cannot be processed normally.
- The recognize.status key indicates why the recognize request failed or could not be processed, and has the following values.
  - Invalid Type
    - If the epFlag or seqId value type does not match the predefined type
  - Required key is not provided
    - If the value of epFlag, a required key of extraContents, is not passed
  - Invalid request json format
    - If extraContents is not in JSON format
  - Unknown key
    - If a key that does not exist in the protocol specification is written in extraContents
    - If an invalid key is entered, the invalid key information is appended to the status for user convenience.
# invalid extraContents
{
    "test1": "test-val1",
    "test2": "test-val2"
}
# response msg
{
    "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
    "responseType": [ "recognize" ],
    "recognize": {
        "status": "Unknown key: test1, test2"
    }
}
  - ConfigRequest is already called
    - If the server receives a config request again after it has finished processing the config request
  - Lifespan expired
    - If the gRPC service usage time has expired (the usage time for the gRPC service is set to 100 hours)
  - Failed to received request msg
    - If the server didn't receive the request message successfully
  - Model server is not working
    - If an issue occurred inside the server
  - Internal server error
    - If an issue occurred inside the server
  - Invalid format
    - If the audio format sent is an invalid format
  - Failure
    - If extraContents in the recognize request JSON is in an invalid format
    - A detailed failure reason is displayed in either epFlag.status or seqId.status.
- The epFlag.status key displays the reason why the epFlag input failed, and has the following values.
  - Not found: if the required key, epFlag, has not been created
  - Invalid type: if the value type does not match a predefined type
- The seqId.status key displays the reason why the seqId input failed, and has the following values.
  - Invalid type: if the value type does not match a predefined type
- The audio.status key displays the reason why the audio data processing failed, and has the following values.
  - Invalid format: if the audio format does not match a predefined format
- Remarks
  - The epFlag, seqId, and audio keys may be omitted depending on the value of recognize.status.
Analysis method
text
- The key containing the recognition result.
position
- The key containing the offset of the text delivered in text within the full text.
- How to construct the full text using text and position:
Received order | Recognition result | Full text |
---|---|---|
1 | {text: "ABC", position: 0, ...} | ABC |
2 | {text: "DEFG", position: 3, ...} | ABCDEFG |
periodPositions
- The key containing the offsets of the . (punctuation) in text within the full text.
- If text contains no punctuation, it is passed as an empty list.
periodAlignIndices
- The key containing the indices in alignInfos of the . (punctuation) delivered in text.
- If text contains no punctuation, it is passed as an empty list.
epFlag
- A key indicating whether this result contains the recognition results for audio sent in a request with epFlag set to true.
seqId
- A key indicating the seqId of the last request that contains this recognition result.
- If the value of the epFlag key is false, 0 is returned.
- If the value of the epFlag key is true, the seqId of the last recognition request processed is returned.
epdType
- The key containing the epd criterion used to generate the recognition result.
- Values of epdType based on the epd criterion:
  - gap: if the recognition result was generated based on the silence criterion
  - endPoint: if the recognition result was generated including the last audio chunk
  - durationThreshold: if the recognition result was generated based on the time criterion
  - period: if the recognition result was generated based on the punctuation criterion
  - syllableThreshold: if the recognition result was generated based on the syllable count criterion
  - unvoice: if the recognition result was generated by the server-side unvoiceTime setting
startTimestamp, endTimestamp
- Timestamp information of the recognition result.
- The unit is ms (milliseconds).
confidence
- A key indicating the confidence of the recognition result (text).
- It is calculated as the geometric mean of all syllable confidence values (alignInfos.confidence) contained in the recognition result.
alignInfos
- The key containing the align information of each syllable that makes up text.
- If text contains no recognized syllables, the align information for "" (empty text) is passed in addition.
- Align information description:
  - word: the syllable information
  - start: the syllable start timestamp; the unit is ms
  - end: the syllable end timestamp; the unit is ms
  - confidence: the confidence that the syllable was recognized; a value between 0 and 1
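The text/position merging rule and the geometric-mean confidence definition above can be sketched in Python (merge_transcriptions and overall_confidence are hypothetical helper names, not part of the API):

```python
import math

def merge_transcriptions(results):
    """Rebuild the full text from streamed pieces using text and position.

    Each piece is placed at its position offset within the full text,
    so {"ABC", 0} followed by {"DEFG", 3} yields "ABCDEFG".
    """
    chars = []
    for r in results:
        pos, text = r["position"], r["text"]
        if len(chars) < pos:
            chars.extend(" " * (pos - len(chars)))  # pad any gap
        chars[pos:pos + len(text)] = list(text)
    return "".join(chars)

def overall_confidence(align_infos):
    """Geometric mean of per-syllable confidences, matching `confidence`."""
    values = [a["confidence"] for a in align_infos]
    return math.exp(sum(math.log(v) for v in values) / len(values))
```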
Demo [Python]
import grpc
import json
import nest_pb2
import nest_pb2_grpc
AUDIO_PATH = "path/to/audio/file"  # Path to the audio file to recognize: PCM (headerless raw wave), 16 kHz, 1 channel, 16 bits per sample
CLIENT_SECRET = "Long sentence recognition secretKey"

def generate_requests(audio_path):
    # Initial config request: set up speech recognition
    yield nest_pb2.NestRequest(
        type=nest_pb2.RequestType.CONFIG,
        config=nest_pb2.NestConfig(
            config=json.dumps({"transcription": {"language": "ko"}})
        )
    )
    # Open the audio file and read 32,000 bytes (1 second of audio) at a time
    with open(audio_path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(32000)  # Read a chunk of the audio file
            if not chunk:
                break  # Exit the loop when there is no more data
            yield nest_pb2.NestRequest(
                type=nest_pb2.RequestType.DATA,
                data=nest_pb2.NestData(
                    chunk=chunk,
                    extra_contents=json.dumps({"seqId": 0, "epFlag": False})
                )
            )

def main():
    # Set up a secure gRPC channel to the CLOVA Speech server
    channel = grpc.secure_channel(
        "clovaspeech-gw.ncloud.com:50051",
        grpc.ssl_channel_credentials()
    )
    stub = nest_pb2_grpc.NestServiceStub(channel)  # Create a stub for NestService
    metadata = (("authorization", f"Bearer {CLIENT_SECRET}"),)  # Metadata with the authentication token
    responses = stub.recognize(generate_requests(AUDIO_PATH), metadata=metadata)  # Call recognize with the generated requests
    try:
        # Process responses from the server as they arrive
        for response in responses:
            print("Received response: " + response.contents)
    except grpc.RpcError as e:
        # Handle gRPC errors
        print(f"Error: {e.details()}")
    finally:
        channel.close()  # Close the channel when finished

if __name__ == "__main__":
    main()
Demo [Java]
Project Structure
├───pom.xml
│ │
└───src
│ ├───main
│ │ ├───java
│ │ │ └───com
│ │ │ └───example
│ │ │ └───grpc
│ │ │ GRpcClient.java
│ │ │
│ │ ├───proto
│ │ │ nest.proto
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>clova-speech-grpc</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<java.version>1.8</java.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<netty.version>4.1.52.Final</netty.version>
<grpc.version>1.35.0</grpc.version>
<protoc.version>3.14.0</protoc.version>
</properties>
<dependencies>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
<version>1.18.12</version>
</dependency>
</dependencies>
<build>
<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.6.1</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<executions>
<execution>
<id>compile</id>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>testCompile</id>
<phase>test-compile</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<showDeprecation>true</showDeprecation>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocArtifact>
com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}
</protocArtifact>
<pluginId>grpc-java</pluginId>
<pluginArtifact>
io.grpc:protoc-gen-grpc-java:${grpc.version}:exe:${os.detected.classifier}
</pluginArtifact>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
GRpcClient.java
package com.example.grpc;
import java.io.FileInputStream;
import java.util.concurrent.CountDownLatch;
import com.google.protobuf.ByteString;
import com.nbp.cdncp.nest.grpc.proto.v1.NestConfig;
import com.nbp.cdncp.nest.grpc.proto.v1.NestData;
import com.nbp.cdncp.nest.grpc.proto.v1.NestRequest;
import com.nbp.cdncp.nest.grpc.proto.v1.NestResponse;
import com.nbp.cdncp.nest.grpc.proto.v1.NestServiceGrpc;
import com.nbp.cdncp.nest.grpc.proto.v1.RequestType;
import io.grpc.ManagedChannel;
import io.grpc.Metadata;
import io.grpc.StatusRuntimeException;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.stub.MetadataUtils;
import io.grpc.stub.StreamObserver;
public class GRpcClient {
    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        ManagedChannel channel = NettyChannelBuilder
            .forTarget("clovaspeech-gw.ncloud.com:50051")
            .useTransportSecurity()
            .build();
        NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
        Metadata metadata = new Metadata();
        metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
            "Bearer ${secretKey}");
        client = MetadataUtils.attachHeaders(client, metadata);
        StreamObserver<NestResponse> responseObserver = new StreamObserver<NestResponse>() {
            @Override
            public void onNext(NestResponse response) {
                System.out.println("Received response: " + response.getContents());
            }
            @Override
            public void onError(Throwable t) {
                if (t instanceof StatusRuntimeException) {
                    StatusRuntimeException error = (StatusRuntimeException) t;
                    System.out.println(error.getStatus().getDescription());
                }
                latch.countDown();
            }
            @Override
            public void onCompleted() {
                System.out.println("completed");
                latch.countDown();
            }
        };
        StreamObserver<NestRequest> requestObserver = client.recognize(responseObserver);
        // First request on the stream: the config JSON
        requestObserver.onNext(NestRequest.newBuilder()
            .setType(RequestType.CONFIG)
            .setConfig(NestConfig.newBuilder()
                .setConfig("{\"transcription\":{\"language\":\"ko\"}}")
                .build())
            .build());
        java.io.File file = new java.io.File("~/media/42s.wav");
        byte[] buffer = new byte[32000];
        int bytesRead;
        // try-with-resources ensures the stream is closed even on error
        try (FileInputStream inputStream = new FileInputStream(file)) {
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                requestObserver.onNext(NestRequest.newBuilder()
                    .setType(RequestType.DATA)
                    .setData(NestData.newBuilder()
                        .setChunk(ByteString.copyFrom(buffer, 0, bytesRead))
                        .setExtraContents("{\"seqId\": 0, \"epFlag\": false}")
                        .build())
                    .build());
            }
        }
        requestObserver.onCompleted();
        latch.await();
        channel.shutdown();
    }
}
FAQs
- How can I utilize the epFlag and seqId items in the extraContents field of the Recognize API?
  - You can use them for pausing purposes, or to check whether you have received a complete response to a request you sent.
- Does the gRPC service support pausing?
  - The gRPC service does not support pausing directly, but it can be implemented in the Recognize API by setting the epFlag entry in the extraContents field to true, sending a request, and then not making a Recognize request for a period of time. See the description of the epFlag entry.
  - If you request Recognize without setting epFlag to true and do not re-request for a certain period of time, the server processes the buffered Recognize requests based on the internally set unvoiceTime (10 seconds) and returns the response result.
- What sound data formats are supported by the Recognize API?
  - Currently, only PCM (headerless raw wave) format at 16 kHz, 1 channel, 16 bits per sample is supported.
- Is it mandatory to set the epFlag item to true in the Recognize API's extraContents before calling the Close API?
  - It is not mandatory. However, it is recommended to set epFlag to true if you want to receive a fast response result for the last Recognize request. See the description of the epFlag entry.
- How do I know if I have received all the responses to the requests I sent?
  - When calling the Recognize API, you can leverage the epFlag and seqId items in the extraContents field. The result of a Recognize request with epFlag set to true and seqId set to a non-zero value can be verified by comparing epFlag and seqId in the Recognize response. See the Recognize response JSON format.
- Is there a connection lifetime limit for the gRPC service?
  - The gRPC service has a connection lifetime limit of 100 hours. It may also be disconnected due to network problems, so implementing retry logic is recommended for stable service use.
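Following the retry recommendation above, a minimal reconnection sketch (run_with_retry is a hypothetical helper; a production client would also resume audio from the last confirmed seqId rather than restart the whole stream):

```python
import time

def run_with_retry(start_stream, max_retries=3, backoff_s=2.0):
    """Re-run start_stream on transient failures with linear backoff.

    start_stream is a caller-supplied function that opens the gRPC
    stream and consumes responses; in practice the exception to catch
    would be grpc.RpcError.
    """
    for attempt in range(max_retries + 1):
        try:
            return start_stream()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            time.sleep(backoff_s * (attempt + 1))
```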