CLOVA Speech Real-Time Streaming API

    Available in Classic and VPC

    Version

    Version    Date           Changes
    v1.0.0     2023.12.21.    Initial draft

    API URL

    Host                         Port
    clovaspeech-gw.ncloud.com    50051

    Authorization

    Header Name      Description
    Authorization    Bearer ${secretKey}

    Example:

    ManagedChannel channel = NettyChannelBuilder
    			.forTarget("clovaspeech-gw.ncloud.com:50051")
    			.useTransportSecurity()
    			.build();
    NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
    Metadata metadata = new Metadata();
    metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
                 "Bearer ${secretKey}");
    client = MetadataUtils.attachHeaders(client, metadata);
    

    How to use CLOVA Speech gRPC

    • All function setup requests and the corresponding responses are exchanged in JSON format.

    Config JSON request format

    {
      # Transcription setup information
      "transcription": {      # optional, top level key
        "language": string
      },
      # Keyword Boosting setup information
      "keywordBoosting": {    # optional, top level key
        "boostings": [
          {
            "words": string,
            "weight": float64
          }
        ]
      },
      # Forbidden setup information
      "forbidden": {    # optional, top level key
        "forbiddens":  string
      }
    }
    

    Transcription

    Transcription JSON Format

    # transcription setup request in json format
    {
      "transcription": {
        # It is recommended to contact the person in charge to check the supported languages before use.
        "language": string        # required key
      }
    }
    
    • language (required) The following are the language codes of the languages available for voice recognition.
      • Korean (ko)
      • English (en)
      • Japanese (ja)
    # transcription setup request example
    {
      "transcription": {
        "language": "ko"
      }
    }
    

    Keyword Boosting

    • Keyword boosting is used to enhance the recognition rate of the keywords entered by the user.
    • You can set the desired keywords by manually entering them.
      • The weight of a keyword is a real number between 0 and 5.0.
        • If weight is 0, boosting is not performed.
        • All keywords in a single words string share the same weight.
          • Group the keywords that have the same weight in one words string, separating them with commas (,).
    • Keyword boosting takes into account the spaces before and after each word.
    • For how to set up this function, see Keyword Boosting JSON Format.

    Keyword Boosting JSON Format

    # keyword boosting setup request in json format
    {
      "keywordBoosting": {
        "boostings": [
          {
            "words": string,
            "weight": float64
          }
        ]
      }
    }
    
    # keyword boosting setup request example
    {
      "keywordBoosting": {
        "boostings": [
          {
            "words": "test,test1,test2",
            "weight": 1 
          },
          {
            "words": "test, test 1, test 2",
            "weight": 0.5
          }
        ]
      }
    }
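
    The rules above (one shared weight per words string, weight between 0 and 5.0, comma-separated keywords) can be sketched as a small helper. KeywordBoostingEntry and its methods are hypothetical names introduced only for illustration; they are not part of the CLOVA Speech API, and the escaping is a minimal assumption that covers only backslashes and double quotes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the CLOVA Speech API) that builds one "boostings" entry. */
final class KeywordBoostingEntry {
    private KeywordBoostingEntry() {}

    /** Minimal JSON string escaping: backslashes and double quotes only. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Joins the keywords with commas and pairs them with one shared weight in [0, 5.0]. */
    static String toJson(List<String> keywords, double weight) {
        if (Double.isNaN(weight) || weight < 0.0 || weight > 5.0) {
            throw new IllegalArgumentException("weight must be between 0 and 5.0: " + weight);
        }
        for (String keyword : keywords) {
            // A comma is the separator, so it cannot be part of a keyword.
            if (keyword.contains(",")) {
                throw new IllegalArgumentException("keyword must not contain a comma: " + keyword);
            }
        }
        String words = String.join(",", keywords);
        return "{\"words\":\"" + escape(words) + "\",\"weight\":" + weight + "}";
    }

    public static void main(String[] args) {
        String entry = toJson(Arrays.asList("test", "test1", "test2"), 1.0);
        // Wrap the entry in the keywordBoosting top level key of the Config JSON request.
        System.out.println("{\"keywordBoosting\":{\"boostings\":[" + entry + "]}}");
    }
}
```

    Because spaces before and after each word are taken into account, the helper deliberately does not trim the keywords.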
    

    Forbidden

    • This function adds a forbidden keyword tag for a forbidden keyword entered by the user.
      • Forbidden keyword tag: <forbidden>forbidden keyword</forbidden>
    • You can set the forbidden keywords by manually entering them.
    • The forbidden keyword tag can appear in the value of the text key in the recognition result.
    • The added forbidden keyword tag does not affect position, periodPositions, or alignInfos in the recognition result.
    • You can group 2 or more forbidden keywords in a forbiddens key string, separating them with commas (,).
    • This function takes into account the spaces before and after each forbidden keyword.
    • For how to set up this function, see Forbidden JSON Format.

    Forbidden JSON Format

    # forbidden setup request in json format
    {
      "forbidden": {
        "forbiddens":  string
      }
    }
    
    # forbidden setup request example
    {
      "forbidden": {
        "forbiddens":  "forbidden keyword 1, forbidden keyword 2" 
      }
    }
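
    On the receiving side, a client may want to list the tagged keywords or recover the untagged text. The sketch below uses only java.util.regex; ForbiddenTags is a hypothetical helper name, and it assumes the tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the CLOVA Speech API) for <forbidden> tags in "text". */
final class ForbiddenTags {
    // Reluctant match so that two tags on one line are handled separately.
    private static final Pattern TAG = Pattern.compile("<forbidden>(.*?)</forbidden>");

    private ForbiddenTags() {}

    /** Returns the keywords wrapped in <forbidden> tags, in order of appearance. */
    static List<String> taggedKeywords(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    /** Removes the tags but keeps the keywords themselves. */
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "this is <forbidden>secret</forbidden> text";
        System.out.println(taggedKeywords(text)); // [secret]
        System.out.println(stripTags(text));      // this is secret text
    }
}
```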
    

    Config JSON response format

    {
      "uid": string,                  # required
      "responseType": [ "config" ],   # required
      "config": {                     # required
        "status":string,              # required
        # The following fields may not exist depending on the user's config setting.
        # e.g. If there are no forbidden keywords, the "forbidden" key is not created.
        "keywordBoosting": {          # optional, top level key
          "status":string
        },
        "forbidden": {                # optional, top level key
          "status":string
        }
      }
    }
    

    Additional information

    • The Config JSON response consists of the following elements:
      • The value of the "config.status" key in the Config JSON response can have the following values:
        • "Success"
          • The Config JSON request succeeded and the requested settings were saved in the gRPC Service
        • "Failure"
          • A function included in the Config JSON request was recognized by the server, but its setup failed
        • "${message}"
          • The Config JSON request was not properly recognized or could not be properly processed
          • ${message} can be one of the following:
            • "Invalid request json format"
              • The Config JSON request is in an invalid JSON format
            • "Unknown key: ${unknown_key}"
              • The Config JSON request has a top level key that is not supported by the server
            • "Invalid type: ${invalid_type_key}"
              • The Config JSON request has a top level value type that is not supported by the server
            • "Required key is not provided"
              • The Config JSON request does not have the required key defined by the server
            • "No more slot"
              • The server currently has no resources available to handle the request
            • "ConfigRequest did not complete"
              • The server has received the Config JSON request before the request was fully processed
            • "Lifespan expired"
              • The gRPC Service use time has expired
              • The gRPC Service use time is set to 100 hours.
            • "Failed to received request msg"
              • The server has not properly received the user's request
            • "Model server is not working"
              • There is an internal problem in the server
            • "Internal server error"
              • There is an internal problem in the server
      • ${top_level_key} is one of the following keys:
        • transcription
        • keywordBoosting
        • forbidden
      • config.${top_level_key}.status of the Config JSON response can have the following values:
        • Common
          • "Unknown key: ${top_level_key}-${unknown_key}"
            • The Config JSON request has a sub-level key that is not supported by the server
          • "Invalid type: ${top_level_key}-${invalid_type_key}"
            • The Config JSON request has a sub-level value type that is not supported by the server
        • transcription
          • "Invalid language code: ${invalid_language_code}"
            • The language in the Config JSON request is not a predefined language code
        • keywordBoosting
          • "Internal system error"
            • There is an internal problem in the server

    Config JSON example

    # Config JSON request example
    # Requesting keyword boosting and forbidden keywords
    {
      "keywordBoosting": {                  
        "boostings": [
          {
            "words": "test,test1,test2",
            "weight": 1
          },
          {
            "words": "test, test 1, test 2",
            "weight": 0.5
          }
        ]
      },
      "forbidden": {
        "forbiddens":  "forbidden keyword 1, forbidden keyword 2"
      }
    }
    
    # Config JSON response example
    # Request for keyword boosting and forbidden keywords has been approved
    {
      "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
      "responseType": [ "config" ],
      "config": {
        "status": "Success",
        "keywordBoosting": {
          "status": "Success"
        },
        "forbidden" : {
          "status": "Success"
        }
      }
    }
    
    # Request for keyword boosting and forbidden keywords has been rejected
    # The forbidden keyword request json includes "orbidden," which is an unsupported top level key. 
    {
      "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
      "responseType": [ "config" ],
      "config": {
        "status": "Unknown key: orbidden"
      }
    }
    

    Recognize Request

    Recognize Request JSON Format (ExtraContents)

    {
      "epFlag": bool,    # optional
      "seqId": int      # optional
    }
    

    This is the JSON format used for the recognize request.

    • epFlag

      • This flag is used for pause or last recognition in a recognize request.
      • For the pause or last recognition request, you need to set epFlag to true to receive the recognize result without delay.
      • This is an optional field.
      • If you send the request without setting this flag, it is automatically set to false.
    • seqId

      • This is the ID of a recognize request.
      • This is an optional field.
      • This ID can be used to check the recognition result after you send a request with epFlag set to true.
      • If you send the request without setting seqId, seqId is automatically set to 0 in the recognition result.
      • It is recommended to set seqId to a value other than 0 before sending a request.
    • Note

      • If you do not wish to use either epFlag or seqId, you can set the request JSON to ""(empty string).
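
    As a rough illustration of the fields above, the following sketch builds the extraContents string for a sequence of recognize requests, setting epFlag to true only on the last one and using non-zero seqId values. ExtraContents and forRequest are hypothetical names, and numbering the requests from 1 is only one possible convention.

```java
/** Hypothetical helper (not part of the CLOVA Speech API) for the extraContents JSON string. */
final class ExtraContents {
    private ExtraContents() {}

    /** Builds extraContents with both optional keys set explicitly. */
    static String of(int seqId, boolean epFlag) {
        return "{\"seqId\":" + seqId + ",\"epFlag\":" + epFlag + "}";
    }

    /**
     * extraContents for request number requestNo (1-based) out of total requests.
     * The request number doubles as seqId, so seqId is never 0.
     */
    static String forRequest(int requestNo, int total) {
        boolean isLast = requestNo == total;
        return of(requestNo, isLast);
    }

    public static void main(String[] args) {
        for (int requestNo = 1; requestNo <= 3; requestNo++) {
            System.out.println(forRequest(requestNo, 3));
        }
        // {"seqId":1,"epFlag":false}
        // {"seqId":2,"epFlag":false}
        // {"seqId":3,"epFlag":true}
    }
}
```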

    Recognize Responses

    Transcription JSON Format

    {
      "uid": string,
      "responseType": [ "transcription" ],
      "transcription": {
        "text": string,
        "position": int,
        "periodPositions": [ int ],
        "periodAlignIndices": [ int ],
        "epFlag": bool,
        "seqId": int,
        "epdType": string,         // only epd with result generated
        "startTimestamp": int,
        "endTimestamp": int,
        "confidence": float64,
        "alignInfos": [
          {
            "word": string,        // syllable
            "start": int,          // StartTimestamp in ms 
            "end": int,            // EndTimestamp in ms
            "confidence": float64  // recognition reliability
          }
        ]
      }
    }
    
    # Example
    {
      "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
      "responseType": [ "transcription" ],
      "transcription": {
        "text": "this is text",
        "position": 0,
        "periodPositions": [3],
        "periodAlignIndices": [3],
        "epFlag": false,
        "seqId": 0,
        "epdType": "durationThreshold",
        "startTimestamp": 190,
        "endTimestamp": 840,
        "confidence": 0.997389124199423,
        "alignInfos": [
          {"word":"this","start":190,"end":340,"confidence":0.9988637124943075}, 
          {"word":"is","start":341,"end":447,"confidence":0.9990018488549978},
          {"word":"text","start":448,"end":580,"confidence":0.9912501264550316},
          {"word":".","start":581,"end":700,"confidence":0.9994397226648595},
          {"word":" ","start":701,"end":840,"confidence":0.9984142043105126}
        ]
      }
    }
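
    The top-level confidence in this example can be reproduced from the alignInfos entries, since the result confidence is described as the geometric mean of the per-entry confidences. The sketch below is a client-side check only; ResultConfidence is a hypothetical name, and the values are copied from the example above.

```java
/** Hypothetical helper (not part of the CLOVA Speech API) that recomputes a result confidence. */
final class ResultConfidence {
    private ResultConfidence() {}

    /** Geometric mean, computed as exp(mean(log(c))) to avoid underflow on long inputs. */
    static double geometricMean(double[] confidences) {
        if (confidences.length == 0) {
            throw new IllegalArgumentException("at least one confidence is required");
        }
        double logSum = 0.0;
        for (double c : confidences) {
            if (c < 0.0 || c > 1.0) {
                throw new IllegalArgumentException("confidence must be between 0 and 1: " + c);
            }
            logSum += Math.log(c);
        }
        return Math.exp(logSum / confidences.length);
    }

    public static void main(String[] args) {
        // alignInfos[*].confidence values from the example response
        double[] confidences = {
            0.9988637124943075, 0.9990018488549978, 0.9912501264550316,
            0.9994397226648595, 0.9984142043105126
        };
        // Prints a value close to the example's "confidence" of 0.997389124199423
        System.out.println(geometricMean(confidences));
    }
}
```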
    

    Failed Recognize Response JSON Format

    {
      "uid": string,                     # required
      "responseType": [ "recognize" ],    # required
      "recognize": {                     # required
        "status": string,                # required
        "epFlag": {                      # optional
          "status": string
        },
        "seqId": {                       # optional
          "status": string
        },
        "audio": {                       # optional
          "status": string
        }
      }
    }
    

    This is the JSON format for a response received when the recognize request has been rejected or could not be properly processed.

    • recognize.status is the key indicating the reason why the recognize request was rejected or could not be processed and includes the following values:

      • Invalid Type

        • The epFlag or seqId value type does not match the predefined type.
      • Required key is not provided

        • The value of epFlag, which is a required key in extraContents, was not provided
      • Invalid request json format

        • extraContents is not in the json format
      • Unknown key

        • extraContents has a key that is not found in the protocol spec
        • If a wrong key has been entered, the relevant information is specified in the status field for convenience.
          # invalid extraContents
          {
            "test1": "test-val1",
            "test2": "test-val2"
          }
          # response msg
          {
            "uid": "2023-03-02_13-13-16_b49f35ec-7cf0-434b-9489-30b0b66f6d58",
            "responseType": ["recognize"],
            "recognize": {
              "status": "Unknown key: test1, test2"
            }
          }
          
      • ConfigRequest is already called

        • The server has completed processing of the config request but received another config request
      • Lifespan expired

        • The gRPC Service use time has expired
        • gRPC Service use time is set to 100 hours.
      • Failed to received request msg

        • The server has not properly received the user's request
      • Model server is not working

        • There is an internal problem in the server
      • Internal server error

        • There is an internal problem in the server
      • Invalid format

        • The audio format sent by the user is invalid
      • Failure

        • extraContents in the recognize request json is in an invalid format
        • epFlag.status or seqId.status provides the cause of failure in detail.
    • epFlag.status shows the cause of epFlag input failure and has the following values:

      • Not found

        • epFlag, which is a required key, has not been filled out.
      • Invalid type

        • The value type does not match the predefined type.
    • seqId.status shows the cause of seqId input failure and has the following values:

      • Invalid type
        • The value type does not match the predefined type.
    • audio.status shows the cause of failure in audio data processing and has the following values:

      • Invalid format
        • The audio format does not match the predefined format.
    • Note

      • The epFlag, seqId, and audio keys can be omitted depending on the value of recognize.status.

    Analysis methods

    • text

      • This key contains the recognized text.
    • position

      • This key gives the offset, within the full text, of the text received through text.

      • How to build a full text using text and position

        Receipt Sequence    Recognition Result                  Full text
        1                   {text: "ABC", position: 0, ...}     "ABC"
        2                   {text: "DEFG", position: 3, ...}    "ABCDEFG"
    • periodPositions

      • This key lists the offsets of the periods (.) in the full text received through text.
      • If text does not include any periods, an empty list is sent.
    • periodAlignIndices

      • This key lists the indices in alignInfos of the periods (.) received through text.
      • If text does not include any periods, an empty list is sent.
    • epFlag

      • This key indicates whether the audio sent with epFlag set to true in the request has been recognized.
    • seqId

      • This key shows seqId of the last request included in the recognition result.
      • If the epFlag key is set to false, 0 is returned.
      • If the epFlag key is set to true, seqId of the last processed recognition request is returned.
    • epdType

      • This key indicates the epd (end-point detection) criterion used to generate the recognition result.
      • epdType values according to the epd criterion
        • If the recognition result has been generated based on silence: gap
        • If the recognition result has been generated including the last audio chunk: endPoint
        • If the recognition result has been generated based on time: durationThreshold
        • If the recognition result has been generated based on a period (punctuation): period
        • If the recognition result has been generated based on the number of syllables: syllableThreshold
        • If the recognition result has been generated because of unvoiceTime (server setting): unvoice
    • startTimestamp, endTimestamp

      • These provide the timestamp data of the recognition result.
      • The unit used is ms.
    • confidence

      • This key represents the reliability of the recognition result (text).
      • It is the geometric mean of the confidences of all syllables (alignInfos.confidence) included in the recognition result.
    • alignInfos

      • This key contains the alignment information of each syllable making up text.
      • If there is no recognized syllable in text, the alignment information of "" (empty text) is added and sent to the user.
      • Alignment information descriptions
        • word
          • This key contains the syllable.
        • start
          • This key contains the timestamp for the start of a syllable.
          • The unit used is ms.
        • end
          • This key contains the timestamp for the end of a syllable.
          • The unit used is ms.
        • confidence
          • This key represents the reliability of a syllable's recognition result.
          • It has a value between 0 and 1.
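
    The full-text construction described for text and position can be sketched as follows. FullTextAssembler is a hypothetical name, and two points are assumptions rather than documented behavior: position is treated as an offset in Java characters, and a result whose position is smaller than the current length replaces everything from that offset onward.

```java
/** Hypothetical helper (not part of the CLOVA Speech API) that rebuilds the full text. */
final class FullTextAssembler {
    private final StringBuilder fullText = new StringBuilder();

    /**
     * Applies one recognition result and returns the full text so far.
     * Assumption: text starts at offset position of the full text.
     */
    String apply(String text, int position) {
        if (position < 0 || position > fullText.length()) {
            throw new IllegalArgumentException(
                "position " + position + " is outside the current full text of length "
                    + fullText.length());
        }
        fullText.setLength(position); // drop anything at or after the offset (assumption)
        fullText.append(text);
        return fullText.toString();
    }

    public static void main(String[] args) {
        FullTextAssembler assembler = new FullTextAssembler();
        System.out.println(assembler.apply("ABC", 0));  // ABC
        System.out.println(assembler.apply("DEFG", 3)); // ABCDEFG
    }
}
```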

    Demo [Java]

    Project Structure

    ├───pom.xml
    └───src
        └───main
            ├───java
            │   └───com
            │       └───example
            │           └───grpc
            │                   GRpcClient.java
            │
            └───proto
                    nest.proto
    

    nest.proto

    syntax = "proto3";
    option java_multiple_files = true;
    package com.nbp.cdncp.nest.grpc.proto.v1;
    
    import "google/protobuf/any.proto";
    
    enum RequestType {
      CONFIG = 0;
      DATA = 1;
    }
    
    message NestConfig {
      string config = 1;
    }
    
    message NestData {
      bytes chunk = 1;
      string extra_contents = 2;
    }
    message NestRequest {
      RequestType type = 1;
      oneof part {
        NestConfig config = 2;
        NestData data = 3;
      }
    }
    
    message NestResponse {
      string contents = 1;
    }
    service NestService {
      rpc recognize(stream NestRequest) returns (stream NestResponse){};
    }
    

    pom.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>com.example</groupId>
        <artifactId>clova-speech-grpc</artifactId>
        <version>1.0-SNAPSHOT</version>
        <properties>
            <java.version>1.8</java.version>
            <maven.compiler.source>${java.version}</maven.compiler.source>
            <maven.compiler.target>${java.version}</maven.compiler.target>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <netty.version>4.1.52.Final</netty.version>
            <grpc.version>1.35.0</grpc.version>
            <protoc.version>3.14.0</protoc.version>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>io.grpc</groupId>
                <artifactId>grpc-netty</artifactId>
                <version>${grpc.version}</version>
            </dependency>
            <dependency>
                <groupId>io.grpc</groupId>
                <artifactId>grpc-netty-shaded</artifactId>
                <version>${grpc.version}</version>
            </dependency>
            <dependency>
                <groupId>io.grpc</groupId>
                <artifactId>grpc-protobuf</artifactId>
                <version>${grpc.version}</version>
            </dependency>
            <dependency>
                <groupId>io.grpc</groupId>
                <artifactId>grpc-stub</artifactId>
                <version>${grpc.version}</version>
            </dependency>
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
                <version>1.18.12</version>
            </dependency>
        </dependencies>
    
        <build>
            <extensions>
                <extension>
                    <groupId>kr.motd.maven</groupId>
                    <artifactId>os-maven-plugin</artifactId>
                    <version>1.6.1</version>
                </extension>
            </extensions>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.1</version>
                    <executions>
                        <execution>
                            <id>compile</id>
                            <phase>compile</phase>
                            <goals>
                                <goal>compile</goal>
                            </goals>
                        </execution>
                        <execution>
                            <id>testCompile</id>
                            <phase>test-compile</phase>
                            <goals>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <showDeprecation>true</showDeprecation>
                        <encoding>${project.build.sourceEncoding}</encoding>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.xolstice.maven.plugins</groupId>
                    <artifactId>protobuf-maven-plugin</artifactId>
                    <version>0.6.1</version>
                    <configuration>
                        <protocArtifact>
                            com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}
                        </protocArtifact>
                        <pluginId>grpc-java</pluginId>
                        <pluginArtifact>
                            io.grpc:protoc-gen-grpc-java:${grpc.version}:exe:${os.detected.classifier}
                        </pluginArtifact>
                    </configuration>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>compile-custom</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>
    

    GRpcClient.java

    package com.example.grpc;
    
    import java.io.FileInputStream;
    import java.util.concurrent.CountDownLatch;
    
    import com.google.protobuf.ByteString;
    import com.nbp.cdncp.nest.grpc.proto.v1.NestConfig;
    import com.nbp.cdncp.nest.grpc.proto.v1.NestData;
    import com.nbp.cdncp.nest.grpc.proto.v1.NestRequest;
    import com.nbp.cdncp.nest.grpc.proto.v1.NestResponse;
    import com.nbp.cdncp.nest.grpc.proto.v1.NestServiceGrpc;
    import com.nbp.cdncp.nest.grpc.proto.v1.RequestType;
    import io.grpc.ManagedChannel;
    import io.grpc.Metadata;
    import io.grpc.StatusRuntimeException;
    import io.grpc.netty.NettyChannelBuilder;
    import io.grpc.stub.MetadataUtils;
    import io.grpc.stub.StreamObserver;
    
    public class GRpcClient {
    	public static void main(String[] args) throws Exception {
    
    		CountDownLatch latch = new CountDownLatch(1);
    		ManagedChannel channel = NettyChannelBuilder
    			.forTarget("clovaspeech-gw.ncloud.com:50051")
    			.useTransportSecurity()
    			.build();
    		NestServiceGrpc.NestServiceStub client = NestServiceGrpc.newStub(channel);
    		Metadata metadata = new Metadata();
    		metadata.put(Metadata.Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER),
    			"Bearer ${secretKey}");
    		client = MetadataUtils.attachHeaders(client, metadata);
    
    		StreamObserver<NestResponse> responseObserver = new StreamObserver<NestResponse>() {
    			@Override
    			public void onNext(NestResponse response) {
    				System.out.println("Received response: " + response.getContents());
    			}
    
    			@Override
    			public void onError(Throwable t) {
    				if(t instanceof StatusRuntimeException) {
    					StatusRuntimeException error = (StatusRuntimeException)t;
    					System.out.println(error.getStatus().getDescription());
    				}
    				latch.countDown();
    			}
    
    			@Override
    			public void onCompleted() {
    				System.out.println("completed");
    				latch.countDown();
    			}
    		};
    
    		StreamObserver<NestRequest> requestObserver = client.recognize(responseObserver);
    
    		requestObserver.onNext(NestRequest.newBuilder()
    			.setType(RequestType.CONFIG)
    			.setConfig(NestConfig.newBuilder()
    				.setConfig("{\"transcription\":{\"language\":\"ko\"}}")
    				.build())
    			.build());
    
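    		// Java does not expand "~", so resolve the path against the user's home directory.
    		java.io.File file = new java.io.File(System.getProperty("user.home"), "media/42s.wav");
    		byte[] buffer = new byte[32000];
    		int bytesRead;
    		// try-with-resources closes the file even if sending a chunk fails.
    		try (FileInputStream inputStream = new FileInputStream(file)) {
    			while ((bytesRead = inputStream.read(buffer)) != -1) {
    				requestObserver.onNext(NestRequest.newBuilder()
    					.setType(RequestType.DATA)
    					.setData(NestData.newBuilder()
    						.setChunk(ByteString.copyFrom(buffer, 0, bytesRead))
    						.setExtraContents("{ \"seqId\": 0, \"epFlag\": false}")
    						.build())
    					.build());
    			}
    		}
    		// Signal the end of the request stream, then wait for the server to finish.
    		requestObserver.onCompleted();
    		latch.await();
    		channel.shutdown();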
    	}
    
    }
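
    The demo reads 32000 bytes per request. With the audio format listed in the FAQs (PCM, 16 kHz, 1 channel, 16 bits per sample), that corresponds to one second of audio per chunk. The sketch below shows the arithmetic for other chunk durations; PcmChunkSize is a hypothetical name, and the document does not state a required or recommended chunk length.

```java
/** Hypothetical helper (not part of the CLOVA Speech API) for sizing raw PCM chunks. */
final class PcmChunkSize {
    static final int SAMPLE_RATE_HZ = 16_000; // 16 kHz
    static final int CHANNELS = 1;            // mono
    static final int BYTES_PER_SAMPLE = 2;    // 16 bits per sample

    private PcmChunkSize() {}

    /** Number of bytes that hold the given duration of audio in this format. */
    static int bytesFor(int millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("duration must be positive: " + millis);
        }
        long bytes = (long) SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * millis / 1000;
        return Math.toIntExact(bytes); // throws ArithmeticException on int overflow
    }

    public static void main(String[] args) {
        System.out.println(bytesFor(1000)); // 32000, the buffer size used in the demo
        System.out.println(bytesFor(100));  // 3200
    }
}
```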
    

    FAQs

    • A confidence value of less than 0.4 means that accuracy is significantly poor. It is recommended to use only recognition results with a confidence value of 0.4 or higher.
    • How do I make use of epFlag and seqId in the extraContents field in the Recognize API?
      • You can use them for a pause or to check if all responses have been received to the sent request.
    • Does gRPC Service provide a pause option?
      • No, it does not. However, you can achieve the effect of a pause by sending a Recognize request with epFlag in the extraContents field set to true and then not making another Recognize request for a certain period of time. For more information, see the description of epFlag.
      • If the user makes a Recognize request without setting epFlag to true and does not make a re-request for a certain period of time, the server processes the recognition request buffered based on its unvoiceTime setting and sends the response to the user. For more information on the unvoiceTime setting, contact the person in charge.
    • Which audio data formats does the Recognize API support?
      • Currently, it only supports the PCM format (raw wave without header) with 16 kHz, 1 channel, 16 bits per sample.
    • Is it required to set epFlag of extraContents to true on the Recognize API before calling the Close API?
      • No, it is not a requirement. However, it is better to set epFlag to true if you want to promptly receive a response to the last Recognize request. For more information, see the description on epFlag.
    • How do I check if all responses have been received to the sent request?
      • When you call the Recognize API, you can make use of epFlag and seqId in the extraContents field. You can set epFlag to true and seqId to a number other than 0 in the Recognize request, and then simply compare the values of epFlag and seqId in the Recognize request and response to check the processing result. For more information, see the description on the JSON format for Recognize response.
    • Does gRPC Service impose a limit on connection lifetime?
      • gRPC Service limits connection lifetime to 100 hours.
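
    The check described in the FAQ on comparing epFlag and seqId between request and response can be sketched with a small tracker. PendingEndpoints is a hypothetical name; it assumes the client has already extracted epFlag and seqId from each transcription response (JSON parsing is left out), and it uses a concurrent set because responses typically arrive on a different thread than the one sending requests.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of the CLOVA Speech API) that tracks unanswered epFlag requests. */
final class PendingEndpoints {
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    /** Call when sending a Recognize request with epFlag set to true. */
    void sent(int seqId) {
        if (seqId == 0) {
            // 0 is also returned when epFlag is false, so it cannot identify a request.
            throw new IllegalArgumentException("use a seqId other than 0");
        }
        pending.add(seqId);
    }

    /** Call with the epFlag and seqId values of each transcription response. */
    void received(boolean epFlag, int seqId) {
        if (epFlag) {
            pending.remove(seqId);
        }
    }

    /** True when every request sent with epFlag=true has been answered. */
    boolean allReceived() {
        return pending.isEmpty();
    }

    public static void main(String[] args) {
        PendingEndpoints tracker = new PendingEndpoints();
        tracker.sent(7);
        tracker.received(false, 0);                // intermediate result
        System.out.println(tracker.allReceived()); // false
        tracker.received(true, 7);                 // result for the epFlag=true request
        System.out.println(tracker.allReceived()); // true
    }
}
```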
