Thinking

Available in Classic and VPC

This guide describes Chat Completions v3, which leverages the HCX-007 inference model with complex problem-solving capabilities.

Request

This section describes the request format. The method and URI are as follows:

Method URI
POST /v3/chat-completions/{modelName}

Request headers

The following describes the request headers.

Headers Required Description
Authorization Required API key for authentication
  • Example: Bearer nv-************
X-NCP-CLOVASTUDIO-REQUEST-ID Optional Request ID
Content-Type Required Request data format
  • application/json
Accept Conditional Response data format
  • text/event-stream
Note

Response results are returned in JSON by default, but if you specify Accept as text/event-stream, then the response results are returned as a stream.

Request path parameters

You can use the following path parameters with your request:

Field Type Required Description
modelName Enum Required Model name
  • Example: HCX-007
Note

The HCX-007 inference model is available only through the Chat Completions v3 API and does not support image input or tuning (training).

Request body

You can include the following data in the body of your request:

Field Type Required Description
messages Array Required Conversation messages
thinking Object Optional Inference model configuration information
thinking.effort Enum Optional Whether to perform inference and the depth of the thought process
  • none | low (default) | medium | high
    • none: no inference
    • low: short inference
    • medium: medium-length inference
    • high: long inference
topP Double Optional Sample generated token candidates based on cumulative probability.
  • 0.00 < topP ≤ 1.00 (default: 0.80)
topK Integer Optional Sample K high-probability candidates from the pool of generated token candidates
  • 0 ≤ topK ≤ 128 (default: 0)
maxCompletionTokens Integer Optional Maximum number of generated tokens (conversation message length, including inferences)
  • 1 ≤ maxCompletionTokens ≤ 32768 (default: differs depending on thinking.effort)
    • none: 512
    • low: 5120
    • medium: 10240
    • high: 20480
  • maxTokens is not supported; use maxCompletionTokens instead
temperature Double Optional Degree of diversity for the generated tokens (higher values generate more diverse sentences)
  • 0.00 ≤ temperature ≤ 1.00 (default: 0.50)
repetitionPenalty Double Optional Degree of penalty for generating the same token (the higher the setting, the less likely it is to generate the same result repeatedly)
  • 0 < repetitionPenalty ≤ 2.0 (default: 1.1)
seed Integer Optional Adjust the consistency of the output across repeated model runs.
  • 0: a random seed value is used (default)
  • 1 ≤ seed ≤ 4294967295: user-specified seed value for generating consistent results
includeAiFilters Boolean Optional Whether to return AI Filter results (scores indicating whether the generated results contain content such as profanity, degradation/discrimination/hate, or sexual harassment/obscenity)
  • true (default) | false
    • true: display
    • false: do not display

messages

The following describes messages.

Field Type Required Description
role Enum Required Role of conversation messages
  • system | user | assistant
    • system: directives that define roles
    • user: user utterances/questions
    • assistant: answers to user utterances/questions
content String | Array Required Conversation message content

content

The following describes content.

Field Type Required Description
type Enum Required Format of the conversation message content
  • text (valid value)
text String Conditional Conversation message content
  • Enter text.
    • Required if type is text
Note

When entering some fields, check the following:

  • role: Only one conversation message whose role is system can be included per request.
  • message.thinkingContent can't be included in the request when entering a conversation message whose role is assistant.
  • Inference can't be requested together with function calling or structured outputs.
  • stop can't be used in Chat Completions v3 when inference is enabled.
  • HCX-007
    • The sum of the input tokens and the output tokens cannot exceed 128,000 tokens.
    • The input tokens can be up to 128,000 tokens.
    • The output tokens (maxCompletionTokens) requested from the model can be up to 32,768 tokens.

Request example

The request example is as follows:

curl --location --request POST 'https://clovastudio.stream.ntruss.com/v3/chat-completions/HCX-007' \
--header 'Authorization: Bearer {CLOVA Studio API Key}' \
--header 'X-NCP-CLOVASTUDIO-REQUEST-ID: {Request ID}' \
--header 'Content-Type: application/json' \
--header 'Accept: text/event-stream' \
--data '{
    "messages": [
      {
        "role": "system",
        "content": "- It is a highly organized analyst and an expert in logic-based problem solving."
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Given a set with n elements, explain that the number of all subsets that can be made from this set is equal to n squared to the power of 2."
          }
        ]
      }
    ],
    "thinking": {
        "effort": "low"
        },
    "topP": 0.8,
    "topK": 0,
    "maxCompletionTokens": 5120,
    "temperature": 0.5,
    "repetitionPenalty": 1.1,
    "stop": []
  }'
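
For reference, the following is a minimal Python sketch of the same call without streaming, assuming the third-party requests package; the endpoint, headers, and body fields follow the descriptions above.

import requests

API_KEY = "{CLOVA Studio API Key}"  # replace with your key

# Minimal non-streaming call; omitting Accept returns a single JSON response.
response = requests.post(
    "https://clovastudio.stream.ntruss.com/v3/chat-completions/HCX-007",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Explain why a set with n elements has 2^n subsets."}]}
        ],
        "thinking": {"effort": "low"},
        "maxCompletionTokens": 5120,
    },
)
result = response.json()["result"]
print(result["message"]["thinkingContent"])  # inference content
print(result["message"]["content"])          # final answer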

Response

This section describes the response format.

Response headers

The following describes the response headers.

Headers Required Description
Content-Type - Response data format
  • application/json

Response body

The response body includes the following data:

Field Type Required Description
status Object - Response status
result Object - Response result
result.message Object - Conversation messages
result.message.role Enum - Role of conversation messages
  • system | user | assistant
    • system: directives that define roles
    • user: user utterances/questions
    • assistant: answers of the model
result.message.content String - Content of conversation messages
result.message.thinkingContent String - Inferences in conversation messages
result.finishReason String - Reason for stopping token generation (generally passed with the last event)
  • length | stop | tool_calls
    • length: length limit reached
    • stop: a character specified in stop occurred during answer generation
    • tool_calls: the model successfully completed a tool call
result.created Integer - Response date and time
  • Unix timestamp (seconds) format
result.seed Integer - Input seed value (a random value is returned when 0 is entered or the field is omitted)
result.usage Object - Token usage
result.usage.completionTokens Integer - Generated token count
result.usage.promptTokens Integer - Number of input (prompt) tokens
result.usage.totalTokens Integer - Total number of tokens
  • Number of generated tokens + number of input tokens
result.usage.completionTokensDetails Object - Additional information about the number of generated tokens
result.usage.completionTokensDetails.thinkingTokens Integer - Number of tokens generated for inference (thinking)
result.aiFilter Array - AI Filter result

aiFilter

The following describes aiFilter.

Field Type Required Description
groupName String - AI Filter category
  • curse | unsafeContents
    • curse: degradation, discrimination, hate, and profanity
    • unsafeContents: sexual harassment, obscenity
name String - AI Filter subcategory
  • discrimination | insult | sexualHarassment
    • discrimination: degradation, discrimination, hate
    • insult: profanity
    • sexualHarassment: sexual harassment, obscenity
score String - AI Filter score
  • -1 | 0 | 1 | 2
    • -1: AI Filter error occurred.
    • 0: Conversation messages are highly likely to contain sensitive/hazardous language.
    • 1: Conversation messages are likely to contain sensitive/hazardous language.
    • 2: Conversation messages are unlikely to contain sensitive/hazardous language.
result String - Whether AI Filter is operating properly
  • OK | ERROR
    • OK: normal operation
    • ERROR: error occurred
Note

AI Filter can analyze up to 500 characters. However, if the text being analyzed contains many unusual formats, emojis, or special characters, it may not be analyzed correctly.
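
The response examples in this guide omit the aiFilter array. For illustration only, a hypothetical aiFilter value shaped according to the table above (all scores invented) might look as follows:

# Hypothetical aiFilter entries; shape per the table above, values invented.
ai_filter_example = [
    {"groupName": "curse", "name": "discrimination", "score": "2", "result": "OK"},
    {"groupName": "curse", "name": "insult", "score": "2", "result": "OK"},
    {"groupName": "unsafeContents", "name": "sexualHarassment", "score": "2", "result": "OK"},
]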

Response example

The response example is as follows:

Succeeded

The following is a sample response upon a successful call.

{
    "status": {
        "code": "20000",
        "message": "OK"
    },
    "result": {
        "message": {
            "role": "assistant",
            "content": "The number of subsets is determined by the two choices, with each element either included or excluded.\nFor example, the subsets of {a} are ∅, {a} → 2 (2¹), and the subsets of {a,b} are ∅, {a}, {b}, {a,b} → 4 (2²).\nIn a set of n elements, each element has 2 choices (inclusion/exclusion), so the number of all cases is 2 × 2 × ... × 2 (n times) = 2ⁿ.\nAlternatively, the binomial theorem can be used: The number of cases where a subset has size k is the combination **C(n,k)**, so summing over all k (0 ≤ k ≤ n), we have ∑_{k=0}^n C(n,k) = (1+1)^n = 2ⁿ.\nTherefore, the number of subsets of n elements is always 2ⁿ.",
            "thinkingContent": "The question the user asked today is to explain why the number of all possible subsets of a set with n elements is 2 to the power of n. Isn't this topic related to combinatorics or binary?\nFirst, consider that for each element, there are two cases: to select or not to select. For example, if there is only one element, there are two subsets: the empty set and the set itself, totaling 2^1=2. When there are two elements, each is either included or excluded, so there are 2*2=4 and 2^2=4.\nGeneralizing, since there are two choices (inclusion/exclusion) for each element, the total number of cases is 2 multiplied n times, or 2^n. This is basic combinatorial thinking.\nAlternatively, if we think of the number of cases where the subset is k in size, then C(n,k), and add it up from k=0 to n, we get (1+1)^n = 2^n, which is the binomial theorem, so we can conclude that the number of all cases is 2^n.\nSince the user probably wanted this explanation, I think I'll keep it simple and approach it as whether or not each element is chosen. I'll give an example, and then summarize the generalization."
        },
        "finishReason": "stop",
        "created": 1753362971,
        "seed": 1561390649,
        "usage": {
            "promptTokens": 58,
            "completionTokens": 631,
            "totalTokens": 689,
            "completionTokensDetails": {
                "thinkingTokens": 366
            }
        }
    }
}

Failure

The following is a sample response upon a failed call.

Response stream

You can use token streaming to output the tokens as they are generated, one by one. The following describes the token streaming format.

Response body

The response body includes the following data:

StreamingChatCompletionsTokenEvent

The following describes StreamingChatCompletionsTokenEvent.

Field Type Required Description
message Object - Conversation messages
message.role Enum - Conversation message role
  • user | assistant
    • user: user's utterance or question
    • assistant: model's answer
message.content String - Content of conversation messages
message.thinkingContent String - Inferences in conversation messages
finishReason String - Reason for stopping token generation (typically passed with the last event; null otherwise)
  • length | stop
    • length: length limit reached
    • stop: a character specified in stop occurred during response generation
created Integer - Response timestamp
seed Integer - Input seed value (a random value is returned when 0 is entered or the field is omitted)
usage Object - Token usage (typically passed with the last event; null otherwise)

StreamingChatCompletionsResultEvent

The following describes StreamingChatCompletionsResultEvent.

Field Type Required Description
message Object - Conversation messages
message.role Enum - Conversation message role
  • user | assistant
    • user: user's utterance or question
    • assistant: model's answer
message.content String - Content of conversation messages
message.thinkingContent String - Inferences in conversation messages
finishReason String - Reason for stopping token generation
  • length | stop
    • length: length limit
    • stop: The character specified in stop occurred during response generation.
created Integer - Response timestamp
seed Integer - Input seed value (a random value is returned when 0 is entered or the field is omitted)
usage Object - Token usage
usage.completionTokens Integer - Generated token count
usage.promptTokens Integer - Number of input (prompt) tokens
usage.totalTokens Integer - Total number of tokens
  • Number of generated tokens + number of input tokens
usage.completionTokensDetails Object - Additional information about the number of generated tokens
usage.completionTokensDetails.thinkingTokens Integer - Number of tokens generated for inference (thinking)
aiFilter Array - AI Filter result

ErrorEvent

The following describes ErrorEvent.

Field Type Required Description
status Object - Response status
status.code String - Response status code
status.message String - Response status message

SignalEvent

The following describes SignalEvent.

Field Type Required Description
data String - Signal data information to pass
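
Taken together, the following is a hedged Python sketch of consuming these events, assuming the third-party requests package and the event names described above (token, result, error, signal). SSE events consist of id:, event:, and data: lines separated by blank lines.

import json
import requests

def stream_chat(url: str, api_key: str, body: dict) -> dict:
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",  # switches the response to a stream
    }
    with requests.post(url, headers=headers, json=body, stream=True) as resp:
        event = None
        for raw in resp.iter_lines(decode_unicode=True):
            if not raw:                  # blank line ends one SSE event
                event = None
                continue
            if raw.startswith("event:"):
                event = raw.split(":", 1)[1].strip()
            elif raw.startswith("data:"):
                data = json.loads(raw.split(":", 1)[1].strip())
                if event == "token":
                    msg = data["message"]
                    # per the example below, thinkingContent tokens arrive before content tokens
                    print(msg.get("thinkingContent") or msg.get("content") or "", end="", flush=True)
                elif event == "result":
                    return data          # aggregated message, usage, finishReason
                elif event == "error":
                    raise RuntimeError(data["status"])
                # signal events are ignored in this sketch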

Response example

The response example is as follows:

Succeeded

The following is a sample response upon a successful call.

id: aabdfe-dfgwr-edf-hpqwd-f3asd-g
event: token
data: {"message": {"role": "assistant", "thinkingContent": "today"},"finishReason": null, "created": 1744710905, "seed": 3284419119, "usage": null} 

...

id: aabdfe-dfgwr-edf-hpqwd-f2asd-g
event: token
data: {"message": {"role": "assistant", "content": "part"},"finishReason": null, "created": 1744710905, "seed": 3284419119, "usage": null} 

...

id: aabdfe-dfgwr-edf-hpqwd-f1asd-g
event: result
data: {
    "message": {
        "role": "assistant",
        "content": "The number of subsets is determined by the **inclusion status** of each element.  \nEach element can choose either **\"Include\" or \"Not include\"**, so there are two cases per element.  \n\nFor example: \n- Set with one element ({a}) → ∅, {a} (**2¹ = 2**) \n- Set with two elements ({a, b}) → ∅, {a}, {b}, {a,b} (**2² = 4**) \n\nBy extension, the **n elements** increase by **2 times** the previous result (2ⁿ-¹) at each step (**2ⁿ-¹ × 2 = 2ⁿ**).  \nIn other words, the number of all subsets is expressed as **2ⁿ**, reflecting the independent selectability of each element.  \n\nSince conjugates and complete sets are also naturally included, the formula **2ⁿ** provides a complete and consistent result.",
        "thinkingContent": "The question the user asked today is to explain why the number of subsets of a set of n elements is 2^n. How can we explain this easily? \n\nLet's start with the basic concept: each element has two choices: either it is in a subset or it is not. For example, if there is one element, then {}, {a} are two subsets, where 2^1=2.\n\nIf we add a second element, b, then for each of the existing subsets, we have to decide whether to add b or not. For each of the original {} and {a}, we either add b or not, for a total of 4. This is 2^2=4. When we expand in this way, for n, each step is repeated by multiplying the number of previous cases by 2, so we end up with 2×2×...×2 (n times) = 2^n\n\nAnother approach is to emphasize independent choices for each element. Since the inclusion or exclusion of each element does not affect each other, we can apply the multiplication law of combinations. So the total number of cases is 2 × 2 × ... × 2 (n times) = 2^n.\n\nDo I need to mention that sometimes conjugates or complete sets are also included? I think it would be clearer if I showed it with an example. But to keep it simple, I think it's enough to emphasize the key idea: the selectivity of each element.\n\nFinally, I could use induction to prove it mathematically rigorously, but since you're asking for a simple explanation, I'll stick to basic logic. I think this is how we'll work it out!"
    },
    "finishReason": "stop",
    "created": 1753363313,
    "seed": 3219533885,
    "usage": {
        "promptTokens": 58,
        "completionTokens": 588,
        "totalTokens": 646,
        "completionTokensDetails": {
            "thinkingTokens": 361
        }
    }
}

Failure

The following is a sample response upon a failed call.

Recommendations

CLOVA Studio inference models do not generate final responses directly, but rather analyze the user query, go through the necessary logical thought process, generate inferences (tokens), and then generate final responses based on them. The recommendations describe how to effectively utilize the inference model for optimized final response generation.

Set whether to infer and how long to infer

The CLOVA Studio inference models are trained so that you can control whether to perform inference and how long it runs. See the following to configure inference behavior and length for your request type.

  • Whether to infer and depth of thought process for inference
    • Set via thinking.effort. Depending on the setting, the following defaults for the maximum number of tokens to generate are applied.
      thinking.effort setting Default maxCompletionTokens
      none 512
      low 5120
      medium 10240
      high 20480
  • Inference length (maximum number of generated tokens)
    • Set via maxCompletionTokens. The limit covers both the inference tokens and the final response tokens.
      • Inference uses maxCompletionTokens, not maxTokens.
    • You can set the value yourself, but if it is too small, the model may reach the limit during inference and fail to generate the final response. Keep the default value, or experiment to find an appropriate length (see the sketch after this list).
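
The following is an illustrative sketch of that trial-and-error, assuming a hypothetical send_request helper that wraps the POST call from the request example and returns the parsed JSON body: the token budget is doubled whenever generation stops with finishReason "length".

# `send_request` is a hypothetical helper wrapping the POST call shown earlier.
def complete_with_budget(send_request, body: dict, start: int = 5120, cap: int = 32768):
    tokens = start
    while tokens <= cap:
        body["maxCompletionTokens"] = tokens
        result = send_request(body)["result"]
        if result["finishReason"] != "length":
            return result        # a final response was generated
        tokens *= 2              # budget exhausted mid-inference; widen it and retry
    raise RuntimeError("Reached the maxCompletionTokens cap without a final response")
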
Note

For relatively long outputs, we recommend the stream output setting, as it can take some time to generate.

Check inference content

When you make an API request with inference enabled, the response consists of the inference content and the final response. You can check the inference content and its token count as follows (a short sketch follows the list):

  • The inference content, which contains the thought process that the inference model generated for the final response, can be found in message.thinkingContent in the response body.
  • The number of tokens in the inference content can be found in usage.completionTokensDetails.thinkingTokens in the response body.
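
For example, assuming res holds the parsed JSON body of a non-streaming response, the two fields can be read like this:

# `res` is assumed to be the parsed JSON response body of a non-streaming call.
thinking = res["result"]["message"]["thinkingContent"]  # the model's thought process
thinking_tokens = res["result"]["usage"]["completionTokensDetails"]["thinkingTokens"]
print(f"Inference used {thinking_tokens} tokens:\n{thinking}")
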
Caution

Note that the inference content may include some of the context passed to the model, including system prompts, in order to get a better final response. For this reason, it is recommended that the inference content be utilized for development purposes only, and not provided to users without any post-processing.

Input processing in multi-turn conversation

When you want to continue a multi-turn conversation, exclude the inference content (message.thinkingContent) of the CLOVA Studio inference model and include only the final response, the model answer (message.content), in the next turn's input. See the following image and the sketch below it.

[Image: CLOVA Studio_thinking_multiturn.png]
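
A minimal sketch of this rule, assuming the message shapes shown above: only the role and content of the model's answer are appended to the history, and thinkingContent is dropped.

# Build the next turn's input: keep the final response, drop the inference content.
def append_turn(messages: list, result_message: dict, next_user_text: str) -> list:
    messages.append({
        "role": "assistant",
        "content": result_message["content"],  # thinkingContent intentionally omitted
    })
    messages.append({
        "role": "user",
        "content": [{"type": "text", "text": next_user_text}],
    })
    return messages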