RAG Reasoning

Available in Classic and VPC

Generate evidence-based RAG answers with the RAG Reasoning model, which is trained on answer formats that increase credibility, such as citation sources and citation index notations. RAG Reasoning calls the engine in the function calling format: you can specify one or more RAG functions, and the LLM autonomously selects the function best suited to the context to generate a search-augmented answer. Used together with Reranker, it produces more stable results.

Request

This section describes the request format. The method and URI are as follows:

Method URI
POST /v1/api-tools/rag-reasoning

Request headers

The following describes the request headers.

Field Required Description
Authorization Required API key for authentication Example: Bearer nv-************
X-NCP-CLOVASTUDIO-REQUEST-ID Optional Request ID
Content-Type Required Request data format
• application/json

Request body

You can include the following data in the body of your request (a short example follows the field table):

    Field Type Required Description
    messages Array Required Conversation messages
    topP Double Optional Sample generated token candidates based on cumulative probability
    • 0.00 < topP ≤ 1.00 (default: 0.8)
    topK Integer Optional Sample K high-probability candidates from the pool of generated token candidates
    • 0 ≤ topK ≤ 128 (default: 0)
    maxTokens Integer Optional Maximum number of generated tokens
    • maxTokens < 4096 (default: 1024)
    temperature Double Optional Degree of diversity for the generated tokens (higher values generate more diverse sentences)
    • 0.00 < temperature ≤ 1.00 (default: 0.50)
    repetitionPenalty Double Optional Degree of penalty for generating the same token (higher values make repeated output less likely)
    • 0.0 < repetitionPenalty ≤ 2.0 (default: 1.1)
    stop Array Optional Token generation stop characters
    • [] (default)
    seed Integer Optional Adjust the consistency of output across repeated model runs
    • 0: apply a random seed (default)
    • 1 ≤ seed ≤ 4294967295: user-specified seed value for results you want to generate consistently
    includeAiFilters Boolean Optional Whether to display the AI filter results (the degree to which the generated results fall into categories such as profanity, degradation/discrimination/hate, and sexual harassment/obscenity)
    • true (default) | false
      • true: display
      • false: not display
    tools Array Required List of tools available for Function calling (see tools)
    toolChoice String | Object Optional Function calling tool call behavior
    • auto: the model calls a tool automatically (string).
    • none: the model generates a normal answer without calling a tool (string).
    • To force the model to call a specific tool, pass an object.
    toolChoice.type String Optional Tool type to be called by the Function calling model
    • function (valid value)
    toolChoice.function Object Optional Tool to be called by the Function calling model
    toolChoice.function.name String Optional Name of the tool to be called by the Function calling model
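
    The sketch below shows how these fields can fit together in one request body. It sets the optional sampling parameters to their documented defaults and uses the object form of toolChoice to force a call to a specific tool; the function name ncloud_cs_retrieval and the tool definition are borrowed from the request example further down, so this is only an illustration, not a complete request.

      # Minimal sketch of a request body (the sampling parameter values shown
      # are the documented defaults).
      request_body = {
          "messages": [
              {"role": "user", "content": "How to rent an A100 GPU"}
          ],
          "tools": [
              {
                  "type": "function",
                  "function": {
                      "name": "ncloud_cs_retrieval",
                      "description": "Searches Ncloud-related documents.",
                      "parameters": {
                          "type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"],
                      },
                  },
              }
          ],
          # Force a call to ncloud_cs_retrieval instead of letting the model decide ("auto").
          "toolChoice": {"type": "function", "function": {"name": "ncloud_cs_retrieval"}},
          "topP": 0.8,
          "topK": 0,
          "temperature": 0.5,
          "repetitionPenalty": 1.1,
          "maxTokens": 1024,
          "seed": 0,
          "includeAiFilters": True,
      }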

    messages

    The following describes messages.

    Field Type Required Description
    role Enum Required Role of conversation messages
    • system | user | assistant | tool
      • system: directives that define roles
      • user: user utterances/questions
      • assistant: answers to user utterances/questions
      • tool: results of the function called by the assistant (model)
    content String Required Conversation message content
    • Enter text (string).
    toolCalls Array Conditional Tool information called by the assistant
    • If the role is tool, enter as requested in the assistant's toolCalls.
    toolCallId String Conditional Tool ID
    • Required if the role is tool
    • Used to connect to the assistant's toolCalls request
    Note

    If the role is tool, the content of messages should include the list of documents retrieved from the search database or search API (search_result). Include id: {unique ID of the document}, doc: {original document retrieved} in the search result (search_result) so that it can be used for citation marks in RAG answers. See below for an example.

    {
        "role": "tool",
        "content": "{
                        \"search_result\": [
                            {
                                \"id\": \"doc-1493058999\",
                                \"doc\": \"Login with NAVER ID is only available for individual members. Business members can't use this feature.\"
                            },
                            ...
                        ]
                    }"
    }
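
    If you build the role: tool message in code, the following sketch shows one way to serialize search results into the search_result format described in the note above. The run_search function is only a placeholder for your own search database or search API, and the toolCallId must echo the id from the assistant's toolCalls, as shown in the request example below.

      import json

      # Placeholder for your own search database or search API. Each hit needs a
      # unique id and the original document text so the model can cite it.
      def run_search(query):
          return [
              {"id": "doc-1493058999",
               "doc": "Login with NAVER ID is only available for individual members. "
                      "Business members can't use this feature."},
          ]

      # Build the role: tool message that answers an assistant tool call.
      def build_tool_message(tool_call_id, query):
          search_result = {"search_result": run_search(query)}
          return {
              "role": "tool",
              "toolCallId": tool_call_id,  # must match the id in the assistant's toolCalls
              "content": json.dumps(search_result, ensure_ascii=False),
          }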
    

    tools

    The following describes tools.

    Field Type Required Description
    type String Required Tool type
    • function (valid value)
    function Object Required Call function information
    function.name String Required function name
    function.description String Required function description
    function.parameters Object Required Parameter passed when using function

    toolCalls

    The following describes toolCalls.

    Field Type Required Description
    id String - Tool identifier
    type String - Tool type
    • function (valid value)
    function Object - Call function information
    function.name String - function name
    function.arguments Object - Parameter passed when using function

    Request example

    The request example is as follows (a combined end-to-end sketch follows the two steps):

    • Step 1. Enter your query in role: user and call the best function to generate the answer (Check response)
      curl --location --request POST 'https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning' \
      --header 'Authorization: Bearer <access_token>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "messages": [
          {
            "content": "How to rent an A100 GPU",
            "role": "user"
          }
        ],
        "tools": [
          {
            "function": {
              "description": "This is the tool you use to do Ncloud-related searches.\nUse the tool by breaking up your query if you need to ask multiple questions.\nIf you can't find information, you can use the tool again with suggested_queries as a reference without giving a final answer.",
              "name": "ncloud_cs_retrieval",
              "parameters": {
                "properties": {
                  "query": {
                    "description": "Refine and enter the user's search keywords.",
                    "type": "string"
                  }
                },
                "required": [
                  "query"
                ],
                "type": "object"
              }
            },
            "type": "function"
          }
        ],
        "toolChoice": "auto",
        "maxTokens": 1024
      }'
      
    • Step 2. Request with role: tool to generate the final answer (Check response)
      curl --location --request POST 'https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning' \
      --header 'Authorization: Bearer <access_token>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "messages": [
              {
                  "content": "How to rent an A100 GPU",
                  "role": "user"
              },
              {
                  "role": "assistant",
                  "content": "",
                  "toolCalls": [
                      {
                          "id": "call_enTEYb0kWBjOwtkngbl7FGTm",
                          "type": "function",
                          "function": {
                              "name": "ncloud_cs_retrieval",
                              "arguments": {
                                  "query": "How to rent an A100 GPU"
                              }
                          }
                      }
                  ]
              },
              {
                  "content": "{\"search_result\": [{\"id\": \"doc-179\", \"doc\": \"GPU A100 can only be created in KR-1. When creating an A100, select a subnet in KR-1. Up to 5 GPU servers can be created for corporate members only.\"}, {\"id\": \"doc-248\", \"doc\": \"GPU A100 servers can be created in the Services > Compute > Server menu. For more information, see the Create server guide.\"}, {\"id\": \"doc-156\", \"doc\": \"For individual members who need more GPU servers or need to create a GPU server, please refer to the FAQ and contact Support.\"}]}",
                  "role": "tool",
                  "toolCallId": "call_enTEYb0kWBjOwtkngbl7FGTm"
              }
          ],
          "tools": [
              {
                  "function": {
                      "description": "This is the tool you use to do Ncloud-related searches.\nUse the tool by breaking up your query if you need to ask multiple questions.\nIf you can't find information, you can use the tool again with suggested_queries as a reference without giving a final answer.",
                      "name": "ncloud_cs_retrieval",
                      "parameters": {
                          "properties": {
                              "query": {
                                  "description": "Refine and enter the user's search keywords.",
                                  "type": "string"
                              }
                          },
                          "required": [
                              "query"
                          ],
                          "type": "object"
                      }
                  },
                  "type": "function"
              }
          ]
      }'
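
    The two steps above can be combined into a single round trip in code. The following is a minimal Python sketch of that flow; it assumes the requests package, an API key stored in a CLOVASTUDIO_API_KEY environment variable (a name chosen for this sketch), and the run_search and build_tool_message helpers sketched earlier, and it omits error handling.

      import os
      import requests

      URL = "https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning"
      HEADERS = {
          "Authorization": f"Bearer {os.environ['CLOVASTUDIO_API_KEY']}",  # key name chosen for this sketch
          "Content-Type": "application/json",
      }
      TOOLS = [{
          "type": "function",
          "function": {
              "name": "ncloud_cs_retrieval",
              "description": "This is the tool you use to do Ncloud-related searches.",
              "parameters": {
                  "type": "object",
                  "properties": {"query": {"type": "string",
                                           "description": "Refine and enter the user's search keywords."}},
                  "required": ["query"],
              },
          },
      }]

      def call_rag_reasoning(messages):
          body = {"messages": messages, "tools": TOOLS, "toolChoice": "auto", "maxTokens": 1024}
          response = requests.post(URL, headers=HEADERS, json=body)
          response.raise_for_status()
          return response.json()["result"]["message"]

      # Step 1: the model picks the tool and query for the user's question.
      messages = [{"role": "user", "content": "How to rent an A100 GPU"}]
      assistant = call_rag_reasoning(messages)

      if assistant.get("toolCalls"):
          # Step 2: run the requested search, append the results as a role: tool
          # message, and ask for the final, citation-marked answer.
          messages.append({"role": "assistant", "content": "",
                           "toolCalls": assistant["toolCalls"]})
          for call in assistant["toolCalls"]:
              query = call["function"]["arguments"]["query"]
              messages.append(build_tool_message(call["id"], query))  # helper sketched above
          final = call_rag_reasoning(messages)
          print(final["content"])
      else:
          # The model answered without calling a tool.
          print(assistant["content"])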
      

    Response

    This section describes the response format.

    Response headers

    The following describes the response headers.

    Headers Required Description
    Content-Type - Response data format
    • application/json

    Response body

    The response body includes the following data:

    Field Type Required Description
    status Object - Response status
    result Object - Response result
    result.message ChatMessage - Conversation message
    result.message.role Enum - Role of conversation messages
    • system | user | assistant
      • system: directives that define roles
      • user: user utterances/questions
      • assistant: answers of the model
    result.message.content String - Content of conversation messages
    result.message.thinkingContent String - The model's decision process
    result.message.toolCalls Array - Tool calls made by the model (see toolCalls)
    result.usage Object - Token usage
    result.usage.completionTokens Integer - Generated token count
    result.usage.promptTokens Integer - Number of input (prompt) tokens
    result.usage.totalTokens Integer - Total number of tokens
    • Number of generated tokens + number of input tokens

    toolCalls

    The following describes toolCalls.

    Field Type Required Description
    id String - Tool identifier
    type String - Tool type
    • function (valid value)
    function Object - Call function information
    function.name String - function name
    function.arguments Object - Parameter passed when using function

    Response example

    The response example is as follows:

    • Example response to Step 1. (Check response)

      {
          "status": {
              "code": "20000",
              "message": "OK"
          },
          "result": {
              "message": {
                  "role": "assistant",
                  "content": "",
                  "thinkingContent": "The user has inquired about \"how to rent an A100 GPU\". To find the answer to this question, you need to use the tool "ncloud_cs_retrieval" to retrieve relevant information.",
                  "toolCalls": [
                      {
                          "id": "call_enTEYb0kWBjOwtkngbl7FGTm",
                          "type": "function",
                          "function": {
                              "name": "ncloud_cs_retrieval",
                              "arguments": {
                                  "query": "How to rent an A100 GPU"
                              }
                          }
                      }
                  ]
              },
              "usage": {
                  "promptTokens": 135,
                  "completionTokens": 84,
                  "totalTokens": 219
              }
          }
      }
      
    • Example response to Step 2. (LLM returns the final answer) (Check response)

      {
          "status": {
              "code": "20000",
              "message": "OK"
          },
          "result": {
              "message": {
                  "role": "assistant",
                  "content": "To rent an A100 GPU, <doc-248>you can create a GPU A100 server from the Services > Compute > Server menu in the NAVER Cloud Platform console.</doc-248>However, <doc-179>GPU A100 can only be created in KR-1, and you must select a subnet in KR-1 when creating an A100.</doc-179> Also, <doc-179>up to GPU servers can be created for corporate members only.</doc-179> If you need more GPU servers or are an individual member who needs to create a GPU server, <doc-156>please refer to the FAQ and contact Support.</doc-156>"
              },
              "usage": {
                  "promptTokens": 332,
                  "completionTokens": 146,
                  "totalTokens": 478
              }
          }
      }
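
    The <doc-...> tags in the final answer reference the id values supplied in search_result, so each cited passage can be mapped back to its source document. The sketch below is one way to do that with a regular expression; the answer string and search_result list are abbreviated stand-ins for the real response and retrieval results, and the pattern assumes ids of the form doc-<number> as used in the examples.

      import re

      answer = ("To rent an A100 GPU, <doc-248>you can create a GPU A100 server from the "
                "Services > Compute > Server menu in the NAVER Cloud Platform console.</doc-248> ...")
      search_result = [
          {"id": "doc-248", "doc": "GPU A100 servers can be created in the Services > Compute > Server menu. ..."},
      ]
      docs_by_id = {hit["id"]: hit["doc"] for hit in search_result}

      # Each match yields (document id, cited passage); ids not found in the
      # retrieval results are flagged rather than dropped.
      for doc_id, cited_text in re.findall(r"<(doc-\d+)>(.*?)</\1>", answer, flags=re.DOTALL):
          source = docs_by_id.get(doc_id, "(source document not found)")
          print(f"[{doc_id}] {cited_text.strip()}")
          print(f"    source: {source}")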
      

    Failure

    The following is a sample response upon a failed call.