RAG Reasoning

Available in Classic and VPC

Generate evidence-based RAG answers with the RAG Reasoning model, which is trained on answer formats that increase credibility, such as citation sources and citation index notations. RAG Reasoning calls the engine in the function calling format: you can specify one or more RAG functions, and the LLM autonomously selects the function best suited to the context to generate a search-augmented answer. Used together with Reranker, it produces more stable results.

Request

This section describes the request format. The method and URI are as follows:

Method URI
POST /v1/api-tools/rag-reasoning

Request headers

The following describes the request headers.

Field Required Description
Authorization Required API key for authentication Example: Bearer nv-************
X-NCP-CLOVASTUDIO-REQUEST-ID Optional Request ID
Content-Type Required Request data format
• application/json

Request body

You can include the following data in the body of your request (a short example follows the field table):

    Field Type Required Description
    messages Array Required Conversation messages
    topP Double Optional Sample generated token candidates based on cumulative probability
    • 0.00 < topP ≤ 1.00 (default: 0.8)
    topK Integer Optional Sample K high-probability candidates from the pool of generated token candidates
    • 0 ≤ topK ≤ 128 (default: 0)
    maxTokens Integer Optional Maximum number of generated tokens
    • maxTokens < 4096 (default: 1024)
    temperature Double Optional Degree of diversity for the generated tokens (higher values generate more diverse sentences)
    • 0.00 < temperature ≤ 1.00 (default: 0.50)
    repetitionPenalty Double Optional Degree of penalty for generating the same token (higher values make repeated output less likely)
    • 0.0 < repetitionPenalty ≤ 2.0 (default: 1.1)
    stop Array Optional Token generation stop characters
    • [] (default)
    seed Integer Optional Adjust the consistency of output across repeated model runs
    • 0: apply a random seed (default)
    • 1 ≤ seed ≤ 4294967295: user-specified seed value for results you want to generate consistently
    includeAiFilters Boolean Optional Whether to display the AI filter results (the degree to which the generated results fall into categories such as profanity, degradation/discrimination/hate, and sexual harassment/obscenity)
    • true (default) | false
      • true: display
      • false: not display
    tools Array Required List of tools available for Function calling (see tools)
    toolChoice String | Object Optional Function calling tool call behavior
    • auto: the model calls a tool automatically (string).
    • none: the model generates a normal answer without calling a tool (string).
    • To force the model to call a specific tool, pass an object.
    toolChoice.type String Optional Tool type to be called by the Function calling model
    • function (valid value)
    toolChoice.function Object Optional Tool to be called by the Function calling model
    toolChoice.function.name String Optional Name of the tool to be called by the Function calling model
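
    The sketch below shows how these fields can fit together in one request body. It sets the optional sampling parameters to their documented defaults and uses the object form of toolChoice to force a call to a specific tool; the function name ncloud_cs_retrieval and the tool definition are borrowed from the request example further down, so this is only an illustration, not a complete request.

      # Minimal sketch of a request body (the sampling parameter values shown
      # are the documented defaults).
      request_body = {
          "messages": [
              {"role": "user", "content": "How to rent an A100 GPU"}
          ],
          "tools": [
              {
                  "type": "function",
                  "function": {
                      "name": "ncloud_cs_retrieval",
                      "description": "Searches Ncloud-related documents.",
                      "parameters": {
                          "type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"],
                      },
                  },
              }
          ],
          # Force a call to ncloud_cs_retrieval instead of letting the model decide ("auto").
          "toolChoice": {"type": "function", "function": {"name": "ncloud_cs_retrieval"}},
          "topP": 0.8,
          "topK": 0,
          "temperature": 0.5,
          "repetitionPenalty": 1.1,
          "maxTokens": 1024,
          "seed": 0,
          "includeAiFilters": True,
      }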

    messages

    The following describes messages.

    Field Type Required Description
    role Enum Required Role of conversation messages
    • system | user | assistant | tool
      • system: directives that define roles
      • user: user utterances/questions
      • assistant: answers to user utterances/questions
      • tool: results of the function called by the assistant (model)
    content String Required Conversation message content
    • Enter text (string).
    toolCalls Array Conditional Tool information called by the assistant
    • If the role is tool, enter as requested in the assistant's toolCalls.
    toolCallId String Conditional Tool ID
    • Required if the role is tool
    • Used to connect to the assistant's toolCalls request
    Note

    If the role is tool, the content of messages should include the list of documents retrieved from the search database or search API (search_result). Include id: {unique ID of the document}, doc: {original document retrieved} in the search result (search_result) so that it can be used for citation marks in RAG answers. See below for an example.

    {
        "role": "tool",
        "content": "{
                        \"search_result\": [
                            {
                                \"id\": \"doc-1493058999\",
                                \"doc\": \"Login with NAVER ID is only available for individual members. Business members can't use this feature.\"
                            },
                            ...
                        ]
                    }"
    }
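
    If you build the role: tool message in code, the following sketch shows one way to serialize search results into the search_result format described in the note above. The run_search function is only a placeholder for your own search database or search API, and the toolCallId must echo the id from the assistant's toolCalls, as shown in the request example below.

      import json

      # Placeholder for your own search database or search API. Each hit needs a
      # unique id and the original document text so the model can cite it.
      def run_search(query):
          return [
              {"id": "doc-1493058999",
               "doc": "Login with NAVER ID is only available for individual members. "
                      "Business members can't use this feature."},
          ]

      # Build the role: tool message that answers an assistant tool call.
      def build_tool_message(tool_call_id, query):
          search_result = {"search_result": run_search(query)}
          return {
              "role": "tool",
              "toolCallId": tool_call_id,  # must match the id in the assistant's toolCalls
              "content": json.dumps(search_result, ensure_ascii=False),
          }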
    

    tools

    The following describes tools.

    Field Type Required Description
    type String Required Tool type
    • function (valid value)
    function Object Required Call function information
    function.name String Required function name
    function.description String Required function description
    function.parameters Object Required Parameter passed when using function

    toolCalls

    The following describes toolCalls.

    Field Type Required Description
    id String - Tool identifier
    type String - Tool type
    • function (valid value)
    function Object - Call function information
    function.name String - function name
    function.arguments Object - Parameter passed when using function

    Request example

    The request example is as follows (a combined end-to-end sketch follows the two steps):

    • Step 1. Enter your query in role: user and call the best function to generate the answer (Check response)
      curl --location --request POST 'https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning' \
      --header 'Authorization: Bearer <access_token>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "messages": [
          {
            "content": "How to rent an A100 GPU",
            "role": "user"
          }
        ],
        "tools": [
          {
            "function": {
              "description": "This is the tool you use to do Ncloud-related searches.\nUse the tool by breaking up your query if you need to ask multiple questions.\nIf you can't find information, you can use the tool again with suggested_queries as a reference without giving a final answer.",
              "name": "ncloud_cs_retrieval",
              "parameters": {
                "properties": {
                  "query": {
                    "description": "Refine and enter the user's search keywords.",
                    "type": "string"
                  }
                },
                "required": [
                  "query"
                ],
                "type": "object"
              }
            },
            "type": "function"
          }
        ],
        "toolChoice": "auto",
        "maxTokens": 1024
      }'
      
    • Step 2. Request with role: tool to generate the final answer (Check response)
      curl --location --request POST 'https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning' \
      --header 'Authorization: Bearer <access_token>' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "messages": [
              {
                  "content": "How to rent an A100 GPU",
                  "role": "user"
              },
              {
                  "role": "assistant",
                  "content": "",
                  "toolCalls": [
                      {
                          "id": "call_enTEYb0kWBjOwtkngbl7FGTm",
                          "type": "function",
                          "function": {
                              "name": "ncloud_cs_retrieval",
                              "arguments": {
                                  "query": "How to rent an A100 GPU"
                              }
                          }
                      }
                  ]
              },
              {
                  "content": "{\"search_result\": [{\"id\": \"doc-179\", \"doc\": \"GPU A100 can only be created in KR-1. When creating an A100, select a subnet in KR-1. Up to 5 GPU servers can be created for corporate members only.\"}, {\"id\": \"doc-248\", \"doc\": \"GPU A100 servers can be created in the Services > Compute > Server menu. For more information, see the Create server guide.\"}, {\"id\": \"doc-156\", \"doc\": \"For individual members who need more GPU servers or need to create a GPU server, please refer to the FAQ and contact Support.\"}]}",
                  "role": "tool",
                  "toolCallId": "call_enTEYb0kWBjOwtkngbl7FGTm"
              }
          ],
          "tools": [
              {
                  "function": {
                      "description": "This is the tool you use to do Ncloud-related searches.\nUse the tool by breaking up your query if you need to ask multiple questions.\nIf you can't find information, you can use the tool again with suggested_queries as a reference without giving a final answer.",
                      "name": "ncloud_cs_retrieval",
                      "parameters": {
                          "properties": {
                              "query": {
                                  "description": "Refine and enter the user's search keywords.",
                                  "type": "string"
                              }
                          },
                          "required": [
                              "query"
                          ],
                          "type": "object"
                      }
                  },
                  "type": "function"
              }
          ]
      }'
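
    The two steps above can be combined into a single round trip in code. The following is a minimal Python sketch of that flow; it assumes the requests package, an API key stored in a CLOVASTUDIO_API_KEY environment variable (a name chosen for this sketch), and the run_search and build_tool_message helpers sketched earlier, and it omits error handling.

      import os
      import requests

      URL = "https://clovastudio.stream.ntruss.com/v1/api-tools/rag-reasoning"
      HEADERS = {
          "Authorization": f"Bearer {os.environ['CLOVASTUDIO_API_KEY']}",  # key name chosen for this sketch
          "Content-Type": "application/json",
      }
      TOOLS = [{
          "type": "function",
          "function": {
              "name": "ncloud_cs_retrieval",
              "description": "This is the tool you use to do Ncloud-related searches.",
              "parameters": {
                  "type": "object",
                  "properties": {"query": {"type": "string",
                                           "description": "Refine and enter the user's search keywords."}},
                  "required": ["query"],
              },
          },
      }]

      def call_rag_reasoning(messages):
          body = {"messages": messages, "tools": TOOLS, "toolChoice": "auto", "maxTokens": 1024}
          response = requests.post(URL, headers=HEADERS, json=body)
          response.raise_for_status()
          return response.json()["result"]["message"]

      # Step 1: the model picks the tool and query for the user's question.
      messages = [{"role": "user", "content": "How to rent an A100 GPU"}]
      assistant = call_rag_reasoning(messages)

      if assistant.get("toolCalls"):
          # Step 2: run the requested search, append the results as a role: tool
          # message, and ask for the final, citation-marked answer.
          messages.append({"role": "assistant", "content": "",
                           "toolCalls": assistant["toolCalls"]})
          for call in assistant["toolCalls"]:
              query = call["function"]["arguments"]["query"]
              messages.append(build_tool_message(call["id"], query))  # helper sketched above
          final = call_rag_reasoning(messages)
          print(final["content"])
      else:
          # The model answered without calling a tool.
          print(assistant["content"])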
      

    Response

    This section describes the response format.

    Response headers

    The following describes the response headers.

    Headers Required Description
    Content-Type - Response data format
    • application/json

    Response body

    The response body includes the following data:

    Field Type Required Description
    status Object - Response status
    result Object - Response result
    result.message ChatMessage - Conversation message
    result.message.role Enum - Role of conversation messages
    • system | user | assistant
      • system: directives that define roles
      • user: user utterances/questions
      • assistant: answers of the model
    result.message.content String - Content of conversation messages
    result.message.thinkingContent String - The model's decision process
    result.message.toolCalls Array - Tool calls made by the model (see toolCalls)
    result.usage Object - Token usage
    result.usage.completionTokens Integer - Generated token count
    result.usage.promptTokens Integer - Number of input (prompt) tokens
    result.usage.totalTokens Integer - Total number of tokens
    • Number of generated tokens + number of input tokens

    toolCalls

    The following describes toolCalls.

    Field Type Required Description
    id String - Tool identifier
    type String - Tool type
    • function (valid value)
    function Object - Call function information
    function.name String - function name
    function.arguments Object - Parameter passed when using function

    Response example

    The response example is as follows:

    • Example response to Step 1. (Check response)

      {
          "status": {
              "code": "20000",
              "message": "OK"
          },
          "result": {
              "message": {
                  "role": "assistant",
                  "content": "",
                  "thinkingContent": "The user has inquired about \"how to rent an A100 GPU\". To find the answer to this question, you need to use the tool "ncloud_cs_retrieval" to retrieve relevant information.",
                  "toolCalls": [
                      {
                          "id": "call_enTEYb0kWBjOwtkngbl7FGTm",
                          "type": "function",
                          "function": {
                              "name": "ncloud_cs_retrieval",
                              "arguments": {
                                  "query": "How to rent an A100 GPU"
                              }
                          }
                      }
                  ]
              },
              "usage": {
                  "promptTokens": 135,
                  "completionTokens": 84,
                  "totalTokens": 219
              }
          }
      }
      
    • Example response to Step 2. (LLM returns the final answer) (Check response)

      {
          "status": {
              "code": "20000",
              "message": "OK"
          },
          "result": {
              "message": {
                  "role": "assistant",
                  "content": "To rent an A100 GPU, <doc-248>you can create a GPU A100 server from the Services > Compute > Server menu in the NAVER Cloud Platform console.</doc-248>However, <doc-179>GPU A100 can only be created in KR-1, and you must select a subnet in KR-1 when creating an A100.</doc-179> Also, <doc-179>up to GPU servers can be created for corporate members only.</doc-179> If you need more GPU servers or are an individual member who needs to create a GPU server, <doc-156>please refer to the FAQ and contact Support.</doc-156>"
              },
              "usage": {
                  "promptTokens": 332,
                  "completionTokens": 146,
                  "totalTokens": 478
              }
          }
      }
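
    The <doc-...> tags in the final answer reference the id values supplied in search_result, so each cited passage can be mapped back to its source document. The sketch below is one way to do that with a regular expression; the answer string and search_result list are abbreviated stand-ins for the real response and retrieval results, and the pattern assumes ids of the form doc-<number> as used in the examples.

      import re

      answer = ("To rent an A100 GPU, <doc-248>you can create a GPU A100 server from the "
                "Services > Compute > Server menu in the NAVER Cloud Platform console.</doc-248> ...")
      search_result = [
          {"id": "doc-248", "doc": "GPU A100 servers can be created in the Services > Compute > Server menu. ..."},
      ]
      docs_by_id = {hit["id"]: hit["doc"] for hit in search_result}

      # Each match yields (document id, cited passage); ids not found in the
      # retrieval results are flagged rather than dropped.
      for doc_id, cited_text in re.findall(r"<(doc-\d+)>(.*?)</\1>", answer, flags=re.DOTALL):
          source = docs_by_id.get(doc_id, "(source document not found)")
          print(f"[{doc_id}] {cited_text.strip()}")
          print(f"    source: {source}")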
      

    Failure

    The following is a sample response upon a failed call.