Visual Object Tracking

Available in Classic and VPC

Get basic information about objects and pose estimation results.

Preliminary steps

The preliminary steps for using the ARC eye VOT API are as follows:

1. Create object

Create an object in the console. For more information on creation, see the ARC eye User Guides.

2. Deploy API

When the object is created, an API with the Pending status is created in the ARC eye > Visual Object Tracking > API menu of the NAVER Cloud Platform console. Select the API and deploy it. Once deployment is complete and the status changes to Completed, you can use the ARC eye VOT API.
For more information on deployment, see the ARC eye User Guides.

Request

This section describes the request format. The method and URI are as follows:

Method URI
POST InvokeURL

Request headers

For information about the headers common to all ARC eye APIs, see ARC eye request headers.

Request body

You can include the following data in the body of your request:

Field Type Required Description
image File Required RGB image for performing object detection
  • Image file extension: JPG, PNG
uuid String Required Unique identifier of the device that captured the image
  • UUID format
timestamp Number Required Image creation time (millisecond)
  • Unix timestamp format
intrinsic String Required Camera intrinsic parameters used for pose estimation
distort String Required Camera distortion parameters used for pose estimation
gravity String Optional Gravity direction used for pose estimation
  • If omitted, the value calculated from extrinsic is used.
extrinsic String Required Position and orientation (rotation and translation) between the camera used for pose estimation and the coordinate system
videoid String Optional Video ID recognized by the VOT client
imgid String Optional Image ID recognized by the VOT client
keyframeid String Optional Key frame ID recognized by the VOT client
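
As a minimal sketch of preparing the text fields above, the following Python snippet serializes camera matrices into comma-separated strings. The row-major ordering is an assumption inferred from the request example on this page (its intrinsic string reads fx, 0, cx, 0, fy, cy, 0, 0, 1), and the helper names `flatten_matrix` and `build_form_fields` are hypothetical, not part of the ARC eye API.

```python
import time
import uuid


def flatten_matrix(rows):
    """Join a nested list of numbers into one comma-separated string (row-major)."""
    return ",".join(repr(float(value)) for row in rows for value in row)


def build_form_fields(device_uuid, intrinsic, distort, extrinsic, timestamp_ms=None):
    """Return the required text fields of the request body as a dict of strings."""
    if timestamp_ms is None:
        # Unix timestamp in milliseconds, as the timestamp field requires.
        timestamp_ms = int(time.time() * 1000)
    return {
        # uuid.UUID raises ValueError if the input is not a valid UUID.
        "uuid": str(uuid.UUID(device_uuid)),
        "timestamp": str(timestamp_ms),
        "intrinsic": flatten_matrix(intrinsic),  # 3x3 matrix -> 9 values
        "distort": ",".join(repr(float(v)) for v in distort),  # 5 values
        "extrinsic": flatten_matrix(extrinsic),  # 4x4 matrix -> 16 values
    }
```

The optional fields (gravity, videoid, imgid, keyframeid) can be added to the returned dictionary in the same way when they are available.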

Request example

The request example is as follows:

curl --location --request POST '{InvokeURL}' \
--header 'X-ARCEYE-SECRET: {SecretKey}' \
--header 'Content-Type: multipart/form-data' \
--form 'image=@object.jpg' \
--form 'uuid=a4315b63-2d64-11ef-becb-005056a70a22' \
--form 'timestamp=1718709558076' \
--form 'intrinsic=1524.3942260742188,0.0,539.0464782714844,0.0,1524.3942260742188,950.2188720703125,0.0,0.0,1.0' \
--form 'distort=0.0,0.0,0.0,0.0,0.0' \
--form 'extrinsic=-0.5501050024473961,-0.03056710354932507,-0.834535882070361,-0.2494854635652485,-0.42819610313094764,0.8682865385146257,0.25045275861479593,-0.16543658916848714,0.7169606569023306,0.495120328016226,-0.4907374830184501,0.18910466156950462,0.0,0.0,0.0,1.0'
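
The same request can be assembled with Python's standard library. The sketch below is illustrative only: `encode_multipart` and `build_request` are hypothetical helper names, and the only details taken from this page are the POST method, the `X-ARCEYE-SECRET` header, and the multipart form fields. It builds the request without sending it.

```python
import urllib.request
import uuid


def encode_multipart(fields, image_name, image_bytes):
    """Encode text fields plus one image file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    # The API accepts JPG and PNG images; pick the matching media type.
    media_type = "image/png" if image_name.lower().endswith(".png") else "image/jpeg"
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{image_name}"'.encode(),
        f"Content-Type: {media_type}".encode(),
        b"",
        image_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_request(invoke_url, secret_key, fields, image_name, image_bytes):
    """Build (but do not send) the POST request for the VOT API."""
    body, content_type = encode_multipart(fields, image_name, image_bytes)
    return urllib.request.Request(
        invoke_url,
        data=body,
        method="POST",
        headers={"X-ARCEYE-SECRET": secret_key, "Content-Type": content_type},
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. The invoke URL and secret key come from your own deployment.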
    

Response

This section describes the response format.

Response body (success)

The following describes the response body when the query is successful.

Field Type Required Description
result String - Response result
  • SUCCESS | FAILURE | ERROR
    • SUCCESS: succeeded
    • FAILURE: failed
    • ERROR: error
version String - ARC eye VOT API deployment version
projectid String - ID of the project for which pose estimation was performed
recvtime Number - Time at which Poser (pose estimation) received the request from Detector (object detection) (millisecond)
  • Unix timestamp format
timestamp Number - Creation time of the image used for the request (millisecond)
  • Unix timestamp format
uuid String - Unique identifier of the device used for the request
  • UUID format
status Number - Detection status code
message String - Detection status message
objects Array - Information about the objects used for pose estimation
flpose Array - User pose calculated using the target object
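
A client typically checks `result` before reading the rest of the body. The following is a minimal sketch, assuming the response body is JSON as in the response example on this page. The function name `successful_objects` is hypothetical.

```python
import json


def successful_objects(response_text):
    """Parse a response body and return the objects whose pose estimation succeeded.

    Raises RuntimeError when the top-level result is not SUCCESS.
    """
    response = json.loads(response_text)
    if response.get("result") != "SUCCESS":
        raise RuntimeError(f"VOT request did not succeed: {response.get('result')}")
    # "objects" is an array; each entry carries its own per-object status flag.
    return [obj for obj in response.get("objects", []) if obj.get("status") is True]
```

Raising on FAILURE and ERROR keeps the calling code from reading fields that are only present in a successful response.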

objects

The following describes objects.

Field Type Required Description
bbox2d Array(4,2) - Detector's detection region of interest (ROI) provided for pose estimation
  • 2D bounding box coordinates representing the boundaries of objects in the image
conf_thresh Number - Confidence threshold for detected object
  • Closer to 0 means less confidence, closer to 1 means more confidence.
corners2d Array(9,2) - 2D image coordinates corresponding to 3D object bounding box
corners3d Array(9,3) - 3D object bounding box coordinates centered at (0, 0, 0) in 3D space
distort Array(5) - Distortion information of the camera lens used for pose estimation
extrinsic Array(4,4) - Coordinates indicating the position and orientation (rotation and translation) between the camera used for pose estimation and the coordinate system
global_pose Array(4,4) - Coordinates representing the position and orientation of the object in the global coordinate system
global_prob Number - Confidence of global pose estimation
  • Closer to 0 means less confidence, closer to 1 means more confidence.
intrinsic Array(3,3) - Coordinates representing internal characteristics such as focal length and optical center of the camera used for pose estimation
message String - Pose estimation status information of the object
obj_prob Number - Confidence of object detection
  • Closer to 0 means less confidence, closer to 1 means more confidence.
objid String - Object ID used for pose estimation
pose Array(4,4) - Object position and orientation coordinates
poserid String - Video ID for which pose estimation was performed (same as videoid or objectid)
projectid String - Project ID to which the object belongs
sim_prob Number - Similarity confidence between object and model
  • Closer to 0 means less confidence, closer to 1 means more confidence.
size Array(3) - Size information for the object
  • Width, height, and depth in order
status Boolean - Object pose estimation status
  • true | false
    • true: pose estimation successful
    • false: pose estimation failure
type String - Object type
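
The `pose`, `intrinsic`, `corners3d`, and `corners2d` fields appear to be related by a standard pinhole projection. The sketch below assumes that `pose` maps object coordinates into the camera frame with the translation in the last column, and it ignores lens distortion (the sample `distort` is all zeros). These conventions are inferred from the sample response on this page rather than stated by the API, and `project_point` is a hypothetical helper.

```python
# Values copied from the sample response on this page.
SAMPLE_POSE = [
    [0.9968414902687073, 0.014719659462571144, 0.0780409649014473, -0.03976171463727951],
    [-0.006204552017152309, -0.9652349948883057, 0.26131001114845276, 0.16608235239982605],
    [0.07917426526546478, -0.2609688639640808, -0.9620949625968933, 2.5332860946655273],
    [0.0, 0.0, 0.0, 1.0],
]
SAMPLE_INTRINSIC = [[2834.314, 0.0, 1093.1019], [0.0, 2834.314, 1933.7009], [0.0, 0.0, 1.0]]


def project_point(point3d, pose, intrinsic):
    """Project a 3D object-frame point to pixel coordinates (no lens distortion)."""
    x, y, z = point3d
    # Apply the 4x4 pose: rotation in the upper-left 3x3, translation in the last column.
    cam = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose[:3]]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    fx, cx = intrinsic[0][0], intrinsic[0][2]
    fy, cy = intrinsic[1][1], intrinsic[1][2]
    # Pinhole model: divide by depth, scale by focal length, shift by optical center.
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


if __name__ == "__main__":
    print(project_point([0.0, 0.0, 0.0], SAMPLE_POSE, SAMPLE_INTRINSIC))
```

With the sample values, the object center (0, 0, 0) projects to within a pixel of the first `corners2d` entry, [1048, 2119].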

Response body (failure)

The following describes the response body when the query fails.

Field Type Required Description
result String - Response result
  • SUCCESS | FAILURE | ERROR
    • SUCCESS: succeeded
    • FAILURE: failed
    • ERROR: error
running_time Object - Processing time information
running_time.processing Number - Processing time (millisecond)
timestamp Number - Creation time of the image used for the request (millisecond)
  • Unix timestamp format
version String - ARC eye VOT API deployment version
candidate_obj_id Number - Candidate object ID
  • If the value is 0, it means that there are no candidate objects.
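
The failure fields can be condensed into a single log line. This is a minimal sketch that relies only on the fields listed above. The function name `describe_failure` and the message wording are hypothetical.

```python
def describe_failure(response):
    """Summarize a non-success response dict in one line."""
    result = response.get("result", "UNKNOWN")
    elapsed_ms = response.get("running_time", {}).get("processing")
    candidate = response.get("candidate_obj_id", 0)
    # A candidate_obj_id of 0 means that there are no candidate objects.
    if candidate == 0:
        candidate_text = "no candidate object"
    else:
        candidate_text = f"candidate object {candidate}"
    version = response.get("version")
    return f"{result}: {candidate_text} (processing {elapsed_ms} ms, API version {version})"
```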

Response status codes

For information about the HTTP status codes common to all APIs, see ARC eye response status codes.

Response example

The response example is as follows:

Succeeded

The following is a sample response upon a successful query.

{
    "result": "SUCCESS",
    "version": "3.0.0-1",
    "projectid": "123",
    "recvtime": 1712213062.4720848,
    "timestamp": 1712213062.2513185,
    "uuid": "933ec42646f8",
    "status": 0,
    "message": "VOT_DETECTOR_SUCCESS",
    "objects": [
        {
            "bbox2d": [[677.0, 1472.0], [1466.0, 1472.0], [1466.0, 2861.0], [677.0, 2861.0]],
            "conf_thresh": 0.3,
            "corners2d": [[1048, 2119], [746, 2513], [726, 1473], [715, 2864], [686, 1564], [1297, 2500], [1335, 1477], [1408, 2842], [1472, 1568]],
            "corners3d": [[0.0, 0.0, 0.0],
                [-0.29031120781734804, -0.5381274223327637, -0.3174518426978675],
                [-0.29031120781734804, 0.5381274223327637, -0.3174518426978675],
                [-0.29031120781734804, -0.5381274223327637, 0.3174518426978675],
                [-0.29031120781734804, 0.5381274223327637, 0.3174518426978675],
                [0.29031120781734804, -0.5381274223327637, -0.3174518426978675],
                [0.29031120781734804, 0.5381274223327637, -0.3174518426978675],
                [0.29031120781734804, -0.5381274223327637, 0.3174518426978675],
                [0.29031120781734804, 0.5381274223327637, 0.3174518426978675]],
            "distort": [0.0, 0.0, 0.0, 0.0, 0.0],
            "extrinsic": [[-0.2429226121477801, -0.2506293885132776, 0.9371091260471003, 0.0],
                [-0.009911837278829036, 0.9666372782061301, 0.2559572774195017, 0.0],
                [-0.9699950309094669, 0.05288933725909484, -0.23730225033748942, 0.0],
                [0.012476532015234023, 0.021933435026800476, 0.024992155870001847, 1.0]],
            "global_pose": [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]],
            "global_prob": 1.0,
            "intrinsic": [[2834.314, 0.0, 1093.1019], [0.0, 2834.314, 1933.7009], [0.0, 0.0, 1.0]],
            "message": "VOT_OBJECT_POSE_SUCCESS",
            "obj_prob": 1.0,
            "objid": "rookie",
            "pose": [[0.9968414902687073, 0.014719659462571144, 0.0780409649014473, -0.03976171463727951],
                [-0.006204552017152309, -0.9652349948883057, 0.26131001114845276, 0.16608235239982605],
                [0.07917426526546478, -0.2609688639640808, -0.9620949625968933, 2.5332860946655273],
                [0.0, 0.0, 0.0, 1.0]],
            "poserid": "14dfc56c-a640-4281-9e95-aa49b61d51bd",
            "projectid": "b385865f-67f3-4341-aca5-166d20df952b",
            "sim_prob": 0.9999999255277782,
            "size": [0.5806224156346961, 1.0762548446655273, 0.634903685395735],
            "status": true,
            "type": "normal"
        }
    ],
    "flpose": [[0.9999973932548811, -0.001259764878431, -0.0019043308258140002, 5.813885266846625], ....]
}
    

Failure

The following is a sample response upon a failed query.

{
  "result": "FAILURE",
  "running_time": {
    "processing": 2
  },
  "timestamp": 1718709558076,
  "version": "3.0.0-1",
  "candidate_obj_id": 0
}
    
Note

For response bodies and response examples when API calls fail, see ARC eye overview.