Megvii (旷视科技), previously known as Face++, claims that its Nanjing-based research center has developed the largest retail product checkout dataset, named RPC, for retail ACO (automatic check-out) in terms of the number of images and categories.
This dataset aims to increase efficiency in the retail industry, which is relatively labor-intensive. Megvii suggests that costs incurred during checkout make up a large proportion of a retail outlet's total operating costs, and it sees a trend toward reducing those costs by integrating computer vision into retail to achieve ACO.
ACO is essentially a system that automatically generates a shopping list from images of the products being purchased. The expected benefits of an ideal ACO system are shorter checkout times and lower staff costs, since it relies only on computer vision to identify the products purchased and sum up the total payment.
Although ACO might sound simple, it is not yet well studied in the computer vision community, and it currently faces the following challenges, which result from the lack of a high-quality dataset:
Lack of large-scale, high-quality product images
The number of products can be large, and the available photos were often taken in an environment different from the deployment scenario. As a result, there are not enough appropriate training photos per product.
Fine-grained nature of the product categories
Different products can look very similar, and product packaging changes constantly. This makes it harder for a computer vision system to identify the correct items.
Difficulties in collecting training images
Since the number of products is huge, it may be impractical to collect sufficient images of every individual product. Meanwhile, training images that reflect realistic checkout scenarios are also in short supply.
To improve current ACO technology, Megvii tried to mimic real-world ACO scenarios.
First, Megvii increased the number of images and product categories. The new dataset contains 200 product categories and 83,739 images. To improve recognition accuracy, it provides single-product images taken in a controlled environment and multi-product checkout images taken at the checkout counter, with various annotations.
Secondly, the dataset uses both exemplar images and checkout images to increase accuracy. Exemplar images capture multi-view appearances of every single product, while checkout images capture realistic checkout scenarios in which each image includes multiple products.
Thirdly, to mimic real checkout scenarios, products are randomly chosen, combined, and freely placed on the checkout background with random orientations. Occlusion and complex clutter are common in Megvii's RPC dataset.
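The random composition described above can be illustrated with a minimal Python sketch. It only samples a layout (which SKU goes where, at what angle) and does not render an image. The function name, the field names, and the uniform sampling are assumptions made for illustration, not Megvii's actual collection procedure.

```python
import random


def sample_checkout_layout(skus, n_instances, width, height, seed=None):
    """Sample a hypothetical checkout layout.

    Each instance gets a randomly chosen SKU, a random position on a
    width x height background, and a random orientation in degrees.
    Instances may overlap, which loosely mimics occlusion and clutter.
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible
    layout = []
    for _ in range(n_instances):
        layout.append({
            "sku": rng.choice(skus),
            "x": rng.uniform(0, width),
            "y": rng.uniform(0, height),
            "angle_deg": rng.uniform(0, 360),
        })
    return layout


if __name__ == "__main__":
    for item in sample_checkout_layout(["cola", "chips", "soap"], 4, 1000, 800, seed=0):
        print(item)
```

Because nothing prevents two instances from landing on the same spot, a layout sampled this way naturally contains overlapping products; a real pipeline would also need to decide how much overlap is acceptable.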
Fourthly, RPC has a hierarchical structure: its 200 SKUs are grouped into 17 meta-categories that cover diverse appearances, such as bottle-like, box-like, and canister-like. The SKUs under each meta-category tend to be fine-grained.
Fifthly, according to the number of products and product instances, images are split into three clutter levels: easy mode, medium mode, and hard mode. Each image carries a clutter-level annotation, which allows in-depth inspection of model capacity.
Sixthly, each individual RPC image has three types of annotations, ranging from weak to strong: shopping list, point-level, and bounding boxes. The shopping list is the weakest level of annotation and the easiest to obtain in practice; it records the SKU category and count of each product instance in the checkout image. Point-level annotation provides the central position and the SKU category of each product in the checkout image. Bounding boxes, the most labor-intensive annotation, provide a bounding box and SKU category for each product.
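One way to see why these three levels run from weak to strong is that each stronger annotation contains enough information to derive the weaker ones. The Python sketch below shows this under stated assumptions: the `BoxAnnotation` type, its field names, and the use of the box center as the point-level position are hypothetical and are not taken from the RPC annotation files.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """Hypothetical bounding-box annotation: SKU plus box in pixels."""
    sku: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def to_point_level(boxes):
    """Reduce boxes to (sku, center_x, center_y) tuples.

    Assumption: the box center stands in for the annotated central position.
    """
    return [(b.sku, b.x + b.width / 2, b.y + b.height / 2) for b in boxes]


def to_shopping_list(boxes):
    """Reduce boxes to the weakest annotation: a {sku: count} mapping."""
    return dict(Counter(b.sku for b in boxes))


if __name__ == "__main__":
    boxes = [
        BoxAnnotation("cola", 10, 20, 40, 100),
        BoxAnnotation("cola", 60, 20, 40, 100),
        BoxAnnotation("chips", 0, 0, 50, 30),
    ]
    print(to_point_level(boxes))
    print(to_shopping_list(boxes))
```

The reverse direction does not work: a shopping list says nothing about where each product sits, which is why the stronger annotations cost more labor to produce.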
For evaluation, the dataset adopts four metrics: Checkout Accuracy (cAcc), Average Counting Distance (ACD), Mean Category Counting Distance (mCCD), and Mean Category Intersection of Union (mCIoU).
According to the Megvii team, cAcc is the most important metric for the ACO task: it measures the pass rate of an ACO system and thus reflects the practicality of the system. Meanwhile, ACD considers only the average number of counting errors, mCCD measures the average ratio of counting errors, and mCIoU measures the compatibility between the predicted shopping list and the ground truth.
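The four metrics can be sketched in Python from the descriptions above. The exact formulas are assumptions on my part: cAcc is taken as the fraction of images whose predicted shopping list matches the ground truth exactly, ACD as the mean per-image sum of absolute count errors, mCCD as the per-category ratio of count errors to ground-truth counts averaged over categories, and mCIoU as the per-category ratio of summed minimum counts to summed maximum counts averaged over categories. The function name and the dictionary-based input format are also hypothetical.

```python
from collections import Counter


def checkout_metrics(predictions, ground_truths):
    """Compute cAcc, ACD, mCCD and mCIoU under the assumed definitions.

    predictions, ground_truths: lists of {sku: count} dicts, one per image.
    """
    if len(predictions) != len(ground_truths) or not ground_truths:
        raise ValueError("need equally many, and at least one, prediction and ground truth")

    n_images = len(ground_truths)
    categories = sorted({s for d in predictions + ground_truths for s in d})

    exact_images = 0
    total_error = 0
    err, gt_sum, min_sum, max_sum = Counter(), Counter(), Counter(), Counter()

    for pred, gt in zip(predictions, ground_truths):
        image_error = 0
        for sku in categories:
            p, g = pred.get(sku, 0), gt.get(sku, 0)
            image_error += abs(p - g)
            err[sku] += abs(p - g)
            gt_sum[sku] += g
            min_sum[sku] += min(p, g)
            max_sum[sku] += max(p, g)
        total_error += image_error
        exact_images += image_error == 0  # image passes only if every count matches

    # Categories with a zero denominator are skipped to avoid division by zero.
    ccd = [err[s] / gt_sum[s] for s in categories if gt_sum[s] > 0]
    ciou = [min_sum[s] / max_sum[s] for s in categories if max_sum[s] > 0]

    return {
        "cAcc": exact_images / n_images,
        "ACD": total_error / n_images,
        "mCCD": sum(ccd) / len(ccd) if ccd else 0.0,
        "mCIoU": sum(ciou) / len(ciou) if ciou else 0.0,
    }


if __name__ == "__main__":
    preds = [{"cola": 2, "chips": 1}, {"cola": 1}]
    gts = [{"cola": 2, "chips": 1}, {"cola": 2, "chips": 1}]
    print(checkout_metrics(preds, gts))
```

Under these definitions cAcc is the strictest of the four, since a single miscounted item fails the whole image, while the other three give partial credit for nearly correct lists.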
Moreover, Megvii proposed four baselines to benchmark the dataset: Single, Syn, Render, and Syn+Render. The dataset is presented as more advanced than existing datasets in this field, though there remains substantial room to improve ACO performance.