WiMi to work on multi-channel CNN-based 3D object detection algorithm

Coleda Bureau
January 17, 2024

WiMi Hologram Cloud Inc. (NASDAQ: WIMI) (“WiMi” or the “Company”), a global leader in Hologram Augmented Reality (“AR”) technology, today announced that its R&D team is working on a 3D object detection algorithm based on multi-channel convolutional neural networks (CNNs). The network takes RGB, depth, and bird’s-eye view (BEV) images as inputs and regresses each object’s category, 3D size, and spatial location. The algorithm relies on this multi-channel neural network system to achieve 3D object detection.

BEV images provide a view perpendicular to the camera’s point of view and can show the spatial distribution of objects. They are generated by point cloud projection and used as input to the neural network to improve the accuracy of 3D object detection. Because the CNN processes the input point cloud data directly, the problem of encoding and extracting features from unordered point clouds can be solved, which allows end-to-end regression of 3D bounding boxes. The algorithm extracts 3D proposals from monocular images alone and estimates 3D bounding boxes, then combines laser point clouds with visual information and projects the point clouds into the BEV images. It feeds this information into a CNN and fuses the multiple sources to estimate the 3D bounding box. Fusing multiple sources of information enables better recognition of objects in 3D space.
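The projection step described above can be illustrated with a minimal sketch. It is not WiMi's implementation: the function name `point_cloud_to_bev`, the grid ranges, the cell size, and the choice of "maximum height per cell" as the pixel value are all assumptions made for illustration, and only the Python standard library is used.

```python
from typing import List, Tuple

# (x, y, z) in metres; x forward, y left, z up is an assumed convention.
Point = Tuple[float, float, float]


def point_cloud_to_bev(
    points: List[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),
    y_range: Tuple[float, float] = (-2.0, 2.0),
    cell_size: float = 1.0,
) -> List[List[float]]:
    """Project points onto the ground plane, keeping the max height per cell.

    Empty cells stay at 0.0, so heights are assumed to be non-negative.
    """
    rows = int((x_range[1] - x_range[0]) / cell_size)
    cols = int((y_range[1] - y_range[0]) / cell_size)
    bev = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        # Skip points that fall outside the BEV grid.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        r = int((x - x_range[0]) / cell_size)
        c = int((y - y_range[0]) / cell_size)
        bev[r][c] = max(bev[r][c], z)
    return bev


# Hypothetical points: two share a cell, one lies outside the grid.
cloud = [(0.5, -1.5, 1.2), (0.7, -1.2, 0.4), (3.9, 1.9, 2.0), (10.0, 0.0, 5.0)]
bev_image = point_cloud_to_bev(cloud)
```

The resulting 2D grid can be treated like any other image channel, which is what makes it usable as CNN input alongside the RGB and depth images.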

WiMi’s 3D object detection algorithm, which can simultaneously identify the category, spatial position and 3D size of objects, greatly improves the accuracy and efficiency of object detection. The multi-channel neural object recognition system enables 3D object recognition and extends the input to RGB, depth and BEV images. First, the RGB, depth, and BEV images are used as network input, and a feature map is obtained from the CNN. Next, a spatial pyramid pooling layer generates the feature vector for each proposed region in the feature map, and a classifier and a regressor then perform classification and positional regression of the object. The classifier determines which class the features extracted from a proposal belong to. Finally, multitask regression through two fully connected layers predicts object classes and 3D bounding boxes.
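The role of the spatial pyramid pooling layer can be shown with a small sketch: proposals of different sizes are reduced to feature vectors of the same length, which is what lets fully connected layers follow. The function name `spatial_pyramid_pool`, the pyramid levels (1 and 2), and the use of max pooling on a single-channel map are assumptions for illustration, not details confirmed in the announcement.

```python
from typing import List, Sequence


def spatial_pyramid_pool(
    feature_map: List[List[float]], levels: Sequence[int] = (1, 2)
) -> List[float]:
    """Max-pool a 2D region into n x n bins per pyramid level and concatenate."""
    height, width = len(feature_map), len(feature_map[0])
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            # Floor for the start edge, ceiling for the end edge, so the
            # bins cover the whole region and are never empty.
            r0 = (i * height) // n
            r1 = -((-(i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -((-(j + 1) * width) // n)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two hypothetical proposal regions of different sizes.
region_a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
region_b = [[1.0, 2.0], [3.0, 4.0]]
vec_a = spatial_pyramid_pool(region_a)
vec_b = spatial_pyramid_pool(region_b)
```

Both vectors have 1 + 4 = 5 entries regardless of the region's size, so the same classifier and regressor can be applied to every proposal.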

The detection and recognition of 3D objects has long been a crucial technology in computer vision. It forms the basis on which machines understand and interact with the outside world. 3D object recognition technology can be widely used in navigation, intelligent robotics, unmanned vehicles and security surveillance.