DEVELOPMENT OF A METHOD BASED ON OBJECT DETECTION FOR REAL-TIME PERSON LOCATION DETECTION IN A CONFINED SPACE
DOI:
https://doi.org/10.32782/IT/2024-3-2
Keywords:
person indoor localization, person detection, perspective transformation, deep learning, convolutional neural network, YOLO, NVIDIA Jetson Nano
Abstract
Real-time person detection makes it possible to address a complex problem: determining a person's location in a confined space. Solving it requires an effective method for localizing a person indoors (for example, inside a room), since outdoor positioning systems such as GPS do not provide high accuracy indoors. Existing computer systems that solve this problem require specialized infrastructure, such as devices attached to the human body or dedicated sensors. This approach is costly and does not provide a universal solution. A camera, by contrast, is present in almost every building. Many existing systems that analyze a video stream rely on Kinect depth cameras, which are outdated and require additional installation. Few solutions analyze the video stream from an RGB camera in combination with computer vision methods for person localization. Research and development of a more effective computer-vision-based method for this problem is therefore relevant.
The aim of the work is to develop a method for localizing a person in a confined space that is efficient in terms of speed and accuracy and that combines the video stream of a camera with a computer vision method, namely object detection. The method should run in real time on an NVIDIA Jetson Nano microcomputer, a relatively cheap and popular device from NVIDIA.
The methodology is to use a deep neural network to detect the person in real time and then apply a perspective transformation algorithm to estimate the person's location. A person's location is defined as the center point of the bottom edge of the bounding box, transformed from the camera perspective as if the camera were positioned directly above the floor. A YOLOv4-tiny neural network model was trained on the COCO and Open Images datasets using the Darknet deep learning framework.
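The location estimate described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the floor is calibrated from four image points with known floor coordinates, and all names (`IMAGE_CORNERS`, `FLOOR_CORNERS`, `perspective_matrix`, `bottom_center`, `to_floor`) and all coordinate values are hypothetical. The homography is solved in plain Python so the example has no dependencies; in practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.perspectiveTransform` would likely serve the same purpose.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src, dst):
    """Return the 3x3 matrix mapping four image points to four floor points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear_system(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def bottom_center(box):
    """Center of the bottom edge of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, _y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def to_floor(h, point):
    """Apply the perspective matrix to one image point (with homogeneous divide)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical calibration: floor corners as seen in the image (pixels)
# and the same corners in a top-down floor frame (centimetres).
IMAGE_CORNERS = [(200, 300), (440, 300), (620, 470), (20, 470)]
FLOOR_CORNERS = [(0, 0), (400, 0), (400, 300), (0, 300)]

H = perspective_matrix(IMAGE_CORNERS, FLOOR_CORNERS)

# Hypothetical detector output: one bounding box around a person.
person_box = (290, 200, 350, 470)
location_cm = to_floor(H, bottom_center(person_box))
print("Estimated floor location (cm):", location_cm)
```

The bottom edge of the box is used because that is where the person touches the floor plane, which is the only plane the transformation is valid for. Several detected people can be handled by applying `to_floor` to each box in the frame.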
The scientific novelty is that a method for indoor person localization was developed that combines person detection using a deep convolutional neural network with a perspective transformation algorithm for subsequent location estimation in a confined space. The proposed method is more versatile than known methods that use Kinect depth cameras. It can run on a microcomputer and estimate the location of several people in one pass with an average error of 23 cm at a speed of 16 FPS, which is superior to the known alternative approaches.
Conclusions. The problem of real-time person location detection in a confined space, and means of solving it based on object detection using a deep convolutional neural network, were studied. A neural network based on the YOLOv4-tiny model was trained on the COCO and Open Images datasets and showed an accuracy of 55.1% and 71.4%, respectively. A method was developed that uses the trained neural network to determine a bounding box around a person in the frame and then determines the person's position using a perspective transformation algorithm. The method runs on an NVIDIA Jetson Nano microcomputer with an average error of 23 cm and a speed of 16 FPS, processing a video stream from an RGB camera.
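The abstract does not state how the average error and the frame rate were measured. As a sketch under that caveat, one plausible reading is the mean Euclidean distance on the floor plane between estimated and reference positions, and the number of processed frames divided by elapsed time. The function names and all sample values below are hypothetical and are not data from the study.

```python
import math


def mean_localization_error(estimated, ground_truth):
    """Mean Euclidean distance between paired floor positions (same units as input)."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    distances = [
        math.hypot(ex - gx, ey - gy)
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return sum(distances) / len(distances)


def frames_per_second(frame_count, elapsed_seconds):
    """Average processing speed over a run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return frame_count / elapsed_seconds


# Hypothetical positions in centimetres, for illustration only.
estimated_cm = [(103.0, 204.0), (250.0, 90.0)]
reference_cm = [(100.0, 200.0), (250.0, 100.0)]

print("Mean error (cm):", mean_localization_error(estimated_cm, reference_cm))
print("Speed (FPS):", frames_per_second(frame_count=160, elapsed_seconds=10.0))
```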