Abstract
Machine learning and deep learning have become important tools for video processing in artificial intelligence applications, especially in real-time tasks that demand speed, accuracy, and flexibility. We therefore introduce a system that detects and counts people in RTSP video streams using a deep learning model. The system uses two Xilinx FPGA cards, the Alveo U30 and the Alveo U200, to accelerate video stream transmission and deep learning inference, respectively. For the input and output streams, the GStreamer-based Vitis Video Analytics SDK is used to exploit the Alveo U30's features for streaming RTSP video. For inference, a trained YOLOX model detects and counts people in the video frames; YOLOX is accelerated on the Alveo U200 through the Mipsology Zebra framework. The proposed system processes multiple streams and achieves faster inference and lower CPU usage than a system that runs deep learning inference on the CPU alone.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2024