Abstract
Advances in deep learning and vision-based activity recognition have significantly improved safety, continuous monitoring, and productivity on earthwork sites while reducing costs. The construction industry has adopted CNN and RNN models to classify the activities of construction equipment and to automate construction operations. However, currently available methods classify activities using only the visual information of the current frame. To date, the visual information of frames adjacent to the current frame has not been examined simultaneously to recognize activities in the construction industry. This paper proposes a novel methodology for classifying excavator activities by processing the visual information of video frames adjacent to the current frame, following a standard CNN-BiLSTM deep learning pipeline. First, a pre-trained CNN extracts sequential patterns of visual features from the video frames. A BiLSTM then classifies the excavator's activities by analyzing the output of the pre-trained convolutional neural network. The stacked forward and backward LSTM layers allow the algorithm to compute each output using the visual information of both preceding and upcoming frames. Experimental results show an average precision of 87.5% and an average recall of 88.52%.
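For illustration, here is a minimal sketch of such a pipeline (not the authors' released code), assuming PyTorch and torchvision's pre-trained GoogLeNet as the frozen per-frame feature extractor; the class name, hidden size, and activity labels below are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import googlenet, GoogLeNet_Weights


class CNNBiLSTM(nn.Module):
    """Frozen pre-trained CNN per frame, then a BiLSTM over the clip."""

    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        # Pre-trained GoogLeNet; replacing its classifier head with
        # Identity makes it emit a 1024-dim feature vector per frame.
        self.backbone = googlenet(weights=GoogLeNet_Weights.IMAGENET1K_V1)
        self.backbone.fc = nn.Identity()
        for p in self.backbone.parameters():
            p.requires_grad = False  # frozen feature extractor
        # The bidirectional LSTM lets each time step's output depend on
        # visual features from both preceding and upcoming frames.
        self.bilstm = nn.LSTM(input_size=1024, hidden_size=hidden_size,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, 224, 224)
        b, t = clips.shape[:2]
        self.backbone.eval()  # keep batch-norm statistics fixed
        with torch.no_grad():
            feats = self.backbone(clips.flatten(0, 1))  # (b*t, 1024)
        out, _ = self.bilstm(feats.view(b, t, -1))      # (b, t, 2*hidden)
        # Mean-pool over time and predict one activity label per clip.
        return self.classifier(out.mean(dim=1))


# Usage: a batch of 2 clips of 16 frames each, classified into 4
# hypothetical activities (e.g. digging, swinging, dumping, idle).
model = CNNBiLSTM(num_classes=4)
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```

Mean-pooling the BiLSTM outputs over time is one simple way to produce a clip-level label; applying the classifier at every time step would instead yield a per-frame activity prediction.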
Original language | English |
---|---|
Article number | 272 |
Journal | Applied Sciences (Switzerland) |
Volume | 13 |
Issue number | 1 |
DOIs | |
State | Published - Jan 2023 |
Keywords
- activity recognition
- computer vision
- convolutional neural network (CNN)
- GoogLeNet
- long short-term memory (LSTM)
- visual features