Mahmudul Haque et al.

Automatic Violence Detection and Classification (AVDC) with deep learning has garnered significant attention in computer vision research. This paper presents a novel approach that combines a custom Deep Convolutional Neural Network (DCNN) with a Gated Recurrent Unit (GRU) to develop a new AVDC model called BrutNet. Specifically, we develop a time-distributed DCNN (TD-DCNN) that generates a compact 2D representation, with 512 spatial features per frame, from a set of equally spaced frames of dimension 160×90 taken from short video segments. To further leverage the temporal information, a GRU layer is utilised, producing a condensed 1D vector that enables binary classification of violent versus non-violent content through multiple dense layers. Overfitting is addressed by incorporating dropout layers with a rate of 0.5, while the hidden and output layers employ rectified linear unit (ReLU) and sigmoid activations, respectively. The model is trained on an NVIDIA Tesla K80 GPU via Google Colab and demonstrates superior performance to existing models across several video datasets: hockey fights, movie fights, AVD, and RWF-2000. Notably, our model requires only 3.416 million parameters while achieving test accuracies of 97.62%, 100%, 97.22%, and 86.43% on the respective datasets. BrutNet thus has the potential to emerge as a highly efficient and robust AVDC model in support of public safety, content moderation and censorship, computer-aided investigations, and law enforcement.
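To make the data flow described above concrete, the following is a minimal NumPy sketch of the TD-DCNN → GRU → dense pipeline. It uses the dimensions stated in the abstract (160×90 frames, 512 spatial features per frame, a final sigmoid output); the number of frames per segment (8), the GRU hidden size (128), and all random-projection weights are illustrative assumptions, and the TD-DCNN is replaced by a single ReLU random projection as a stand-in, with dropout omitted since this mimics inference.

```python
import numpy as np

np.random.seed(0)

T, H, W, C = 8, 90, 160, 3   # frames per segment (assumed), 160x90 RGB frames
FEATURES = 512               # spatial features per frame (from the abstract)
HIDDEN = 128                 # GRU hidden size: an illustrative assumption

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def td_dcnn(frames):
    """Stand-in for the time-distributed DCNN: maps each 160x90 frame to a
    512-dimensional feature vector (a shared ReLU random projection here)."""
    W_proj = np.random.randn(H * W * C, FEATURES) * 0.01
    flat = frames.reshape(T, -1)          # (T, H*W*C), same weights per frame
    return np.maximum(flat @ W_proj, 0)   # ReLU -> compact 2D map, (T, 512)

def gru(seq):
    """Minimal GRU over the (T, 512) feature sequence, returning the final
    hidden state -- the condensed 1D vector fed to the dense layers."""
    Wz, Wr, Wh = (np.random.randn(FEATURES + HIDDEN, HIDDEN) * 0.01
                  for _ in range(3))
    h = np.zeros(HIDDEN)
    for x in seq:
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ Wz)                                  # update gate
        r = sigmoid(xh @ Wr)                                  # reset gate
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ Wh)    # candidate state
        h = (1 - z) * h + z * h_tilde
    return h

frames = np.random.rand(T, H, W, C)       # one short video segment
features = td_dcnn(frames)                # (8, 512): 2D spatial representation
h = gru(features)                         # (128,):  condensed 1D vector
W_out = np.random.randn(HIDDEN, 1) * 0.01
p_violent = float(sigmoid(h @ W_out)[0])  # sigmoid output: violent vs non-violent
print(features.shape, h.shape, p_violent)
```

In the actual model the random projection would be a trained convolutional stack applied per frame, but the shape flow — per-frame 512-feature vectors collapsed by the GRU into one vector for binary classification — matches the description above.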