In recent years, artificial intelligence technologies have been widely deployed, bringing security and convenience but also new risks. Deepfake techniques raise serious security concerns by manipulating facial images to produce convincing but false representations. To improve detection performance in this setting, we introduce a novel mask-supervision-based deepfake detection method. Our approach corrects the model's tendency to attend to irrelevant regions through mask supervision: pixel-level labels guide the model toward the synthesized regions of the face, yielding more accurate spatial features. In addition, we incorporate a frequency-domain feature extraction module that exploits the robustness of frequency-domain cues to compression artifacts. The input image is first preprocessed and then fed into the mask-supervision and frequency-domain feature extraction modules. The mask-supervision module extracts intermediate features with an HRNet backbone and refines the spatial features by supervising the predicted mask with the ground-truth mask. The frequency-domain module extracts features by applying a DCT transform and filtering the coefficients in different frequency bands. Finally, the spatial and frequency-domain features are concatenated and passed to a classification network, which outputs the final prediction. Experimental results show that our method remains robust under compression.
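To make the frequency-domain step concrete, the following is a minimal sketch of DCT-based band filtering: a 2-D DCT of a grayscale patch, masking of the coefficients into separate bands, and an inverse transform per band. The specific band cut-offs, the radial band index, and the function names here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # scale the DC row for orthonormality
    return m

def band_features(img, bands=((0.0, 1 / 16), (1 / 16, 1 / 8), (1 / 8, np.inf))):
    """Split a square grayscale image into frequency-band components.

    `bands` holds hypothetical (low, high) cut-offs on a normalized
    frequency index; the paper's actual band partition may differ.
    Returns an array of shape (num_bands, H, W).
    """
    n = img.shape[0]
    M = dct_matrix(n)
    coeffs = M @ img @ M.T                       # forward 2-D DCT
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    radius = (u + v) / (2 * n - 2)               # 0 = DC, 1 = highest frequency
    out = []
    for lo, hi in bands:
        mask = (radius >= lo) & (radius < hi)    # keep only this band
        out.append(M.T @ (coeffs * mask) @ M)    # inverse DCT of masked band
    return np.stack(out)

img = np.random.rand(32, 32)
feats = band_features(img)
# Because the masks partition the coefficients and the DCT is orthonormal,
# the band components sum back to the original image.
print(np.allclose(feats.sum(axis=0), img))
```

In a full pipeline, the stacked band images would be fed to a convolutional feature extractor, and the resulting frequency features concatenated with the spatial features before classification.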