Abstract
The reconstruction of 3D face shapes and expressions from a single 2D
image remains challenging due to the lack of detailed modeling of human
facial movements, such as the correlations between different parts of
the face. Facial action units (AUs), which provide a detailed taxonomy
of human facial movements based on the observed activation of muscles
or muscle groups, can be used to model various types of facial expression.
We present a novel 3D face reconstruction framework, AU feature-based
3D FAce Reconstruction using Transformer (AUFART), that generates a 3D
face model responsive to AU activations from a single monocular 2D
image, capturing facial expressions. AUFART leverages AU-specific
features as well as global facial features with transformers to achieve
accurate 3D reconstruction of facial expressions.
also introduce a loss function that drives learning toward minimal
discrepancy in AU activations between the input image and the rendered
reconstruction. The proposed framework achieves an average F1 score of
0.39, outperforming state-of-the-art methods.