Validation of SealNet
In this biometric system, the probe set refers to the collection of biometric identities to be recognized, while the gallery set refers to the identities that have previously been enrolled into the system. The gallery set acts as a database against which each probe identity is searched. We measured the accuracy of SealNet with two standard recognition tasks: closed-set and open-set identification. Both are 1:N matching scenarios in which each identity in the probe set is searched against multiple identities in the gallery. The SealNet face recognition software produces a similarity score for each probe-gallery pair, and the results are sorted in descending order so that the identity with the highest score is the most likely match. In closed-set identification, the probe identity is guaranteed to be present in the gallery; in open-set identification, it is uncertain whether that is the case.
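As an illustration of this 1:N ranking step, the following sketch scores a single probe against the gallery and sorts the candidates by descending similarity. It assumes cosine similarity between face embeddings; the names rank_gallery, probe_embedding, and gallery_ids are illustrative and not part of SealNet's actual interface.

```python
import numpy as np

def rank_gallery(probe_embedding, gallery_embeddings, gallery_ids):
    """Return gallery identities sorted by descending similarity to the probe."""
    # Normalize so that the dot product equals cosine similarity (assumed metric).
    probe = probe_embedding / np.linalg.norm(probe_embedding)
    gallery = gallery_embeddings / np.linalg.norm(gallery_embeddings, axis=1, keepdims=True)
    scores = gallery @ probe                 # one similarity score per probe-gallery pair
    order = np.argsort(scores)[::-1]         # descending: highest score = most likely match
    return [(gallery_ids[i], float(scores[i])) for i in order]
```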
We validated SealNet’s face recognition capabilities using k-fold cross-validation with k = 5. That is, we divided the data into 5 folds and, in turn, used each fold as the test set while training on the remaining four. We filtered the data to include only seals with more photos than the number of folds (i.e., more than five photos per seal).
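The eligibility filter and fold assignment might look like the sketch below, which assumes each eligible seal's photos are spread round-robin across the k folds; make_folds and photos_by_seal are hypothetical names, not SealNet code.

```python
import random
from collections import defaultdict

def make_folds(photos_by_seal, k=5, seed=0):
    """Keep only seals with more photos than folds, then spread each seal's photos across k folds."""
    rng = random.Random(seed)
    eligible = {seal: imgs for seal, imgs in photos_by_seal.items() if len(imgs) > k}
    folds = defaultdict(list)
    for seal, imgs in eligible.items():
        imgs = list(imgs)
        rng.shuffle(imgs)
        for i, img in enumerate(imgs):
            folds[i % k].append((seal, img))  # each fold receives photos of every eligible seal
    return [folds[i] for i in range(k)]
```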
For closed-set identification, in each fold we summarized the accuracy of our system using a Cumulative Match Characteristic (CMC) curve, which plots the True Positive Identification Rate (TPIR) against rank. TPIR measures the probability of observing a correct match within each rank. A correct match between a probe \(p\) and an identity \(g\) in the gallery has rank \(k\) if the similarity score between \(p\) and \(g\) is the \(k\)-th largest score among all gallery identities for that probe.
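A CMC curve of this kind can be computed as in the following sketch, which counts, for each rank, the fraction of probes whose correct identity appears among the top-ranked candidates; the function name cmc_curve and its inputs are illustrative assumptions.

```python
import numpy as np

def cmc_curve(ranked_ids_per_probe, true_ids, max_rank=10):
    """TPIR at each rank: fraction of probes whose correct match falls within the top k candidates."""
    hits = np.zeros(max_rank)
    for ranked_ids, true_id in zip(ranked_ids_per_probe, true_ids):
        top = list(ranked_ids[:max_rank])
        if true_id in top:
            k = top.index(true_id)   # 0-based rank of the correct match
            hits[k:] += 1            # a rank-k hit also counts for every larger rank
    return hits / len(true_ids)      # TPIR(1), TPIR(2), ..., TPIR(max_rank)
```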
For open-set identification, prior to splitting the dataset into 5 folds, we randomly selected half of the seals with enough photos to be eligible for training and placed them, along with all seals lacking sufficient data to be included in training, into each of the testing sets as new seals. This method of training exclusions provided the best balance between the quantities of open-set testing photos and training photos. Whenever a probe-gallery pair’s similarity score exceeded our acceptance threshold, we “accepted” that individual, i.e., marked it as having been seen before (the probe has a match in the gallery); any probe whose highest similarity score was less than the threshold value was rejected as an imposter.
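The exclusion procedure could be sketched as follows, under the assumption that seals are held out at the identity level before the folds are built; open_set_split and photos_by_seal are hypothetical names rather than SealNet's actual implementation.

```python
import random

def open_set_split(photos_by_seal, k=5, seed=0):
    """Hold out half of the trainable seals, plus all seals with too few photos,
    as 'new' seals that appear only in the open-set test sets."""
    rng = random.Random(seed)
    trainable = [s for s, imgs in photos_by_seal.items() if len(imgs) > k]
    insufficient = [s for s, imgs in photos_by_seal.items() if len(imgs) <= k]
    held_out = rng.sample(trainable, len(trainable) // 2)
    new_seals = set(held_out) | set(insufficient)   # imposters: in the probes, absent from the gallery
    train_seals = [s for s in trainable if s not in new_seals]
    return train_seals, new_seals
```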
Using the information on whether the probe was truly an imposter or not, all probes were categorized as follows: True Positives (TP) scored above the threshold and the correct match was predicted within the top “Rank” similarity scores. False Positives (FP) scored above the threshold but had no true match in the gallery. False Negatives (FN) had a match in the gallery but either a top similarity score below the threshold or a correct prediction that was not within the top “Rank” similarity scores. True Negatives (TN) had no match in the gallery and a top predicted match with a similarity score below the threshold. Accuracy is measured as the ratio of the sum of TP and TN over all queries, which is equivalent to \(\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}}\). This formula is identical for the open and closed set, but since the closed set inherently has no True Negatives or False Positives, the closed-set accuracy computation simplifies to \(\frac{\text{TP}}{\text{TP}+\text{FN}}\).
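One way to express this categorization and the resulting accuracy is the sketch below, which applies the threshold and rank rules described above to each probe's ranked candidate list; classify_probe and accuracy are illustrative helper names, not part of SealNet.

```python
def classify_probe(ranked, has_true_match, true_id, threshold, rank=1):
    """Categorize one open-set probe from its ranked list of (identity, score) candidates."""
    top_id, top_score = ranked[0]
    accepted = top_score > threshold                         # score must exceed the acceptance threshold
    in_top_rank = any(gid == true_id for gid, _ in ranked[:rank])
    if has_true_match:
        return "TP" if accepted and in_top_rank else "FN"
    return "FP" if accepted else "TN"

def accuracy(outcomes):
    """(TP + TN) / (TP + TN + FP + FN); reduces to TP / (TP + FN) for the closed set."""
    return sum(o in ("TP", "TN") for o in outcomes) / len(outcomes)
```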