Validation of SealNet
In this biometric system, the probe set refers to the collection of
biometric identities to be recognized, while the gallery set refers to
identities that have been previously enrolled into the system. The
gallery set acts as a database against which each probe identity is
searched. We measured the accuracy of SealNet with two standard
recognition tasks: closed-set and open-set identification. Both
closed-set and open-set identification are 1:N matching scenarios in
which each identity in the probe set is searched against multiple
identities in the gallery. The SealNet face recognition software
produces a similarity score for each probe-gallery pair; these scores
are sorted in descending order, so the identity with the highest score
is the most likely match. In closed-set identification,
it is guaranteed that the identity in the probe is present in the
gallery; whereas in open-set identification, it is uncertain whether
that is the case.
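As a concrete illustration of this 1:N matching, the sketch below ranks one probe against a gallery of enrolled identities by descending similarity score. It is not SealNet's actual code; the use of cosine similarity between face embeddings and the 128-dimensional embedding size are assumptions made for the example.

```python
import numpy as np

def rank_gallery(probe_emb, gallery_embs, gallery_ids):
    """Return gallery identities sorted by descending similarity to the probe."""
    probe = probe_emb / np.linalg.norm(probe_emb)
    gallery = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = gallery @ probe                      # cosine similarity per gallery entry
    order = np.argsort(scores)[::-1]              # highest score first
    return [(gallery_ids[i], float(scores[i])) for i in order]

# Toy usage: three enrolled seals with 128-d embeddings (dimension is assumed).
rng = np.random.default_rng(0)
gallery_embs = rng.normal(size=(3, 128))
gallery_ids = ["seal_A", "seal_B", "seal_C"]
probe_emb = gallery_embs[1] + 0.05 * rng.normal(size=128)     # noisy copy of seal_B
print(rank_gallery(probe_emb, gallery_embs, gallery_ids)[0])  # top-ranked candidate
```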
We validated SealNet’s face recognition capabilities using k-fold
cross-validation with k = 5: we divided the data into five
sections/folds and, in turn, used each fold as the test set while
training on the remaining four. We filtered the data to include only
seals with more photos than the number of folds.
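One way this filtering and 5-fold split could be implemented is sketched below, assuming the data is a list of (seal_id, photo_path) pairs and using scikit-learn's StratifiedKFold; both the data layout and the choice of splitter are assumptions rather than SealNet's documented pipeline.

```python
from collections import Counter
import numpy as np
from sklearn.model_selection import StratifiedKFold

def make_folds(records, n_folds=5, seed=0):
    """Keep only seals with more photos than the number of folds,
    then split the remaining photos into k cross-validation folds."""
    counts = Counter(seal_id for seal_id, _ in records)
    eligible = [(s, p) for s, p in records if counts[s] > n_folds]
    labels = np.array([s for s, _ in eligible])
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Each element is (train_indices, test_indices) into `eligible`.
    return eligible, list(skf.split(np.zeros(len(labels)), labels))
```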
For closed-set identification, in each fold, we summarized the accuracy
of our system using a Cumulative Match Characteristic (CMC) curve which
plots the True Positive Identification Rate (TPIR) against rank. TPIR
measures the probability of observing a correct match at or below each
rank. A correct match between probe \(p\) and an identity \(g\) in the
gallery has rank \(k\) if the similarity score between \(p\) and \(g\)
is the \(k\)-th largest score among all of \(p\)'s probe-gallery scores.
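A hedged sketch of how TPIR at each rank could be tallied for the CMC curve follows; it assumes that, for every probe, the gallery identities have already been sorted by descending similarity score, and all variable names are illustrative rather than SealNet's.

```python
import numpy as np

def cmc_curve(ranked_ids, true_ids, max_rank=10):
    """TPIR at ranks 1..max_rank: the fraction of probes whose true identity
    appears within the top-k ranked gallery candidates."""
    hits = np.zeros(max_rank)
    for ranked, truth in zip(ranked_ids, true_ids):
        if truth in ranked[:max_rank]:
            hits[ranked.index(truth):] += 1   # a rank-k hit also counts at every rank > k
    return hits / len(true_ids)

# Toy example: two probes against a gallery of three seals.
print(cmc_curve([["B", "A", "C"], ["A", "B", "C"]], ["A", "C"], max_rank=3))
# -> [0.  0.5 1. ]  (rank-1 TPIR 0%, rank-2 50%, rank-3 100%)
```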
For open-set identification, prior to splitting the dataset into 5
folds, we randomly selected half of the seals with enough photos to be
eligible for training and placed them, along with all seals lacking
sufficient data to be included in training, into each of the testing
sets as new seals. This method of training exclusion provided the best
balance between the quantities of open-set testing photos and training
photos. Whenever a probe-gallery pair’s similarity score exceeded our
acceptance threshold, we “accepted” that individual, i.e., marked the
probe as having been seen before (having a match in the gallery); any
probe whose top similarity score was below the threshold was rejected
as an imposter.
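The accept/reject rule described above can be written as a short sketch; the function name, the representation of ranked scores, and the example threshold value are illustrative assumptions.

```python
def accept_or_reject(ranked_scores, threshold):
    """ranked_scores: (gallery_id, score) pairs sorted by descending score."""
    best_id, best_score = ranked_scores[0]
    if best_score > threshold:
        return "accepted", best_id   # probe judged to match an enrolled seal
    return "rejected", None          # probe judged to be an imposter / new seal

print(accept_or_reject([("seal_B", 0.83), ("seal_A", 0.41)], threshold=0.6))
# -> ('accepted', 'seal_B')
```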
Using the information on whether each probe was truly an imposter or
not, all probes were categorized as follows: True Positives (TP) scored
above the threshold and the correct match was predicted within the top
“Rank” similarity scores. False Positives (FP) scored above the
threshold but had no true match in the gallery. False Negatives (FN)
had a match in the gallery, but either the top similarity score fell
below the threshold or the correct gallery member was not predicted
within the top “Rank” similarity scores. True Negatives (TN) had no
match in the gallery and a top predicted match whose similarity score
fell below the threshold. Accuracy
is measured as the ratio of the sum of TP and TN to the total number of
queries, i.e., \(\frac{TP+TN}{TP+TN+FP+FN}\). This formula is identical
for open- and closed-set identification, but since the closed set
inherently has no true negatives or false positives, the closed-set
accuracy computation simplifies to \(\frac{TP}{TP+FN}\).
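The bookkeeping behind this accuracy computation can be sketched as follows, assuming each probe has been summarized by whether it truly has a gallery match, whether its top score exceeded the threshold, and whether the correct identity appeared within the top “Rank” candidates; this per-probe representation is an assumption for illustration.

```python
def open_set_accuracy(probes):
    """probes: iterable of (has_gallery_match, above_threshold, correct_in_top_rank)."""
    tp = fp = fn = tn = 0
    for has_match, above_threshold, correct_in_top_rank in probes:
        if has_match:
            if above_threshold and correct_in_top_rank:
                tp += 1   # accepted and correctly identified within the top rank
            else:
                fn += 1   # below threshold, or the correct seal missed the top rank
        elif above_threshold:
            fp += 1       # accepted an imposter
        else:
            tn += 1       # correctly rejected an imposter
    return (tp + tn) / (tp + tn + fp + fn)

# In closed-set evaluation every probe has a gallery match, so FP = TN = 0 and the
# same formula reduces to TP / (TP + FN).
```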