Table 2. Comparison of open-set performance between SealNet and PrimNet for key model-evaluation metrics. In open-set evaluation, any probe whose best-match similarity score in the gallery fell below the threshold was rejected as an ‘imposter’. True Positives scored above the threshold and the correct match was predicted within the top “Rank” similarity scores (TPR). False Positives scored above the threshold but had no true match in the gallery (FPR). False Negatives had a match in the gallery, but either their top similarity score fell below the threshold or the correct gallery member was not predicted within the top “Rank” similarity scores (FNR). True Negatives had no match in the gallery, and their top predicted match scored below the threshold (TNR). Baseline accuracy is the accuracy the model would achieve if all probes were rejected. The F1-score, which is better suited to unbalanced datasets than accuracy, provides a better measure of the propensity for incorrect classifications.
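The threshold-and-rank decision rules described above can be made concrete in code. The following is a minimal sketch, assuming a NumPy similarity matrix with one row per probe and one column per gallery entry; the names open_set_counts, score_matrix, gallery_labels, and probe_labels are illustrative assumptions, not identifiers from SealNet or PrimNet.

```python
import numpy as np

def open_set_counts(score_matrix, gallery_labels, probe_labels, threshold, rank=1):
    """Tally TP/FP/FN/TN for open-set identification under the caption's rules.

    score_matrix   : (n_probes, n_gallery) array of similarity scores
    gallery_labels : identity label for each gallery entry
    probe_labels   : identity label for each probe (None if not enrolled)
    """
    tp = fp = fn = tn = 0
    enrolled = set(gallery_labels)
    for scores, label in zip(score_matrix, probe_labels):
        order = np.argsort(scores)[::-1]            # gallery indices, best first
        best_score = scores[order[0]]
        top_k = {gallery_labels[i] for i in order[:rank]}
        in_gallery = label in enrolled

        if best_score >= threshold:                  # probe accepted
            if in_gallery and label in top_k:
                tp += 1                              # correct match within top "Rank"
            elif in_gallery:
                fn += 1                              # accepted, but correct ID missed
            else:
                fp += 1                              # no true match in gallery
        else:                                        # probe rejected as 'imposter'
            if in_gallery:
                fn += 1                              # true match scored below threshold
            else:
                tn += 1                              # impostor correctly rejected
    return tp, fp, fn, tn
```

From these counts, the caption's summary statistics follow directly: baseline accuracy is (fn + tn) / n_probes with every probe rejected (tp = fp = 0), and F1 = 2·tp / (2·tp + fp + fn).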