Recogniser evaluation and detection of false positives
Recogniser performance was evaluated on a manually verified and balanced
subsample of 1-minute sound recordings that were categorised for each
species’ presence or absence. The sample size of evaluation files varied
among species. Any detection returned in a recording where the
species was present was taken to be a true positive detection. All other
detections, that is, those returned in recordings where the species was
absent, were deemed false positives. For each template, we quantified
the number of call detections in sound files where the species was
present (true positive count; Count TP) and absent (false positive
count; Count FP), and the number of sound files in which the
presence/absence (PA) of the species was correctly detected (true presence;
PA TP), incorrectly detected (false presence; PA FP), missed (false
absence; PA FN) or correctly undetected (true absence; PA TN). Using
these values, we calculated precision, recall and the ROC value for each
template, given by the following formulas: