Consensus among the six call-quality-filtered datasets was relatively low, except between the two Manta datasets (~76%, n = 29,219 SVs). The next highest agreement was between the two long-read-based call sets, CuteSV and Sniffles (~17–49% agreement, n = 1,099 SVs). Overall agreement decreased as more tools were included in the comparison, leaving only 94 SVs (90 deletions, 4 duplications) shared across all six datasets (Figure 1). These SVs, ranging in size from 314 bp to more than 20 kb, were challenging to genotype consistently. Few passed genotype thresholds in each dataset: twelve deletions and two duplications in both Manta datasets, five deletions in the Smoove dataset, and one deletion in the CuteSV dataset. It is difficult to discern a pattern in the overall agreement between datasets because the number of SVs passing call quality thresholds varied widely among them. For example, Sniffles tended to have more calls overlapping with the short-read-based call sets than CuteSV did. However, the filtered Sniffles call set was more than twice the size of the filtered CuteSV call set.
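
The dependence of agreement on call set size can be illustrated with a minimal sketch. It assumes, as an illustration only, that agreement is expressed as the number of shared SVs divided by the size of each call set in turn, so that one shared count yields two percentages; the text does not state how the reported percentages were derived. The function name `pairwise_agreement`, the SV identifiers, and the call set names are hypothetical, and the data are toy values rather than results from this study.

```python
from itertools import combinations


def pairwise_agreement(call_sets):
    """Return {(a, b): (n_shared, pct_of_a, pct_of_b)} for every pair of call sets.

    `call_sets` maps a call set name to a set of SV identifiers that have
    already been matched across tools (an assumption of this sketch).
    """
    results = {}
    for a, b in combinations(sorted(call_sets), 2):
        shared = call_sets[a] & call_sets[b]
        # The same shared count gives a different percentage per denominator.
        pct_a = 100 * len(shared) / len(call_sets[a]) if call_sets[a] else 0.0
        pct_b = 100 * len(shared) / len(call_sets[b]) if call_sets[b] else 0.0
        results[(a, b)] = (len(shared), pct_a, pct_b)
    return results


# Toy data: the larger call set is twice the size of the smaller one.
toy_sets = {
    "small_set": {"sv1", "sv2", "sv3", "sv4"},
    "large_set": {"sv1", "sv2", "sv5", "sv6", "sv7", "sv8", "sv9", "sv10"},
}

agreement = pairwise_agreement(toy_sets)
for (a, b), (n, pct_a, pct_b) in agreement.items():
    print(f"{a} vs {b}: {n} shared ({pct_a:.0f}% of {a}, {pct_b:.0f}% of {b})")
```

In this toy case the two shared SVs make up half of the smaller call set but only a quarter of the larger one, which is why differences in call set size complicate direct comparison of agreement proportions.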