Wildfire Classification using PETSc-based Support Vector Machines on
Distributed-Memory GPU-based Parallel Computers
Abstract
As high-resolution geospatiotemporal data sets from observatory
networks, remote sensing platforms, and computational Earth systems
increase in abundance, fidelity, and richness, machine learning
approaches that can fully utilize increasingly powerful parallel
computing resources are becoming essential for analysis and exploration
of such data sets. We explore one such approach, applying a
state-of-the-art distributed memory parallel implementation of Support
Vector Machine (SVM) classification to large remote-sensing data sets.
We have used MODIS 8-day surface reflectance (MOD09A1) and land surface
temperature (MOD11A2) products to classify wildfires over Alaska and
California. Monitoring Trends in Burn Severity (MTBS) burn perimeter
data was used to delineate burned and unburned areas for our two-class
problem. MTBS covers the years 1984-2019 and records only fires of 1,000
acres or greater in the western United States. We seek a parallel
computing solution (using the PermonSVM solver, described below) that
accurately classifies wildfires and detects smaller, unrecorded
wildfires. An initial assessment of wildfire classification over
interior Alaska shows that PermonSVM achieves 96% accuracy while
flagging over 5000 false positives (i.e., potential fires unrecorded in
MTBS). Next steps
include mapping larger regions over Alaska and California and
understanding the tradeoffs of scalability and accuracy. The parallel
tool we employ is PermonSVM, which is built on top of the widely used
open-source toolkit PETSc, the Portable, Extensible Toolkit for
Scientific Computation. Recent developments in PETSc have focused on
supporting cutting-edge GPU-based high-performance computing (HPC)
architectures, and these can be easily leveraged in PermonSVM by using
appropriate GPU-enabled matrix and vector types in PETSc. We achieve
significant GPU speedup for the SVM calculations on the Summit
supercomputer at Oak Ridge National Laboratory (currently one of the
best available “at scale” proxies for upcoming exascale-class
supercomputers) and are actively working to further improve
computational efficiency on Summit as well as on prototype exascale node
architectures.
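The GPU-enabled matrix and vector types mentioned above are typically
selected in PETSc at runtime via command-line options rather than code
changes. The sketch below illustrates this under stated assumptions: the
executable name `./svm_train` is a placeholder for a PermonSVM-based
driver (not the actual program name from this work), and the jsrun
resource flags are illustrative; only the `-mat_type` and `-vec_type`
options are standard PETSc.

```shell
# Launch an SVM training job on Summit with PETSc objects backed by
# CUDA-enabled implementations.
#   -mat_type aijcusparse : store sparse matrices in cuSPARSE format on the GPU
#   -vec_type cuda        : keep vector data in GPU memory
# `./svm_train` is a hypothetical PermonSVM-based driver; jsrun is
# Summit's job launcher (6 resource sets, 1 GPU each).
jsrun -n 6 -g 1 ./svm_train -mat_type aijcusparse -vec_type cuda
```

Because PETSc objects pick up these options through calls such as
`MatSetFromOptions()` and `VecSetFromOptions()`, the same solver code
paths can run on CPU or GPU without modification.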