Wenjun Cui

and 2 more

Previous tornado climatology studies often rely on tornado reports, which can suffer from underreporting bias because they depend heavily on human observers. This study develops a new tornado climatology based on storm object data built from the Multi-Radar/Multi-Sensor reanalysis and Rapid Refresh model-simulated near-storm environment (NSE) variables. The dataset includes millions of storm clusters from 2005 to 2011, each characterized by over 1,000 radar and NSE variables and derivatives, making it well suited to developing machine learning (ML) models that classify tornadic storms. Using storm cluster data collocated with tornado reports, ML models are built with two methods: Random Forest (RF) and TabNet, a deep learning method. For each method, three sub-models with different tornado damage-rating predictions are developed to understand the effects of sample size and model setup on performance. Both methods produce similar results, with overall Critical Success Index scores ranging from 0.720 to 0.826. TabNet predictions emphasize radar-related predictors, whereas RF predictions rely more on environmental variables with physical relationships to rotating thunderstorms. The RF model slightly outperforms TabNet and is therefore used to assign tornado probabilities to all remaining clusters. The new tornado climatology based on this RF model aligns well with observed data in terms of the magnitude of weak and strong tornadoes occurring each year. The spatial distribution of tornado days also resembles observations, providing confidence in the model's use for future applications.
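
For readers unfamiliar with the Critical Success Index (CSI), the following minimal Python sketch illustrates the standard definition, CSI = hits / (hits + misses + false alarms), in which correct non-tornadic classifications do not contribute. The function name `critical_success_index` and the eight example labels are hypothetical and chosen only for illustration. The abstract does not describe how the study thresholds probabilities or aggregates scores, so this should not be read as the authors' evaluation code.

```python
def critical_success_index(y_true, y_pred):
    """Standard CSI: hits / (hits + misses + false alarms).

    Labels are 1 (tornadic) or 0 (non-tornadic). True negatives are ignored,
    which makes CSI suitable for rare events such as tornadoes.
    """
    pairs = list(zip(y_true, y_pred))
    hits = sum(1 for t, p in pairs if t == 1 and p == 1)
    misses = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)
    denom = hits + misses + false_alarms
    # With no observed or predicted events, CSI is undefined.
    return hits / denom if denom else float("nan")


# Hypothetical labels for eight storm clusters (not data from the study).
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]

# 3 hits, 1 miss, 1 false alarm -> 3 / 5
print(critical_success_index(observed, predicted))  # 0.6
```

Under this definition, the reported range of 0.720 to 0.826 means that roughly 72% to 83% of all clusters that were either observed or predicted as tornadic were correctly identified.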