To address the gap between the rapid growth of installed wind turbine capacity and lagging operation and maintenance technology, this article uses Supervisory Control And Data Acquisition (SCADA) system data for wind turbine fault diagnosis and early warning. First, it proposes a method for cleaning abnormal data in SCADA systems that combines Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Median Absolute Deviation (MAD) to estimate the normal power range. The cleaned data provide high-quality healthy samples for the subsequent research. Then, for fault diagnosis, the article designs a Light Gradient Boosting Machine (LightGBM) model based on a Genetic Algorithm (GA), which can effectively diagnose different types of wind turbine faults. Finally, for fault warning, the article proposes a Multivariate State Estimation Technique (MSET) model of normal behavior based on the DBSCAN algorithm, together with two threshold-setting methods: a fixed threshold and an adaptive dynamic threshold. Fault warnings are issued by comparing the residuals between predicted and actual values against these thresholds.
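As a rough illustration of the MAD part of the cleaning step, the sketch below estimates a normal power range for readings from one wind-speed bin and discards values outside it. It is a minimal sketch under assumptions that do not come from the article: the function names, the multiplier `k = 3`, the normal-consistency factor 1.4826, the per-bin grouping, and the sample readings are all hypothetical. The DBSCAN stage is omitted.

```python
from statistics import median

def mad_power_range(powers, k=3.0):
    """Return (low, high) bounds as median +/- k * scaled MAD (k is an assumed setting)."""
    med = median(powers)
    # Median Absolute Deviation: median of absolute deviations from the median.
    mad = median(abs(p - med) for p in powers)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    spread = 1.4826 * mad
    return med - k * spread, med + k * spread

def filter_normal(powers, k=3.0):
    """Keep only readings inside the estimated normal power range."""
    low, high = mad_power_range(powers, k)
    return [p for p in powers if low <= p <= high]

# Hypothetical power readings (kW) from a single wind-speed bin.
readings = [980.0, 1000.0, 1010.0, 990.0, 1005.0, 995.0, 0.0, 1500.0]
clean = filter_normal(readings)
print(clean)
```

In this toy example the curtailed reading (0 kW) and the spike (1500 kW) fall outside the estimated range and are dropped, while the six readings near 1000 kW are kept.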