Ci Lin

and 4 more

Traditional approaches to time series data augmentation have largely focused on refining the architecture of Generative Adversarial Networks (GANs) so that the generated data better matches the original data distribution while preserving its dynamic behavior. However, GANs struggle to retain the temporal dynamics of the original time series, primarily because they do not use first-order difference information. To address this gap, we propose a novel framework, the GAN-MCMC approach, for generating multivariate time series data. It integrates two key modules: (a) a GAN-based module that generates multivariate time series, and (b) an MCMC-based module designed to preserve the first-order difference distribution. The integrated approach is intended to ensure that the synthetic data replicates the original data distribution and also retains its dynamic properties. The multivariate time series data used in this study were collected from Area X.O and are used to predict N2O emissions from farming. This dataset suits our analysis because it originates from a complex dynamic system, and the equipment used to gather the data is prohibitively expensive to deploy on a wide scale. Data augmentation techniques that generate synthetic agricultural data are therefore both necessary and valuable for improving predictive models. A central aspect of the GAN-MCMC approach is tuning the β factor in the modified Metropolis-Hastings algorithm, which is the core algorithm of the MCMC module. The β factor controls the extent to which information from the original time series is preserved. Our experiments demonstrate that small values of β effectively retain periodic information, and that the joint distribution of the first-order differences in the synthetic data remains consistent when the same β is used in the algorithm.
Additionally, the memoryless property of the Markov chain is preserved in the generated data, and we employ an exponential moving average (EMA) technique to simulate the long-term relationships present in the original time series. Finally, we use the synthetic time series data to train Long Short-Term Memory networks (LSTMs). Our results show that LSTMs trained on synthetic data generated by the GAN-MCMC framework outperform those trained on synthetic data produced by other GANs. Source code: https://github.com/Developer2046/GAN-MCMC
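
For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal illustrative sketch of a first-order difference and an exponential moving average. It is not taken from the linked repository: the function names `first_order_diff` and `ema`, the smoothing parameter `alpha`, and the toy data are all assumptions made for illustration, and the sketch does not reproduce the modified Metropolis-Hastings step or the β factor.

```python
def first_order_diff(series):
    """First-order differences x[t] - x[t-1] of a multivariate series.

    `series` is a list of time steps, each a list of feature values
    (hypothetical layout, chosen only for this sketch).
    """
    return [
        [curr_v - prev_v for prev_v, curr_v in zip(prev, curr)]
        for prev, curr in zip(series, series[1:])
    ]


def ema(values, alpha):
    """Exponential moving average of a univariate sequence.

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1].
    `alpha` is an assumed smoothing parameter in (0, 1].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))  # seed with the first observation
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Toy data: three time steps, two features.
toy = [[1.0, 2.0], [3.0, 5.0], [6.0, 9.0]]
diffs = first_order_diff(toy)
smoothed_first_feature = ema([row[0] for row in toy], alpha=0.5)
```

A smaller `alpha` gives past values more weight, which is one common way an EMA can carry longer-range structure into a sequence that is otherwise generated step by step.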
