Bahareh Javid

Text generation uses artificial intelligence to produce natural-language text automatically. One application of text generation is in text classification. Many real-world problems involve imbalanced text data, which can reduce classification accuracy and efficiency. A common approach to addressing imbalanced data is to oversample the minority class. Given the success and rapid progress of generative adversarial networks (GANs) in data generation, these models can be used to generate text samples for oversampling. Text generation with GANs is challenging due to the discrete nature of text. Despite their potential, the use of generative methods to address imbalanced text data has rarely been studied. We investigate, compare, and analyze the use of GANs for improving the efficiency of text classification in the presence of class imbalance. We apply three models based on state-of-the-art GANs to address the imbalanced-text problem and compare the classification results with oversampling based on traditional methods such as SMOTE and ADASYN. Experiments on several datasets show that oversampling with GANs yields larger classification improvements. We also investigate and compare the quality and diversity of the text generated by GANs, and analyze classification performance with respect to these aspects. The results of this study can also help clarify the current capabilities and limitations of text generation with GANs.
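To make the baseline concrete, the sketch below illustrates the core idea behind SMOTE-style oversampling: synthetic minority-class points are created by interpolating between a minority sample and one of its k nearest minority neighbors. This is a minimal, self-contained illustration in pure Python, not the imbalanced-learn API or the exact procedure used in the paper; the function name and parameters are chosen here for exposition.

```python
import math
import random

def smote_oversample(x_min, n_new, k=3, seed=0):
    """Minimal SMOTE-style oversampling sketch (illustrative only):
    synthesize minority-class points by interpolating between a sample
    and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    n = len(x_min)
    k = min(k, n - 1)

    # k nearest minority-class neighbors of each sample (brute force)
    neighbors = []
    for i, a in enumerate(x_min):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(a, x_min[j]))
        neighbors.append(order[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)                # pick a minority sample
        j = neighbors[i][rng.randrange(k)]  # pick one of its neighbors
        lam = rng.random()                  # interpolation factor in [0, 1)
        # new point lies on the segment between the two samples
        synthetic.append([ai + lam * (bj - ai)
                          for ai, bj in zip(x_min[i], x_min[j])])
    return synthetic

# toy minority class: 5 points in 2-D, all inside the unit square
x_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
x_new = smote_oversample(x_min, n_new=10)
print(len(x_new))  # 10
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies within the minority class's convex hull; ADASYN follows the same interpolation idea but biases sample generation toward harder-to-learn regions.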