On-site earthquake early warning techniques, which issue alerts based on seismic waves measured at a single station, are promising and have performed successfully during several damaging earthquakes. Most existing techniques extract several P-wave features from the first few seconds of seismic waves after the trigger to predict the intensity or destructiveness of an incoming earthquake. Such techniques neglect how these features vary in time within the P waves; in other words, the sequential characteristics of the data are not considered. In this study, a long short-term memory (LSTM) neural network, which is capable of learning order dependence in seismic waves, was employed to predict the peak ground acceleration (PGA) of an incoming earthquake. A dense LSTM architecture was proposed, and a large earthquake data set was used to train the LSTM model. Overall, the PGA values predicted by the LSTM model were promising, although they were generally overestimated. For the Chi-Chi earthquake data set, whose fault rupture was long and complex, the PGA predicted by the proposed LSTM model was more accurate than that predicted in a previous study using a support vector regression approach. In addition, an alternative alert criterion is presented, which issues an alert only when the predicted PGA exceeds the threshold in successive time windows. The performance of the proposed LSTM model under different PGA thresholds is also discussed.
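The alternative alert criterion can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes the LSTM has already produced one PGA prediction per time window, and the function name `first_alert_window`, the parameter `n_consecutive` (the number of successive windows required), and the example threshold value are hypothetical, because the abstract does not specify them.

```python
from typing import Optional, Sequence


def first_alert_window(predicted_pga: Sequence[float],
                       threshold: float,
                       n_consecutive: int = 3) -> Optional[int]:
    """Return the index of the window at which an alert would be issued.

    Hypothetical sketch: an alert is issued at the first window that completes
    a run of `n_consecutive` successive predictions exceeding `threshold`.
    Returns None if no such run occurs.
    """
    if n_consecutive < 1:
        raise ValueError("n_consecutive must be at least 1")
    run = 0  # length of the current run of above-threshold predictions
    for i, pga in enumerate(predicted_pga):
        # Extend the run when the prediction exceeds the threshold; otherwise reset it.
        run = run + 1 if pga > threshold else 0
        if run >= n_consecutive:
            return i
    return None


if __name__ == "__main__":
    # Illustrative per-window PGA predictions and an illustrative threshold.
    preds = [40.0, 95.0, 60.0, 90.0, 110.0, 130.0]
    print(first_alert_window(preds, threshold=80.0, n_consecutive=1))  # single-window rule
    print(first_alert_window(preds, threshold=80.0, n_consecutive=3))  # successive-window rule
```

With a single-window rule, the isolated exceedance at window 1 would already trigger an alert; requiring three successive exceedances delays the alert to window 5 but ignores that isolated spike. This is the trade-off the successive-window criterion is meant to address when predictions tend to be overestimated.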