In this paper, we propose the EISATC-Fusion model for MI EEG decoding, which consists of an inception block, multi-head self-attention (MSA), a temporal convolutional network (TCN), and layer fusion. The inception block extracts multi-scale temporal features, MSA strengthens the global temporal dependence of the features, and the TCN then extracts high-level temporal features. Layer fusion comprises feature fusion and decision fusion; it makes full use of the features output by the model and enhances the model's robustness. We improve the two-stage training strategy used for model training. Early stopping is used to prevent overfitting, with the accuracy and loss on the validation set serving as the stopping indicators. The proposed model achieves within-subject classification accuracies of 83.18\% and 87.44\% on BCI Competition IV Datasets 2a and 2b, respectively. In the cross-subject setting, the model achieves classification accuracies of 65.37\% and 65.62\% (by transfer learning) when trained with two sessions and one session of Dataset 2a, respectively. The model code is available at https://github.com/LiangXiaohan506/EISATC-Fusion.
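
To illustrate how validation accuracy and validation loss can jointly serve as early-stopping indicators, the following is a minimal Python sketch. It is not taken from the released code. The combination rule (an epoch counts as an improvement if accuracy rises, or if accuracy ties and loss falls), the `patience` value, and the names `EarlyStopper` and `run` are all assumptions made for illustration.

```python
class EarlyStopper:
    """Tracks validation accuracy and loss; signals when training should stop.

    Assumed rule (hypothetical, for illustration only): an epoch improves on
    the best one if its accuracy is higher, or if accuracy is equal and its
    loss is lower.
    """

    def __init__(self, patience=3):
        self.patience = patience          # epochs to wait without improvement
        self.best_acc = float("-inf")
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0
        self.epoch = -1

    def update(self, val_acc, val_loss):
        """Record one epoch; return True if training should stop."""
        self.epoch += 1
        improved = val_acc > self.best_acc or (
            val_acc == self.best_acc and val_loss < self.best_loss
        )
        if improved:
            self.best_acc, self.best_loss = val_acc, val_loss
            self.best_epoch = self.epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(history, patience=3):
    """Replay (val_acc, val_loss) pairs; return (stop_epoch, best_epoch).

    stop_epoch is None if the stopping condition never triggers.
    """
    stopper = EarlyStopper(patience)
    for epoch, (acc, loss) in enumerate(history):
        if stopper.update(acc, loss):
            return epoch, stopper.best_epoch
    return None, stopper.best_epoch


# Made-up validation curve, used only to exercise the logic.
history = [(0.60, 1.00), (0.70, 0.90), (0.70, 0.80),
           (0.69, 0.85), (0.68, 0.90), (0.70, 0.95), (0.71, 0.70)]
stop_epoch, best_epoch = run(history, patience=3)
```

In a real training loop, the weights saved at `best_epoch` would typically be restored once stopping is triggered.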