Micro-expression recognition (MER) has gained considerable attention in real-world applications such as human-computer interaction, depression estimation, and virtual reality. However, MER systems still struggle to capture the subtle spatial appearance information of micro-expressions. In this paper, an efficient and robust motion flow-guided MicroNet framework is proposed for MER. The proposed MicroNet consists of two components: a motion flow generator (MFGen) and an avalanche feature (AFeat) block. The MFGen extracts temporal changes in expressive regions by analyzing the motion intensity of pixels across frames. The AFeat block then captures spatiotemporal features using a multi-lateral complementary feature (MCFeat) block, which elicits coarse and deep edge responses from multi-scale receptive fields. Thus, the proposed MicroNet not only estimates momentary variations but also learns the affective appearance features of micro-expressions. The efficacy of MicroNet is examined through experiments on two experimental setups over six comprehensive datasets (CASME-I, CASME-II, CAS(ME)2, SAMM, SMIC, and COMPOSITE) under two validation schemes, leave-one-subject-out and cross-domain, to demonstrate the generalization capacity and robustness of the proposed framework. In addition, eight ablation experiments have been conducted to validate the role of each module in the MicroNet framework.