As virtual reality display technologies advance, resolutions and refresh rates continue to approach human perceptual limits, posing a growing challenge for real-time rendering algorithms. Neural super-resolution is a promising way to reduce rendering cost and improve visual quality by upscaling low-resolution renderings. However, the added workload of running the neural networks themselves is far from negligible. In this paper, we alleviate this burden by exploiting the foveated nature of the human visual system, in which acuity falls off rapidly from the focal point toward the periphery. Using the dynamic and geometric information inherently available in real-time rendering (i.e., pixel-wise motion vectors, depth, and camera transformations), we propose a neural accumulator that recurrently aggregates low-resolution visual information rendered in an amortized fashion from frame to frame. Through a partition-assemble scheme, a neural super-resolution module upsamples low-resolution image tiles to different quality levels according to their perceptual importance and reconstructs the final output heterogeneously. Our method generates perceptually high-fidelity foveated high-resolution frames in real time, surpassing the quality of existing foveated super-resolution methods.
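To make the accumulation step concrete, the following is a minimal sketch, not the authors' implementation: it backward-warps the previously accumulated state into the current frame using renderer-supplied motion vectors and fuses it with the new low-resolution features. The module name `NeuralAccumulator`, the single-convolution fusion, and the convention that motion vectors are given in normalized screen coordinates are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def warp(prev: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
    """Backward-warp `prev` (N,C,H,W) by per-pixel motion vectors (N,2,H,W)
    expressed in normalized [-1, 1] screen coordinates (an assumed convention)."""
    n, _, h, w = prev.shape
    # Base sampling grid covering the current frame.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=prev.device),
        torch.linspace(-1, 1, w, device=prev.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    # Each motion vector points from a current pixel to its previous-frame position.
    grid = base + motion.permute(0, 2, 3, 1)
    return F.grid_sample(prev, grid, align_corners=True)

class NeuralAccumulator(torch.nn.Module):
    """Hypothetical fusion module: blends warped history with the new frame's features."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.fuse = torch.nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, lowres_feat, prev_state, motion):
        history = warp(prev_state, motion)  # reproject the accumulated state
        return torch.relu(self.fuse(torch.cat([lowres_feat, history], dim=1)))
```

The partition-assemble step can be sketched similarly. In this illustrative version (again an assumption, not the paper's code), tiles whose eccentricity relative to the gaze point falls inside a foveal radius are routed through a heavier super-resolution path, peripheral tiles take a cheaper one, and the upsampled tiles are reassembled into the full-resolution frame; `foveated_upsample`, `heavy`, `light`, and the fixed-radius threshold are placeholders.

```python
def foveated_upsample(lr: torch.Tensor, gaze, heavy, light,
                      tile: int = 32, scale: int = 2,
                      fovea_radius: float = 0.25) -> torch.Tensor:
    """lr: (1,C,H,W) low-res frame; gaze: (x, y) in [0, 1] screen coordinates.
    `heavy` and `light` are stand-in SR callables that upscale a tile by `scale`."""
    _, c, h, w = lr.shape
    out = lr.new_zeros((1, c, h * scale, w * scale))
    gx, gy = gaze[0] * w, gaze[1] * h
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Eccentricity of the tile center relative to the gaze point,
            # used here as a stand-in for perceptual importance.
            cx, cy = tx + tile / 2, ty + tile / 2
            ecc = ((cx - gx) ** 2 + (cy - gy) ** 2) ** 0.5 / max(h, w)
            patch = lr[:, :, ty:ty + tile, tx:tx + tile]
            sr = heavy(patch) if ecc < fovea_radius else light(patch)
            out[:, :, ty * scale:(ty + tile) * scale,
                      tx * scale:(tx + tile) * scale] = sr
    return out
```

As a usage example, `light` could simply be bilinear interpolation, e.g. `lambda p: F.interpolate(p, scale_factor=2, mode="bilinear", align_corners=False)`, while `heavy` is a learned network; the paper's actual quality tiers and importance measure may differ from this two-level threshold.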