Radar-based human activity recognition (HAR) is an active area of research. In this paper, we investigate methods to improve the generalization of micro-Doppler-based swimming activity recognition. We identify three main challenges for this task: a small dataset with limited motion diversity, inaccurate period estimation, and inefficient network designs that do not account for the unique characteristics of spectrograms. To address the limited motion diversity, we propose a spectral data augmentation scheme tailored to micro-Doppler spectrograms, comprising positive augmentations that preserve physical fidelity and negative augmentations that penalize unrealistic examples. We also investigate self-supervised pre-training as a way to use these negative augmentations effectively. To address inaccurate period estimation, we introduce a segmentation approach based on energy distribution that handles variation in the temporal period. To exploit the spreading pattern of limb motion along the Doppler dimension and the continuity of torso motion along the temporal dimension, we design a feature-extraction module that combines 2D convolution with 1D temporal dynamic convolution. Our evaluation on a self-collected swimming activity recognition dataset shows that our model achieves the best classification accuracy and robustness to corruptions, even compared with much larger models and multi-domain fusion models.
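To make the idea of energy-based segmentation more concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes a spectrogram stored as a list of time frames, each holding Doppler-bin magnitudes, and it cuts the sequence at local minima of the smoothed per-frame energy. The function names (`frame_energy`, `moving_average`, `segment_by_energy`) and the parameters `win` and `min_gap` are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def frame_energy(spec: Sequence[Sequence[float]]) -> List[float]:
    """Energy of each time frame: sum of squared magnitudes over Doppler bins."""
    return [sum(m * m for m in frame) for frame in spec]


def moving_average(x: Sequence[float], win: int) -> List[float]:
    """Centered moving average; the window shrinks at the sequence edges."""
    half = win // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def segment_by_energy(
    spec: Sequence[Sequence[float]], win: int = 3, min_gap: int = 4
) -> List[Tuple[int, int]]:
    """Split a spectrogram at local minima of its smoothed frame energy.

    Assumption (illustrative heuristic, not the paper's method): one motion
    period ends where the frame energy reaches a local minimum. Returns
    half-open (start, end) frame ranges that together cover every frame.
    """
    energy = moving_average(frame_energy(spec), win)
    if not energy:
        return []
    cuts = [0]
    for t in range(1, len(energy) - 1):
        is_local_min = energy[t] < energy[t - 1] and energy[t] <= energy[t + 1]
        # Enforce a minimum spacing so noise does not create tiny segments.
        if is_local_min and t - cuts[-1] >= min_gap:
            cuts.append(t)
    cuts.append(len(energy))
    return list(zip(cuts[:-1], cuts[1:]))


# Toy input: three strokes of slightly different length, one Doppler bin each.
amplitudes = [1, 2, 3, 2, 1, 0.1, 1, 2, 3, 2, 1, 0.1, 1, 2, 3, 2, 1]
toy_spec = [[a] for a in amplitudes]
segments = segment_by_energy(toy_spec)
```

Because the cut points follow the energy profile instead of a fixed window length, the resulting segments may differ in duration, which is the property that matters when the stroke period varies over time. In practice the smoothing window and the minimum gap would need tuning to the radar frame rate and the expected stroke period.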