Finger vein recognition, like a control system, must balance local and global dynamics to perform well. To address the limitations of existing methods, we propose the Wavelet-Transformer algorithm, which combines convolutional neural networks (CNNs) for local feature extraction, Vision Transformers (ViT) for global dependency modeling, and the discrete wavelet transform (DWT) for time-frequency analysis. This modular design mirrors control-theory principles and is intended to provide stability and adaptability. In experiments on the FV210 and FV618 datasets, the algorithm achieves recognition accuracies of 99.53% and 97.62%, with equal error rates of 0.35% and 0.71%, respectively, indicating its robustness for intelligent recognition and control applications.
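To make the DWT component concrete, the following is a minimal sketch of a single-level 2D discrete wavelet transform. It is an illustration only, not the implementation used in this work: the Haar wavelet, the function names `haar_dwt_1d` and `haar_dwt_2d`, and the sub-band labels are all assumptions, and sub-band naming conventions differ between libraries. The sketch uses only the Python standard library and expects an image with even height and width.

```python
import math


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums give the low-pass (approximation) coefficients,
    # scaled pairwise differences give the high-pass (detail) coefficients.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def filter_columns(rows):
    """Apply the 1D Haar step down every column of a list of rows."""
    lo_cols, hi_cols = [], []
    for col in zip(*rows):  # zip(*rows) iterates over columns
        a, d = haar_dwt_1d(list(col))
        lo_cols.append(a)
        hi_cols.append(d)
    # Transpose back so the result is again a list of rows.
    lo = [list(r) for r in zip(*lo_cols)]
    hi = [list(r) for r in zip(*hi_cols)]
    return lo, hi


def haar_dwt_2d(image):
    """One level of the 2D Haar DWT.

    Returns four half-resolution sub-bands (ll, lh, hl, hh). The first
    letter refers to the row-direction filter and the second to the
    column-direction filter (an assumed labeling for this sketch).
    """
    row_lo, row_hi = [], []
    for row in image:
        a, d = haar_dwt_1d(list(row))
        row_lo.append(a)
        row_hi.append(d)
    ll, lh = filter_columns(row_lo)
    hl, hh = filter_columns(row_hi)
    return ll, lh, hl, hh


if __name__ == "__main__":
    # Hypothetical 4x4 grayscale patch standing in for a finger vein image.
    patch = [
        [10.0, 10.0, 40.0, 40.0],
        [10.0, 10.0, 40.0, 40.0],
        [10.0, 10.0, 40.0, 40.0],
        [10.0, 10.0, 40.0, 40.0],
    ]
    ll, lh, hl, hh = haar_dwt_2d(patch)
    print(len(ll), len(ll[0]))  # each sub-band is 2x2
```

The low-pass sub-band `ll` is a half-resolution approximation of the input, while the other three sub-bands carry directional detail. In a hybrid design of the kind described above, such sub-bands could be passed to the CNN and Transformer branches, although how the three components are actually connected is not specified here.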