With the rapid advance of information technology, time series forecasting has become increasingly important across domains such as finance, energy, and transportation. Traditional recurrent neural networks (RNNs) and long short-term memory (LSTM) networks struggle with long sequences, motivating more capable architectures. Building on the Transformer architecture's efficiency in processing long sequences, this study introduces the FCA-Transformer, a model that integrates a feature pyramid and a cross-attention mechanism to improve both predictive accuracy and computational efficiency. The feature pyramid captures time series features at multiple temporal scales, yielding a comprehensive feature representation, while the cross-attention mechanism enables deep interaction across sequences, exploiting complex temporal correlations beyond the reach of conventional models. A cross-scale integration strategy further deepens the understanding and use of information across these scales. Ablation and comparative studies verify the pivotal roles of the feature pyramid and cross-attention in improving forecasting performance, and the FCA-Transformer outperforms current state-of-the-art models, demonstrating strong predictive capability. Its successful application to an atypical physical examination dataset further demonstrates the model's robustness and broad applicability, indicating its potential for a wide range of forecasting tasks.
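To make the two named components concrete, the sketch below illustrates one plausible reading of the architecture: a coarse pyramid level is formed by downsampling the fine-scale sequence, and cross-attention lets fine-scale tokens attend to the coarse scale. All layer sizes, the pooling factor, and the fusion scheme are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class FeaturePyramidCrossAttention(nn.Module):
    """Illustrative sketch of a two-level feature pyramid fused by
    cross-attention. Hyperparameters here are assumed, not the paper's."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, pool: int = 2):
        super().__init__()
        # Coarse pyramid level: a downsampled view via average pooling.
        self.downsample = nn.AvgPool1d(kernel_size=pool, stride=pool)
        # Cross-attention: fine-scale tokens query the coarse-scale tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- fine-scale sequence features.
        # AvgPool1d expects (batch, channels, length), so transpose around it.
        coarse = self.downsample(x.transpose(1, 2)).transpose(1, 2)
        # Cross-scale integration: fine scale attends to the coarse scale.
        fused, _ = self.cross_attn(query=x, key=coarse, value=coarse)
        return self.norm(x + fused)  # residual connection + layer norm

if __name__ == "__main__":
    model = FeaturePyramidCrossAttention()
    series = torch.randn(8, 96, 64)  # batch of 8 windows, 96 steps, 64 dims
    print(model(series).shape)       # torch.Size([8, 96, 64])
```

In this reading, the residual connection preserves the fine-scale representation while the attended coarse features inject multi-scale context; stacking such blocks with further downsampling would extend the pyramid to additional scales.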