Signal as Token: Robust DOA Estimation in Complex Environments Aided by Transformer
Abstract
Traditional direction-of-arrival (DOA) estimation methods, including
beamforming, maximum likelihood estimation, subspace-based methods, and
sparsity-inducing methods, estimate DOA by relating the received signal
to the geometric characteristics of the array. However, a low
signal-to-noise ratio, a small number of snapshots, array errors,
coherent signals, and broadband signals can seriously degrade their
performance. Existing remedies, such as spatial smoothing and compressed
sensing for coherent sources and band division for broadband sources,
often come at the expense of resolution. In addition, traditional
methods tend to extrapolate poorly and fail to produce satisfactory
estimates in complex scenarios. To address these problems, some studies
have proposed machine learning and deep learning methods for DOA
estimation. However, machine learning methods generalize less well than
deep learning methods, and most of them are evaluated only on synthetic
data, which cannot guarantee their performance in practical
applications. Most deep learning methods model DOA estimation as a
classification problem on a grid, which limits the accuracy of the
results: higher accuracy requires a finer grid, which significantly
increases the computational cost. Like the machine learning methods
above, most deep learning methods report no experimental results on
measured data.
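To make the grid limitation concrete, here is a minimal sketch of a classical grid-searched subspace estimator (MUSIC) on a simulated uniform linear array. The array size, half-wavelength spacing, source angle, SNR, and the helper names `ula_steering`, `simulate_snapshots`, and `music_grid` are illustrative assumptions, not taken from the paper. The estimate can only land on a grid point, so a 1° grid quantizes the answer, while a 0.1° grid evaluates about ten times as many candidate directions.

```python
import numpy as np


def ula_steering(theta_deg, m_sensors, spacing=0.5):
    """Steering matrix (m_sensors, n_angles) of a uniform linear array.
    `spacing` is the inter-element distance in wavelengths (assumed 0.5)."""
    theta = np.deg2rad(np.atleast_1d(np.asarray(theta_deg, dtype=float)))
    m = np.arange(m_sensors)[:, None]
    return np.exp(-2j * np.pi * spacing * m * np.sin(theta)[None, :])


def simulate_snapshots(theta_deg, m_sensors, n_snapshots, snr_db, rng):
    """Received data X = A S + noise, shape (m_sensors, n_snapshots)."""
    a = ula_steering(theta_deg, m_sensors)
    k = a.shape[1]
    s = (rng.standard_normal((k, n_snapshots))
         + 1j * rng.standard_normal((k, n_snapshots))) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)  # unit-power sources, noise std from SNR
    w = sigma * (rng.standard_normal((m_sensors, n_snapshots))
                 + 1j * rng.standard_normal((m_sensors, n_snapshots))) / np.sqrt(2)
    return a @ s + w


def music_grid(x, n_sources, grid_deg):
    """Grid-searched MUSIC: returns the n_sources highest spectrum peaks."""
    m, n = x.shape
    r = x @ x.conj().T / n                      # sample covariance (M, M)
    _, eigvecs = np.linalg.eigh(r)              # eigenvalues in ascending order
    e_noise = eigvecs[:, : m - n_sources]       # noise subspace
    proj = e_noise.conj().T @ ula_steering(grid_deg, m)
    spectrum = 1.0 / np.sum(np.abs(proj) ** 2, axis=0)
    # Interior local maxima of the pseudospectrum, strongest first.
    inner = spectrum[1:-1]
    peaks = np.where((inner > spectrum[:-2]) & (inner > spectrum[2:]))[0] + 1
    best = peaks[np.argsort(spectrum[peaks])[::-1][:n_sources]]
    return np.sort(grid_deg[best])


rng = np.random.default_rng(0)
x = simulate_snapshots([10.3], m_sensors=8, n_snapshots=200, snr_db=10, rng=rng)
coarse = np.linspace(-90.0, 90.0, 181)    # 1 degree grid
fine = np.linspace(-90.0, 90.0, 1801)     # 0.1 degree grid
est_coarse = music_grid(x, 1, coarse)
est_fine = music_grid(x, 1, fine)
print(est_coarse, len(coarse))
print(est_fine, len(fine))
```

On the coarse grid the answer is forced onto an integer angle even though the source sits at 10.3°; refining the grid recovers it but multiplies the number of spectrum evaluations, which is the accuracy/cost trade-off that grid-classification networks inherit.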
This paper proposes a novel DOA estimation method based on the
Transformer model. Firstly, compared with the standard Transformer, our
model adds a sensor-based attention mechanism designed specifically for
DOA estimation. The method abandons grid classification and instead
treats DOA estimation directly as a regression problem that minimizes
the estimation error. A rigorous mathematical derivation shows that the
output of this attention module admits a pseudo-singular value
decomposition whose eigenvalue matrix is the same as that of the MUSIC
method, which means that the output lies in the space spanned by the
(projected) signal and noise eigenvectors. When an eigenvalue is large,
the corresponding eigenvector dominates this space, which forces the
model to concentrate on the vital eigenvectors.
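The idea of treating each sensor, rather than each snapshot, as a token can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's module: complex data are split into real and imaginary parts so that standard real-valued scaled dot-product attention applies, and the projection matrices are random placeholders for learned weights. The score matrix is M × M, so its size does not depend on the number of snapshots N. With identity projections, the token Gram matrix equals N times the real part of the sample covariance that MUSIC decomposes, which gives some intuition for why the attention output can be related to the MUSIC eigenstructure.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sensor_tokens(x):
    """(M, N) complex snapshots -> (M, 2N) real tokens, one token per sensor."""
    return np.concatenate([x.real, x.imag], axis=1)


def sensor_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention across the sensor axis."""
    z = sensor_tokens(x)                       # (M, 2N)
    q, k, v = z @ w_q, z @ w_k, z @ w_v        # each (M, d)
    scores = q @ k.T / np.sqrt(q.shape[1])     # (M, M): one score per sensor pair
    return softmax(scores, axis=-1) @ v, scores


rng = np.random.default_rng(1)
m_sensors, n_snapshots, d_model = 8, 200, 16
x = (rng.standard_normal((m_sensors, n_snapshots))
     + 1j * rng.standard_normal((m_sensors, n_snapshots))) / np.sqrt(2)

# Random projections stand in for learned weights (placeholder assumption).
w_q, w_k, w_v = (rng.standard_normal((2 * n_snapshots, d_model)) / np.sqrt(2 * n_snapshots)
                 for _ in range(3))
out, scores = sensor_attention(x, w_q, w_k, w_v)
print(out.shape, scores.shape)   # the score matrix is M x M, not N x N

# With identity projections the pre-softmax Gram matrix is Re(X X^H) = N * Re(R).
z = sensor_tokens(x)
r = x @ x.conj().T / n_snapshots
print(np.allclose(z @ z.T, n_snapshots * r.real))
```

Computing the M × M scores costs on the order of M² times the feature width, whereas treating the N snapshots as tokens would build an N × N score matrix; with many snapshots and few sensors, the sensor axis is the cheaper one to attend over.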
Secondly, the sensor-based attention mechanism is significantly cheaper
than the original attention mechanism: its complexity drops from
$O(N^2)$ to $O(M^2)$, where $N$ is the number of snapshots and $M$ is
the number of sensors. Thirdly, we conducted simulation experiments
covering low signal-to-noise ratio, a small number of snapshots, array
errors, coherent signals, and broadband signals, and the results show
that our method adapts well to these scenarios. Fourthly, to verify the
practical applicability of our model, we transferred it to measured data
and tested it there; the results show that our method remains
effective. Fifthly, to cope with environmental changes that may occur in
practice, we designed a dedicated generalization experiment that
explores how the model generalizes to unseen scenarios, including
different signal-to-noise ratios and different array error strengths,
and the model achieves satisfactory results. Finally, because our model
needs to know the number of sources in advance, which is sometimes
unknown in practice, we slightly modify the DOA estimation model,
replacing the regression head with a classification head to estimate
the number of sources. The average estimation accuracy is about 98%,
which further extends the practical applicability of the method.
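The head swap described above can be illustrated with a minimal sketch in which a shared pooled feature vector feeds either a regression head (continuous angles, mean-squared-error loss) or a classification head (a probability over possible source counts, cross-entropy loss). The feature size, the maximum number of sources `k_max`, the random weights, and the choice to sort angles before computing the loss are illustrative assumptions, not details from the paper.

```python
import numpy as np


def regression_head(features, w, b):
    """Pooled features (d,) -> K continuous DOA estimates in degrees (no grid)."""
    return features @ w + b


def classification_head(features, w, b):
    """Pooled features (d,) -> probabilities over source counts 1..k_max."""
    logits = features @ w + b
    logits = logits - logits.max()             # numerical stability
    p = np.exp(logits)
    return p / p.sum()


def mse_loss(pred_deg, true_deg):
    # Sorting makes the loss independent of source order (assumed convention).
    return float(np.mean((np.sort(pred_deg) - np.sort(true_deg)) ** 2))


def cross_entropy(probs, n_sources):
    return float(-np.log(probs[n_sources - 1]))


rng = np.random.default_rng(2)
d, k, k_max = 16, 2, 4
features = rng.standard_normal(d)              # stand-in for the encoder output

angles = regression_head(features, rng.standard_normal((d, k)), np.zeros(k))
probs = classification_head(features, rng.standard_normal((d, k_max)), np.zeros(k_max))

print("DOA estimates:", angles, "MSE:", mse_loss(angles, np.array([-20.0, 35.0])))
print("estimated source count:", int(np.argmax(probs)) + 1,
      "CE:", cross_entropy(probs, 2))
```

Only the final linear layer and the loss change between the two tasks; the encoder that produces `features` can be shared, which is why the modification to the DOA model is small.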