
SVIT-SSR: A sEMG-based Vision Transformer Approach for Silent Speech Recognition
  • Zhao Li,
  • Bin Ma,
  • Weifan Mao,
  • Jianxing Zhang,
  • Zhuting Yu,
  • Lu Yizhou
Zhao Li
Shanghai Advanced Research Institute Chinese Academy of Sciences
Bin Ma
Shanghai Advanced Research Institute Chinese Academy of Sciences

Corresponding Author: mab@sari.ac.cn

Weifan Mao
Shanghai Advanced Research Institute Chinese Academy of Sciences
Jianxing Zhang
Shanghai Advanced Research Institute Chinese Academy of Sciences
Zhuting Yu
SARI
Lu Yizhou
Shanghai Advanced Research Institute Chinese Academy of Sciences

Abstract

Silent Speech Recognition (SSR) based on surface electromyography (sEMG) is a voice-interaction technology for scenarios that require silent operation. In this article, we cast sEMG-based SSR as a short-term image-sequence classification task. We extract time-frequency features from the muscle-activity segments and reconstruct the data, then analyze the temporal and spatial dimensions to capture the intrinsic correlations of muscle activity. On this basis, we propose the SVIT-SSR model, built on the Vision Transformer (ViT) framework. Finally, we design experiments to recognize 33 typical silent speech commands in the SSR dataset. The proposed model achieves an accuracy of 96.67 ± 1.15%, outperforming comparable algorithms.
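
As a rough illustration of how a multi-channel sEMG segment might be turned into a short image sequence, the sketch below applies a windowed FFT to each channel and stacks the results as frames of shape (channels, frequency bins). It is not the authors' pipeline: the function name `semg_to_image_sequence`, the 6-channel input, the frame length of 64 samples, the hop of 32 samples, and the log scaling are all assumptions chosen for the example, since the abstract does not give these details.

```python
import numpy as np

def semg_to_image_sequence(segment, frame_len=64, hop=32):
    """Convert a (channels, samples) segment into (frames, channels, freq_bins).

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the paper.
    """
    segment = np.asarray(segment, dtype=float)
    n_channels, n_samples = segment.shape
    if n_samples < frame_len:
        raise ValueError("segment is shorter than one frame")

    # Number of full frames that fit when sliding by `hop` samples.
    n_frames = 1 + (n_samples - frame_len) // hop
    window = np.hanning(frame_len)

    # Slice overlapping frames for all channels: (frames, channels, frame_len).
    frames = np.stack(
        [segment[:, i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Magnitude spectrum per frame and channel: (frames, channels, frame_len // 2 + 1).
    spectrum = np.abs(np.fft.rfft(frames, axis=-1))

    # Log compression keeps large and small amplitudes on a comparable scale.
    return np.log1p(spectrum)

# Synthetic stand-in for one muscle-activity segment (assumed 6 channels, 1000 samples).
rng = np.random.default_rng(0)
demo_segment = rng.standard_normal((6, 1000))
images = semg_to_image_sequence(demo_segment)
print(images.shape)
```

Each entry along the first axis is one small time-frequency "image" spanning the electrode channels, so the whole array can be fed to a sequence or patch-based classifier such as a ViT-style model.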
16 May 2024: Submitted to Electronics Letters
29 May 2024: Review(s) Completed, Editorial Evaluation Pending
29 May 2024: Reviewer(s) Assigned
17 Jun 2024: Editorial Decision: Revise Minor
02 Jul 2024: Review(s) Completed, Editorial Evaluation Pending
06 Jul 2024: Editorial Decision: Accept