Sign language is a non-verbal, visual means of communication for people with hearing and speech disabilities. It involves the use of hand signals, gestures, facial expressions, and body movements. Although most people without hearing and speech disabilities do not understand sign language, those who do use it to communicate with deaf and mute individuals. Deaf and mute individuals are active users of various social media platforms; inclusiveness therefore requires developing technological tools that facilitate communication between them and non-disabled individuals. One such tool is a sign language-based sentiment analysis system, which allows both deaf and mute individuals and non-disabled individuals to understand the polarity expressed in sign language. This study adopts a multimodal approach that trains two models: a deep convolutional neural network, VGG16, for the visual modality, and Bidirectional Encoder Representations from Transformers (BERT) for the textual modality. The models were trained on a multimodal sign language dataset consisting of video clips of sentence-level sign language paired with their textual equivalents. The results showed that the multimodal approach outperformed the single-modality text-based approach.
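To make the two-branch architecture concrete, the sketch below shows one plausible late-fusion arrangement of a VGG16 visual branch and a BERT textual branch in PyTorch. The fusion head, projection sizes, and the choice of a single representative frame per clip are illustrative assumptions, not the paper's reported configuration.

```python
# A minimal late-fusion sketch, assuming torchvision's VGG16 and
# Hugging Face's bert-base-uncased. The projection dimensions and
# concatenation-based fusion head are hypothetical choices.
import torch
import torch.nn as nn
from torchvision.models import vgg16
from transformers import BertModel


class MultimodalSentimentModel(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Visual branch: VGG16 backbone with its classifier head removed.
        backbone = vgg16(weights="IMAGENET1K_V1")
        self.visual = nn.Sequential(
            backbone.features, backbone.avgpool, nn.Flatten()
        )
        self.visual_proj = nn.Linear(512 * 7 * 7, 256)
        # Textual branch: BERT, using its pooled [CLS] representation.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.text_proj = nn.Linear(self.bert.config.hidden_size, 256)
        # Fusion: concatenate the projected features and classify polarity.
        self.classifier = nn.Linear(256 * 2, num_classes)

    def forward(self, frames, input_ids, attention_mask):
        # frames: (batch, 3, 224, 224), e.g. one sampled frame per clip.
        v = self.visual_proj(self.visual(frames))
        t = self.text_proj(
            self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).pooler_output
        )
        return self.classifier(torch.cat([v, t], dim=-1))
```

Under this arrangement, each modality can also be trained and evaluated on its own (the text branch alone corresponds to the single-modality baseline), which makes the multimodal-versus-text comparison reported in the results straightforward to reproduce.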