1. Introduction
In recent years, the incidence of food allergies has been rising, and about 10% of the world's population suffers from food allergies [1]. To date, 390 proteins have been identified as food protein allergens by the World Health Organization and International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Sub-Committee, and more than 50% of these food protein allergens are found in eggs, milk, nuts, wheat, crustaceans, beans, fish, and peanuts [2]. Soybean is a common source of legume allergens, and about 25% of food allergies are caused by soybeans [3, 4]. Soybean allergy not only induces a variety of pathological reactions, such as intestinal injury, stomach discomfort, allergic dermatitis, and anaphylactic shock [3], but also limits the development and application of soybean products [4]. Eight soybean protein allergens have been identified and submitted to the allergen database (http://www.allergen.org/): hydrophobic protein (Gly m 1), defensin (Gly m 2), profilin (Gly m 3), pathogenesis-related protein (Gly m 4), β-conglycinin (Gly m 5), glycinin (Gly m 6), seed biotinylated protein (Gly m 7), and 2S albumin (Gly m 8). In addition, Gly m Bd 28k, Gly m Bd 30k, trypsin inhibitor, lectin, and Gly 50 kDa have also been confirmed as common soybean allergens [5]. However, only seven of these soybean protein allergens, i.e., Gly m 4.0101 (UniProt ID: P26987), Gly m 5.0201 (UniProt ID: P11827), Gly m 5.03 (UniProt ID: P25974), Gly m 6.0101 (UniProt ID: P04776), Gly m 6.0501 (UniProt ID: P04347), trypsin inhibitor (UniProt ID: P01070), and lectin (UniProt ID: P05046) (Table 1), have been characterized with crystal structures, according to the UniProt database (https://www.uniprot.org/).
Previous studies have indicated that soybean allergy is usually mediated by type 2 helper CD4+ T cells (Th2 cells) [5]. Specifically, T cell epitopes of soybean allergens bind to major histocompatibility complex (MHC) class II proteins, and the resulting complexes are recognized by Th2 cells, which then release IL-4, IL-5, IL-13, and other interleukins. These cytokines promote B cell proliferation and differentiation into antibody-producing cells and play an important role in activating mast cells, basophils, and eosinophils [2]. Because antigen recognition by T cells is a critical step, the analysis of T cell epitopes and related cytokines is of vital importance for understanding the mechanism of anaphylaxis. T cell epitopes typically consist of 12-20 amino acids, and epitopes shorter than 12 amino acids do not efficiently stimulate CD4+ T cells [6]. Peptides bound by MHC class II molecules are usually 13-15 amino acids long and contain a binding core of 9 amino acids [7]. MHC molecules are peptide-binding glycoproteins that modulate the immune response. In humans, the MHC genes encode the human leukocyte antigens (HLAs), which include class I and class II molecules; the HLA class II genes (HLA-DP, HLA-DQ, and HLA-DR) encode the human MHC class II proteins [8]. However, the HLA class II alleles that present soybean allergen epitopes remain unclear, and the epitope peptides bound to HLA class II molecules, as well as their binding affinities, have rarely been described.
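To make the length relationship between an MHC class II-bound peptide and its binding core concrete, the sketch below slides a 15-residue window along a protein sequence and lists the 9-residue frames inside each window that could act as the binding core. It is an illustrative enumeration only, assuming a fixed 15-mer peptide and a 9-mer core; it makes no binding prediction, which tools such as NetMHCIIpan estimate with trained models. The function names and the toy sequence are hypothetical and are not taken from this study.

```python
"""Enumerate candidate MHC class II peptides and their 9-mer core frames.

Illustrative sketch only: assumes fixed 15-residue peptides and 9-residue
binding cores, and performs no binding-affinity prediction.
"""
from typing import Iterator, List, Tuple


def sliding_windows(seq: str, length: int = 15) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every window of `length` residues."""
    for i in range(len(seq) - length + 1):
        yield i + 1, seq[i:i + length]


def core_frames(peptide: str, core: int = 9) -> List[str]:
    """Return every contiguous `core`-residue frame that could serve as the binding core."""
    return [peptide[j:j + core] for j in range(len(peptide) - core + 1)]


if __name__ == "__main__":
    # Arbitrary toy sequence (hypothetical; not a soybean allergen).
    toy_seq = "MKTAYIAKQRQISFVKSHFSRQ"
    for start, pep in sliding_windows(toy_seq):
        frames = core_frames(pep)
        print(f"{start:>3} {pep} -> {len(frames)} candidate 9-mer cores, first: {frames[0]}")
```

A 15-mer therefore offers seven possible 9-mer register positions, which is one reason prediction tools report both the full peptide and the predicted core.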
The identification of T cell epitopes of food allergens helps in understanding allergy mechanisms, developing hypoallergenic foods, and designing immunotherapies. As previously reported, pre-hydrolysis of food allergens with appropriate proteases effectively reduces the allergenicity of food proteins, but the proteases need to be selected according to the digestion-resistant regions of the epitopes [9]. For example, many hypoallergenic formula milk products for children are developed by enzymatic pre-hydrolysis to prevent milk allergy in children while improving taste and nutritional value [10]. Furthermore, T cell epitopes are considered safe and effective immunotherapy modulators [2]. In mouse models, for instance, oral immunotherapy with T cell epitopes effectively reduced allergic reactions to shrimp tropomyosin and egg ovalbumin [11, 12]. Knowledge of allergen-specific T cell epitopes is also useful for developing T cell epitope-targeted vaccines for immunotherapy [13].
Allergen T cell epitopes are conventionally identified by peptide scanning, in which multiple overlapping peptides spanning the full length of the protein are synthesized and tested in in vitro T cell stimulation assays; however, this approach is costly in terms of peptide synthesis and serum preparation [14]. Although the lymphocyte proliferation assay used in T cell testing is simple, inexpensive, and sensitive, it is time-consuming and cannot identify or quantify cytokines. Flow cytometry is a newer method that allows high-throughput, time-saving screening of T cell epitopes, but the equipment is expensive [15]. In this context, and benefiting from the development of databases, bioinformatics has been widely used as an initial screening method for predicting food allergen epitopes, offering high throughput, fast analysis, and low cost [16]. In our previous studies, seven bioinformatics tools (ProPred, SYFPEITHI, NetMHCII, NN-align, SMM-align, NetMHCIIpan, and RANKPEP) were employed to predict the T cell epitopes of black kidney bean, and two of the three potential T cell epitopes were confirmed by cytokine and lymphocyte proliferation experiments [17], supporting the predictive accuracy of bioinformatics analysis. In addition, 36 T cell epitopes of the peanut allergen Ara h 1 were predicted with the NetMHCIIpan 2.0 tool; 14 of the predicted epitopes bound to HLA molecules in vitro, and 35 induced T cell proliferation and differentiation and the production of IL-13 [14]. To date, Gly m 6 is the only soybean allergen with identified epitopes recorded in the SDAP (Structural Database of Allergenic Proteins) database, and most relevant studies have focused on B cell epitopes of soybean allergens. For instance, Zeece et al. [18] identified an IgE epitope region (aa 192-306) in the acidic subunit of soybean globulin G1 by western blotting. Helm et al. [19] used peptide scanning to identify 11 linear B cell epitopes of soybean globulin G1, and the epitope AGVALSRCTLN (aa 62-72) cross-reacted with a peanut allergen. Saeed et al. [20] identified 9 glycinin epitopes and found that glycinin cross-reacted with peanuts, almonds, and walnuts. However, research on the T cell epitopes of soybean allergens remains scarce.
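The conventional peptide-scanning approach covers a protein with overlapping synthetic peptides. A minimal sketch of how such a library could be laid out is given below, assuming 15-mer peptides offset by 5 residues (a 10-residue overlap); these parameters are illustrative choices, not values used in the cited studies, and the function name is hypothetical. The last peptide is anchored at the C-terminus so that no residue is left uncovered.

```python
"""Lay out an overlapping peptide library for peptide scanning.

Illustrative sketch: peptide length and step are assumed parameters,
not values from the cited studies.
"""
from typing import List, Tuple


def overlapping_library(seq: str, length: int = 15, step: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) tuples, 1-based and inclusive, covering every residue."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if step > length:
        raise ValueError("step larger than length would leave uncovered residues")
    if len(seq) <= length:
        return [(1, len(seq), seq)]
    peptides = []
    start = 0
    # Add windows while they stop short of the C-terminus.
    while start + length < len(seq):
        peptides.append((start + 1, start + length, seq[start:start + length]))
        start += step
    # Anchor the final peptide at the C-terminus so every residue is covered.
    last_start = len(seq) - length
    peptides.append((last_start + 1, len(seq), seq[last_start:]))
    return peptides


if __name__ == "__main__":
    # Arbitrary toy sequence (hypothetical; not a soybean allergen).
    for start, end, pep in overlapping_library("MKTAYIAKQRQISFVKSHFSRQ"):
        print(f"{start:>3}-{end:<3} {pep}")
```

A smaller step gives finer resolution of epitope boundaries but increases the number of peptides to synthesize, which is the cost trade-off noted for this method.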
In this study, to gain a comprehensive understanding of the T cell epitopes of soybean allergens, all seven soybean protein allergens characterized with crystal structures were subjected to bioinformatic analyses. T cell epitopes of these allergens were predicted by in silico tools, and the binding affinities of the epitopes for different HLA class II alleles, their ability to induce IL-4, and their allergenicity were also evaluated to understand the allergenicity mechanisms. In addition, the amino acid composition, the quantitative structure-activity relationship (QSAR), and the pepsin hydrolysis sites of the epitopes were analyzed.
2.