1. Introduction
In recent years, the incidence of food allergies has been rising, and
about 10% of the world's population suffers from food allergies
[1]. To date, 390 proteins have been identified as food protein
allergens by the World Health Organization and the International Union
of Immunological Societies (WHO/IUIS) Allergen Nomenclature
Sub-Committee, and more than 50% of these allergens are found in eggs,
milk, nuts, wheat, crustaceans, beans, fish, and peanuts [2]. Soybean
is a common source of legume allergens, and about 25% of food
allergies are caused by soybeans [3, 4]. Soybean allergy not only
induces a variety of pathological reactions, such as intestinal
injury, stomach discomfort, allergic dermatitis, and anaphylactic
shock [3], but also limits the development and application of soybean
products [4]. So far, eight soybean protein allergens have been identified
and submitted to the allergen database (http://www.allergen.org/),
namely hydrophobic protein (Gly m 1), defensin (Gly m 2), profilin (Gly
m 3), pathogenesis-related protein (Gly m 4), β-conglycinin (Gly m 5),
glycinin (Gly m 6), seed biotinylated protein (Gly m 7), and 2S albumin
(Gly m 8). In addition, Gly m Bd 28k, Gly m Bd 30k, trypsin inhibitor,
lectin, and Gly 50 kDa have also been confirmed as common soybean
allergens [5]. However, only seven of these soybean protein allergens,
i.e., Gly m 4.0101 (UniProt ID: P26987), Gly m 5.0201 (UniProt ID:
P11827), Gly m 5.03 (UniProt ID: P25974), Gly m 6.0101 (UniProt ID:
P04776), Gly m 6.0501 (UniProt ID: P04347), trypsin inhibitor (UniProt
ID: P01070), and lectin (UniProt ID: P05046) (Table 1), have crystal
structures available in the UniProt database (https://www.uniprot.org/).
Previous studies have indicated that soybean allergy is usually mediated
by type 2 helper CD4+ T cells (Th2 cells) [5]. Specifically, T cell
epitopes of soybean allergens bind to major histocompatibility complex
(MHC) class II proteins, and the resulting peptide-MHC complexes are
recognized by Th2 cells. The activated Th2 cells release IL-4, IL-5,
IL-13, and other interleukins, which promote B cell proliferation and
differentiation into antibody-producing cells and play an important
role in activating mast cells, basophils, and eosinophils [2]. Because
antigen recognition by T cells is a critical step in this process,
analyzing T cell epitopes and the related cytokines is essential for
understanding the mechanism of allergy. T cell epitopes typically
consist of 12-20 amino acids, and epitope peptides shorter than 12
amino acids do not efficiently stimulate CD4+ T cells [6]. Peptides
bound by MHC class II molecules are usually 13-15 amino acids long and
contain a binding core of 9 amino acids [7]. MHC molecules are
peptide-binding glycoproteins that influence the immune response. In
humans, the MHC is known as the human leukocyte antigen (HLA) system:
HLA genes encode the MHC class I and class II molecules, and the HLA
class II genes (HLA-DP, HLA-DQ, and HLA-DR) encode the human MHC class
II proteins [8]. However, it remains unclear which HLA class II alleles
present soybean allergen epitopes, and the epitope peptides bound to
HLA class II molecules, as well as their binding affinities, have
rarely been described.
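The length relationships described above (a 9-residue binding core inside a 13-15-residue MHC class II ligand, and a 12-20-residue range for T cell epitopes) can be made concrete with a short sketch that lists every possible core register of a peptide. This is a minimal illustration under stated assumptions: the function names and the 15-residue placeholder peptide are hypothetical, the constants are simply the ranges quoted in the text, and no binding affinity is predicted. MHC class II predictors generally evaluate such candidate cores, but their scoring models are not reproduced here.

```python
# Minimal sketch: list every possible 9-residue binding-core register inside a
# longer MHC class II ligand. Names and the demo peptide are hypothetical; the
# constants are the length ranges quoted in the text. No affinity is predicted.

CORE_LEN = 9           # MHC class II binding core length [7]
MIN_EPITOPE_LEN = 12   # shorter peptides stimulate CD4+ T cells poorly [6]
MAX_EPITOPE_LEN = 20   # upper end of the typical epitope length range [6]


def in_epitope_length_range(peptide: str) -> bool:
    """Return True if the peptide length lies within the 12-20 residue range."""
    return MIN_EPITOPE_LEN <= len(peptide) <= MAX_EPITOPE_LEN


def binding_core_registers(peptide: str, core_len: int = CORE_LEN) -> list[tuple[str, str, str]]:
    """Return (N-terminal flank, core, C-terminal flank) for each core register."""
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the binding core")
    registers = []
    for start in range(len(peptide) - core_len + 1):
        end = start + core_len
        registers.append((peptide[:start], peptide[start:end], peptide[end:]))
    return registers


# Hypothetical 15-residue placeholder, not a soybean epitope.
demo_peptide = "MKTAYIAKQRQISFV"
print("within 12-20 residues:", in_epitope_length_range(demo_peptide))
for n_flank, core, c_flank in binding_core_registers(demo_peptide):
    print(f"{n_flank:>6} [{core}] {c_flank}")
```

A 15-residue peptide has 15 - 9 + 1 = 7 possible core registers, so a single ligand can correspond to several candidate binding cores.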
Identifying the T cell epitopes of food allergens helps in understanding
allergy mechanisms, developing hypoallergenic foods, and designing
immunotherapies. As previously reported, pre-hydrolyzing food allergens
with appropriate proteases can effectively reduce the allergenicity of
food proteins, but the proteases need to be selected based on the
digestion-resistant regions of the epitopes [9]. For example, many
hypoallergenic formula milk products for children are developed by
enzymatic pre-hydrolysis, which effectively prevents milk allergy in
children while improving taste and nutritional value [10].
Furthermore, T cell epitopes have been considered safe and effective
immunotherapy modulators [2]. For example, allergic reactions to shrimp
tropomyosin and egg ovalbumin in mice were effectively reduced by oral
immunotherapy with T cell epitopes [11, 12]. Knowledge of
allergen-specific T cell epitopes is also useful for developing
vaccines that target T cell epitopes for immunotherapy [13].
Allergen T cell epitopes are conventionally identified by peptide
scanning, in which multiple overlapping peptides spanning the full
length of the protein are synthesized and combined with in vitro T
cell stimulation tests; however, peptide synthesis and serum
preparation make this approach costly [14]. Although the lymphocyte
proliferation assay used in T cell tests is simple, inexpensive, and
sensitive, it is time-consuming and cannot identify or quantify
cytokines. Flow cytometry is a newer, high-throughput method for T cell
epitope identification that saves time, but the equipment is expensive
[15]. In this context, bioinformatics, benefiting from the development
of databases, has been widely used as an initial screening method for
predicting food allergen epitopes, offering high throughput, fast
analysis, and low cost [16]. In our previous studies, seven
bioinformatics tools (ProPred, SYFPEITHI, NetMHCII, NN-align,
SMM-align, NetMHCIIpan, and RANKPEP) were employed to predict the T
cell epitopes of black kidney bean, and two of the three predicted
potential T cell epitopes were confirmed by cytokine and lymphocyte
proliferation experiments [17], supporting the predictive accuracy of
bioinformatic analysis. In addition, 36 T cell epitopes of the peanut
allergen Ara h 1 were predicted with the NetMHCIIpan 2.0 tool; 14 of
the predicted epitopes bound to HLA molecules in vitro, and 35 induced
T cell proliferation and differentiation and the production of IL-13
[14]. To date, Gly m 6 is the only soybean allergen with identified
epitopes recorded in the Structural Database of Allergenic Proteins
(SDAP), and most related studies have focused on B cell epitopes of
soybean allergens. For instance, Zeece et al. [18] identified an IgE
epitope region (aa 192-306) in the acidic subunit of soybean globulin
G1 by western blot. Helm et al. [19] used the peptide scanning
technique to identify 11 linear B cell epitopes of soybean globulin G1,
and the epitope AGVALSRCTLN (aa 62-72) cross-reacted with a peanut
allergen. Saeed et al. [20] identified 9 glycinin epitopes and found
that glycinin cross-reacted with peanuts, almonds, and walnuts. However,
research on the T cell epitopes of soybean allergens is still lacking.
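The overlapping-peptide library used in conventional peptide scanning can be sketched as a sliding window over the protein sequence. This is a hypothetical helper, not the protocol of [14]: the 15-residue peptide length and the 5-residue step (giving a 10-residue overlap) are illustrative assumptions, and the demo sequence is an arbitrary placeholder rather than a real soybean allergen.

```python
# Hypothetical helper: tile a protein sequence with overlapping synthetic
# peptides, as in a peptide-scanning library. Length and step are assumptions.


def overlapping_peptides(sequence: str, length: int = 15, step: int = 5) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) pairs covering the whole sequence.

    Consecutive peptides overlap by (length - step) residues. If regular
    stepping would leave the C-terminus uncovered, one extra peptide ending at
    the last residue is appended.
    """
    if length <= 0 or step <= 0 or step > length:
        raise ValueError("require 0 < step <= length")
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length  # 0-based start of the C-terminal peptide
    peptides = [(i + 1, sequence[i:i + length]) for i in range(0, last_start + 1, step)]
    if peptides[-1][0] - 1 != last_start:
        peptides.append((last_start + 1, sequence[last_start:]))
    return peptides


# Arbitrary 40-residue placeholder sequence (not a real soybean allergen).
demo_sequence = "ACDEFGHIKLMNPQRSTVWY" * 2
for start, peptide in overlapping_peptides(demo_sequence):
    print(start, peptide)
```

Reducing the step increases the overlap between neighboring peptides, but it also increases the number of peptides that must be synthesized.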
In this study, to gain a comprehensive understanding of the T cell
epitopes of soybean allergens, all seven soybean protein allergens with
crystal structures were subjected to bioinformatic analyses. T cell
epitopes of these allergens were predicted using in silico tools, and
the binding affinities of the epitopes for different HLA class II
alleles, their ability to induce IL-4, and their allergenicity were
evaluated to understand the mechanisms of allergenicity. The amino acid
composition, quantitative structure-activity relationship (QSAR), and
pepsin hydrolysis sites of the epitopes were also analyzed.
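The amino acid composition analysis mentioned above can be illustrated by pooling the residues of a set of epitope peptides and reporting each standard residue as a percentage of the total. This is a minimal sketch only: the function name and the example peptides are hypothetical, and it does not reproduce the QSAR or pepsin hydrolysis analyses.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptides: list[str]) -> dict[str, float]:
    """Percentage of each standard amino acid across all residues in `peptides`."""
    counts = Counter()
    for peptide in peptides:
        residues = peptide.upper()
        # Reject letters outside the 20 standard residues instead of guessing.
        unknown = set(residues) - set(STANDARD_AMINO_ACIDS)
        if unknown:
            raise ValueError(f"non-standard residues: {sorted(unknown)}")
        counts.update(residues)  # counts each residue letter
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


# Hypothetical epitope peptides used only to exercise the function.
example_epitopes = ["MKTAYIAKQRQISFV", "ACDEFGHIKLMNPQR"]
composition = amino_acid_composition(example_epitopes)
for aa, pct in sorted(composition.items(), key=lambda item: -item[1])[:5]:
    print(f"{aa}: {pct:.1f}%")
```

Pooling all residues weights longer peptides more heavily; averaging per-peptide percentages would be an alternative design if each epitope should count equally.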
2.