Results

Database searches found 288 results in total (Figure 1). The citation manager automatically removed 27 duplicates, leaving 261 titles and abstracts, which were independently reviewed by two authors. This resulted in assessment of 58 full texts. Six authors were contacted to clarify key information, of whom three responded. This gave 22 publications with sufficient detail to be included in the final analysis and risk of bias assessment (Figure 2) (39-60). No amendments were made to the registered protocol.
Characteristics of all included publications are summarized in Table 1. Across the 22 included manuscripts there were results for 897 penicillin allergy cases (median 28, range 2 – 158). The majority of cohorts were from Europe (n=20, 91%) and two (9%) were from the USA. Nearly all studies (95%, 20 of the 21 that included this information) were based in dedicated specialist Allergy Centers/Units. The time interval from the most recent reaction to BAT was reported in 19 (83%) studies, with the maximum time for any one study up to 540 months. Time from sample collection to sample processing was reported in only nine (41%) of the studies. Of these, one (11%) reported “immediate” analysis, one (11%) reported “<2 hours”, and six (67%) reported <24 hours. The penicillin allergy definition was based on European allergy diagnostic criteria as outlined by EAACI or ENDA in eight (36%) studies. Clinical history and at least one of skin test results, sIgE or drug provocation tests was used in a further 11 studies (50%). History alone was used in three (14%).
Sensitivity and specificity values for all 22 included studies, their risk of bias and applicability concerns are presented in Figure 2. The SI threshold for positivity varied across the publications (2, 2.5 and 3 were all used). An estimated summary receiver operating characteristic (SROC) curve was generated using results from all 22 studies (Figure 3). The Higgins’ I² measure of heterogeneity was 55.3% (95% confidence interval (CI), 27.9% - 72.4%), indicating moderate between-study heterogeneity, and tau² was 0.2522; the p-value of <0.0001 for Cochran’s Q statistic indicates that this heterogeneity is statistically significant. Twelve of the studies used an SI of 2 as the positive threshold for the diagnostic test. This allowed calculation of a summary point sensitivity of 51% (95% CI, 46% – 56%) and specificity of 89% (95% CI, 85% – 93%), with AUC 0.666, I² 14.4% (95% CI, 0% - 54%), tau² 0 and p = 0.30 (Figure 4).
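As an illustration of how the heterogeneity figures relate to one another, the following minimal sketch computes Higgins’ I² from Cochran’s Q. It assumes the usual univariate definition, I² = max(0, (Q - df)/Q) with df = k - 1 for k studies; this review does not state the exact model used. The Q value in the example is hypothetical: it is back-calculated from the reported I² of 55.3% for 22 studies and is not a figure reported here.

```python
def higgins_i_squared(q: float, k: int) -> float:
    """Return Higgins' I^2 as a percentage, given Cochran's Q and k studies.

    Assumes the univariate definition I^2 = max(0, (Q - df) / Q), df = k - 1.
    """
    if k < 2:
        raise ValueError("at least two studies are needed")
    df = k - 1
    if q <= 0:
        return 0.0
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q, chosen only so that the result matches the reported 55.3%.
q_illustrative = 47.0
print(round(higgins_i_squared(q_illustrative, k=22), 1))  # 55.3
```

Under this definition, a Q close to its degrees of freedom gives an I² near zero, which is the pattern reported for the subgroup of studies using an SI threshold of 2.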
Of the 22 manuscripts reporting on both sensitivity and specificity, six reported results for two different BAT assay types. Of the 28 assay types reported, 18 (64%) used flow cytometric analysis of activation of basophils collected directly from the patient. Four (14%) measured sulfidoleukotriene production. Two (7%) measured histamine release, and the remainder used an indirect assay or direct observation, in which basophil morphology was examined under a microscope to determine whether basophils had been activated. The different methods had similar sensitivity and specificity profiles, as can be seen by comparing the SROC curves (Figure 5) and by the even spread of all 22 studies across the SROC curve in Figure 3.
The minimum number of basophils required for a sample to be analyzed was reported in 16 studies (73%) and was lowest in the very earliest BAT studies (from 1963 (54), 1964 (59) and 1986 (53)), where only 20 basophils were required to be seen per mm² of the microscope field. In the immediately analyzed whole blood assays that used flow cytometry, a median of 500 basophils was required per sample (range 200 to 1000). Eleven studies used an SI threshold of 2 and gave details of the minimum number of basophils used in their assay, which allowed estimated summary points for sensitivity and specificity to be generated (Figure 6). The use of a minimum of 1000 basophils per test (sensitivity 0.47 (95% CI, 0.39 – 0.56) and specificity 0.89 (95% CI, 0.78 – 0.95)) did not seem to confer any improvement in sensitivity or specificity over the use of a lower minimum of 500 (sensitivity 0.47 (95% CI, 0.22 – 0.73) and specificity 0.91 (95% CI, 0.84 – 0.95)).
All studies were at high or at least unclear risk of bias. The most frequent source of potential bias was the patient selection process, with 14 of 22 studies (64%) rated as high risk in this domain (Figure 2). This was largely because most studies did not specify how patients were identified or whether consecutive patients were used, and several looked at only a very few selected patients. Although careful patient selection may introduce bias into the results, it also meant that most of the studies did accurately identify patients with immediate penicillin allergy.
In keeping with GRADE guidance on grading the certainty of evidence in diagnostic test accuracy, we have considered the domains of imprecision and publication bias (32). There was considerable inconsistency in the reported sensitivity (ranging from 0.23 to 0.94), with minimal overlap of the 95% CI (Figure 2). This did, however, improve when we considered only those studies with a positive SI threshold of 2 (Figure 4). Specificity was found to be fairly consistent (ranging from 0.67 to 0.99). The specificity also demonstrated extensive overlap of the 95% CI (Figure 2), suggesting good consistency. Although there was variation in CI width for the reported sensitivity, the majority of studies (16 of 22, 73%) showed a 95% CI that was entirely above the sensitivity of 0.19 seen with sIgE, the clinical comparator that we hope to improve upon with BAT. The 95% CI for specificity were much narrower than for sensitivity, demonstrating no need to lower the grading of the certainty of the evidence based on imprecision.
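The comparison of each study’s 95% CI with the sIgE sensitivity of 0.19 can be sketched as follows. This is a minimal example under stated assumptions: the Wilson score interval is used here for illustration only, since the review does not state which interval method the included studies applied, and the counts are hypothetical rather than taken from any included study. The function names are likewise illustrative.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity_ci_above(tp: int, fn: int, comparator: float = 0.19) -> bool:
    """True if the whole CI for sensitivity lies above the comparator."""
    lower, _ = wilson_ci(tp, tp + fn)
    return lower > comparator


# Hypothetical study: 14 BAT-positive results among 28 allergic patients.
low, high = wilson_ci(14, 28)
print(f"sensitivity 0.50, 95% CI {low:.2f} to {high:.2f}")
print(sensitivity_ci_above(tp=14, fn=14))
```

With small cohorts the interval widens quickly, which is one reason the CI width for sensitivity varied so much between studies.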
Publication bias was assessed for all 22 studies using a funnel plot (Figure 7). Subjectively, the plot does not appear symmetrical, which suggests that there may be publication bias. However, it is recognized that funnel plots may overestimate publication bias in meta-analyses of diagnostic test accuracy (36). Given that papers both for and against the use of BAT in clinical practice were published, that the majority of studies were not funded by organizations with a for-profit interest, and that the authors are unaware of any unpublished studies in this field, it was felt that publication bias was not a reason to downgrade the certainty of evidence. Although one study showed BAT was more likely to be positive in those with a severe reaction (41), there is no clear relationship between the degree of BAT positivity and the severity of the index allergic reaction across other studies. We felt this work did not show any sensitivity-specificity relationship, and have therefore not upgraded the certainty of evidence. The overall GRADE certainty of the evidence is “very low” for sensitivity and “low” for specificity, suggesting “the true effect might be markedly different from the estimated effect”. The lower grading for sensitivity was made on the grounds of marked inconsistency.

Discussion

This work primarily highlights the significant heterogeneity of the methods used in BAT and of the results obtained with BAT in penicillin allergy. As a summary point should only be calculated from studies that use the same positive threshold, our primary finding is that, using flow cytometric analysis with an SI threshold of 2, BAT in penicillin allergy has an estimated summary point sensitivity of 51% (46% – 56%) and specificity of 89% (85% – 93%). When compared to sIgE, another in vitro diagnostic recognized for use in penicillin allergy diagnosis, BAT shows improved sensitivity (sIgE sensitivity of 19.3% (95% CI, 12.0%-29.4%)) but lower specificity (sIgE specificity of 97.4% (95% CI, 95.2%-98.6%)) (13).
Limitations
One recurring theme across all 22 included papers was a significant risk of bias through patient selection (Figure 2). The majority of papers only included final results on patients with definite immediate allergy compared to control groups with no history of allergy who were able to tolerate oral penicillin. This aids clarity in understanding what a diagnostic test is showing, but it is not applicable to clinical practice, where indeterminate results and alternative diagnoses, such as delayed drug hypersensitivity and chronic spontaneous urticaria, complicate the clinical picture. Future work to overcome this issue should be undertaken, with prospectively collected consecutive samples from participants with suspected penicillin allergy who are all offered the gold standard specialist work up.
Another limitation is that, while many of the studies confirmed that patients were classified according to the European Academy of Allergy and Clinical Immunology (EAACI) or European Network for Drug Allergy (ENDA) guidelines, not all participants will have had exactly the same series of tests as part of this assessment. For example, a participant with a positive skin prick test or sIgE will not have gone on to undergo a DPT. Furthermore, it is now well documented that skin testing can also lead to false positives, with a recent meta-analysis reporting a summary sensitivity of 31% (95% CI, 19%-46%) and a specificity of 97% (95% CI, 94%-98%) (13). The definition of an “immediate reaction” is also relevant, as it ranged from any reaction within 30 minutes of drug administration (45, 50) to those occurring within 24 hours (61).
It is understandable that the majority (91%) of these participants were recruited from Allergy Centers, where they had been referred as outpatients for assessment. While there has been work looking at de-labelling inpatients with DPT, no studies reported BAT results from an inpatient setting. Future work is required to explore whether BAT can be used in clinical settings other than a highly specialized allergy clinic, such as an emergency department, admissions ward or other outpatient facility.
The time between the last reaction and BAT assessment also varied widely both between and within studies. It was therefore not possible to undertake any sub-group analysis to comment on how time from the last reaction may have influenced the BAT outcome. As one potential use for BAT would be to “rule-out” penicillin allergy in a person with a distant history of reaction, it would be important to know if a BAT result is reliable many years after the last penicillin exposure. A study published by Fernandez et al. showed that BAT reactivity decreased significantly even over a four-year study period (62). Only 1 of 41 patients was BAT positive at the four-year mark. This suggests that the clinical utility of BAT as a “rule-out” test may be limited to those who have been referred to an allergy service as soon as possible after the reaction. This may well complement the current shift in practice toward direct DPT in low-risk patients with a distant history of penicillin reaction. BAT could be used in more severe reaction settings, such as severe intraoperative reactions where multiple drugs are given at the same time. If a BAT is negative, it may provide reassurance to allow a patient to undergo DPT and be de-labelled. However, with its high specificity, BAT may be a good “rule-in” test and, if positive, could save patients from having a potentially harmful positive DPT. Future studies looking at the use of BAT as a diagnostic test should be clear about the time from reaction for the samples analyzed, as this may have a significant effect on the BAT outcome. Further work exploring the clinical relevance of the negativisation rates of BAT is warranted.
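The “rule-in” and “rule-out” reasoning can be made concrete with likelihood ratios. These are not reported in this review; the following is a minimal sketch that derives them from the summary point sensitivity (51%) and specificity (89%) at an SI threshold of 2, using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Summary point estimates for BAT at an SI threshold of 2.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.51, specificity=0.89)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 4.64, LR- = 0.55
```

On these point estimates a positive BAT shifts the odds of allergy more than a negative BAT shifts them away, which is in keeping with the suggestion that BAT may be better suited to ruling in than ruling out. The calculation ignores the uncertainty around both estimates and any loss of reactivity over time.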
Analysis of methods
To denote a positive BAT result, Salas in 2018 used an SI of 1.5 based on a ROC curve analysis comparing penicillin allergic and control results. However, Dreborg commented in 2018 that this was a concern, as SI should be at least 2 (63). The EAACI 2015 position paper calls for an effort to be made to standardize the BAT assay, and as such future work should keep an SI of 2 as the positive threshold to allow comparison of results across different groups.
Information on the minimum number of identified basophils required for any single BAT test was not available for all studies. The subgroup analysis showed that the summary sensitivity and specificity were extremely similar, suggesting no difference between the use of 500 or 1000 basophils. Using a lower minimum would be much more efficient when working with a precious resource such as basophils isolated from whole blood. This is clinically useful information, as using a lower minimum required number of basophils will increase the chances of collecting a usable sample from a patient.
Abuaf et al., in 2008, compared CD63 and CD203c as markers of basophil activation and suggested that CD203c was potentially a better marker (39). This was repeated more recently by Heremans et al. in 2022, who showed similar results, suggesting CD203c may give a slightly improved sensitivity (60). The subgroup analysis comparing results from these two markers showed no statistically significant difference between the methods.
The study by Molina et al. (49) looked at the use of novel dendrimeric antigens (DeAns) as carrier molecules for benzylpenicilloyl and amoxicilloyl in dense and stable hapten-carrier conjugates. This did not provide any diagnostic benefit above the use of benzylpenicilloyl, amoxicilloyl or free penicillin in BAT in this small sample.
Clinical use
A questionnaire sent out in 2007 to allergists across the world suggested that 54% of responders used BAT in the work up of drug hypersensitivity (64). A 2018 world-wide survey of the cost of allergy assessment found the median cost for BAT to be €90, with only DPT costing more, at €190 (25). A cost analysis from the same group concluded that, despite the cost, widespread penicillin allergy testing with ST and DPT would be cost saving due to the use of more targeted antibiotics, fewer courses of antibiotics, fewer outpatient visits and fewer hospital days for those admitted (65). The current role for BAT in clinical practice would therefore be to decrease the number of DPT that need to be performed, with their associated cost and risk.
The current order in which BAT is suggested to be used in penicillin allergy is before ST for patients with a high-risk history, and after ST for low-risk patients (28). As the sensitivity of BAT was better than that of skin prick testing (51% vs 30%), and the specificity slightly lower (89% vs 97%), this paper would support the use of BAT to improve the sensitivity of allergy investigations and reduce the number of patients requiring DPT to exclude penicillin allergy (Figure 8).
Some studies suggest that the use of both sIgE and BAT together improves sensitivity (41, 66). However, this opinion is not universally held, as some groups have shown no improvement in sensitivity with the use of sIgE and BAT together, and do not support the use of both methods (67). The 2020 EAACI position paper suggests that “it is advisable to perform in vitro tests in addition to ST in high-risk patients in order to improve the sensitivity of the allergy workup and thus reduce the need for DPT (moderate/strong)”, but does not clarify whether one or both tests should be done, or which test is preferred (16). BAT shows clearly improved sensitivity above sIgE (51% vs 19%) (13). However, including BAT and sIgE, with their respective specificities of 89% and 97%, would still mean that a small proportion of patients may erroneously be considered positive for penicillin allergy after optimal assessment, despite being able to tolerate penicillin. For BAT to become a routine part of the diagnostic work up for penicillin allergy, it must either have a sensitivity that is high enough for it to be used as a screening test, or a specificity higher than that of skin test or sIgE (>97%).
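The trade-off of using BAT and sIgE together can be illustrated with a simple calculation. This is a minimal sketch, assuming that a patient is classed as allergic if either test is positive and that the two tests are conditionally independent given true allergy status. The independence assumption is made for illustration only and is unlikely to hold exactly, since both tests depend on drug-specific IgE; the function name is also illustrative.

```python
def combine_either_positive(
    sens_a: float, spec_a: float, sens_b: float, spec_b: float
) -> tuple[float, float]:
    """Sensitivity and specificity of an 'either test positive' rule.

    Assumes the two tests are conditionally independent given disease status.
    """
    # A case is missed only if both tests miss it.
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    # A non-allergic patient is cleared only if both tests are negative.
    specificity = spec_a * spec_b
    return sensitivity, specificity


# Point estimates quoted in the text: BAT 51% / 89%, sIgE 19% / 97%.
sens, spec = combine_either_positive(0.51, 0.89, 0.19, 0.97)
print(f"combined sensitivity {sens:.2f}, combined specificity {spec:.2f}")
```

Under these assumptions the combined sensitivity rises to about 0.60 while the combined specificity falls to about 0.86, so any gain in detection comes at the cost of more tolerant patients being labelled as allergic.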
Alternatively, another potential use of BAT could be as an in vitro diagnostic option for identifying clavulanic acid-specific allergy. Since hypersensitivity reactions to the amoxicillin-clavulanic acid combination are very common, being able to determine whether it is clavulanic acid that is eliciting the allergic reaction would rescue the use of amoxicillin as a single drug formulation. To date there is no commercially available sIgE to clavulanic acid. In two recent studies BAT was able to successfully diagnose clavulanic acid allergy in an adult population (41, 68). This is another way that BAT can be used to accurately determine true amoxicillin or clavulanic acid allergy.