Introduction

Between 6% and 10% of the general population in high-income countries carry a label of penicillin allergy (1, 2). An estimated 90-95% of those with such a label are misdiagnosed and could safely use penicillin antibiotics (3). Patients with a label of penicillin allergy who require antibiotic treatment are often prescribed second-line antimicrobial regimens, resulting in sub-optimal medical management (4). There is also a risk to population health through the unnecessary use of broad-spectrum antibiotics in place of penicillin, which adds to rising antimicrobial resistance (5).
The current process for assessing IgE-mediated penicillin allergy varies between healthcare systems across the globe. In the United States (US), for example, many non-specialists undertake skin tests (ST) and oral drug provocation tests (DPT), and in vitro testing may not be included in this work-up. In Europe, by contrast, assessment involves referral to a tertiary allergy centre for specialist review, which may include a detailed clinical history, ST, specific immunoglobulin E (sIgE) testing, and an oral or intravenous DPT. To streamline the process of de-labelling, there has been increased use of direct DPT, without prior ST, in low-risk patients (6, 7). One British study estimated that 65% of people with a label of penicillin allergy could be deemed “low-risk”, and hence suitable for a direct DPT (7). Comprehensive specialist allergy assessment is still required for patients who do not meet low-risk selection criteria or who have unclear results.
A recent meta-analysis showed that ST alone has a sensitivity of around 30% and a specificity of 97% (8). Both ST and DPT carry a small but significant risk of a systemic reaction, with reported rates between 0.12% and 11% (9, 10). The risk of a systemic reaction in DPT was reported as 0.06%, rising to 6% when the index reaction was anaphylaxis (11). A negative DPT is considered the gold standard to exclude true penicillin hypersensitivity (12).
Unlike the in vivo tests, sIgE testing carries no risk of a reaction, as it is performed on serum. In a meta-analysis of mostly European studies, sIgE testing in penicillin allergy had a specificity similar to that of ST (~97%) (8). Of note, this value may differ in other regions with different healthcare structures and prescribing practices. However, the sensitivity of sIgE testing is very low (~19% for amoxicillin) (8). In addition, sIgE testing is only available for a limited number of penicillins (penicillin V, benzylpenicillin, ampicillin, amoxicillin). Penicillin determinants, such as penicilloyl polylysine (PPL) and minor determinant mixture (MDM), have been developed to mimic the epitopes presented when penicillin antibiotics bind to proteins in the circulation.
The 2020 European Academy of Allergy and Clinical Immunology (EAACI) position paper on improving diagnosis of beta-lactam hypersensitivity (12) recommends in vitro testing, such as the basophil activation test (BAT) or sIgE, prior to in vivo testing in high-risk patients. Laboratory methods used for BAT are heterogeneous. Most commonly, BAT involves processing of whole blood samples in a flow cytometer soon after collection (immediately to within 48 hours). Blood cells are labelled with antibody markers for cell surface proteins to identify basophils (e.g. CD193+, CD123+, HLA-DR-) and to quantify basophil activation (CD63, CD203c) (13). Samples are then exposed to a minimum of two different concentrations of penicillin-based allergen. The penicillin used can be the specific culprit drug or another commonly used penicillin, and a penicillin determinant. Spontaneous activation of basophils without any exposure to an allergen is known to occur. To account for this, the stimulation index (SI) is calculated as the ratio of the percentage of activated basophils after exposure to the drug to the percentage of activated basophils when left untreated. For a positive result, treated basophils must demonstrate at least 5% activation and an SI that meets a set threshold, commonly ≥2, for at least one of the concentrations of penicillin. There are variations in practice at almost every level of this process, with significant efforts being made to unify practice across Europe (14-16).
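The positivity rule described above can be illustrated with a minimal Python sketch. The function names, the default cut-offs (5% activation, SI ≥2) and the handling of a zero unstimulated value are assumptions made for illustration only; individual laboratories and the included studies may apply different thresholds.

```python
def stimulation_index(pct_stimulated, pct_unstimulated):
    """SI = % activated basophils with drug / % activated basophils untreated."""
    if pct_unstimulated <= 0:
        # Assumption: an SI cannot be defined without measurable background activation.
        raise ValueError("unstimulated activation must be greater than 0%")
    return pct_stimulated / pct_unstimulated


def bat_is_positive(pct_stimulated_by_concentration, pct_unstimulated,
                    min_activation_pct=5.0, si_threshold=2.0):
    """Return True if at least one drug concentration meets both criteria."""
    for pct in pct_stimulated_by_concentration:
        si = stimulation_index(pct, pct_unstimulated)
        if pct >= min_activation_pct and si >= si_threshold:
            return True
    return False


# Hypothetical readings (not study data): two drug concentrations, 2% background.
result = bat_is_positive([3.0, 8.0], pct_unstimulated=2.0)
```

In this made-up example the second concentration gives 8% activation and an SI of 4, so both criteria are met and the test would be read as positive.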
However, the clinical application of BAT is limited by the need for immediate flow cytometric analysis of whole blood samples and for access to laboratories and trained staff who can deliver this. Basophil activation has been shown to be stable for up to 24 hours in samples stored at 4°C (17). Access to such facilities and expertise within 24 hours is operationally challenging in the real-life setting, especially compared with sIgE testing, for which samples can be collected and stored for convenient future analysis.
To support the use of BAT in the diagnosis of penicillin allergy, there is a need for clarity on the sensitivity and specificity of the test, and on how these might alter with different BAT methods. This work brings together all published studies with data on the sensitivity and specificity of BAT used in penicillin allergy diagnosis. Through sub-group analysis, it aims to explore how sensitivity and specificity are affected by variations between methods, to guide decision making for allergists considering the use of BAT in penicillin allergy diagnosis.

Methods

The review was registered with PROSPERO (registration number CRD42021223880) on 25/05/2021. Methodology was in accordance with PRISMA-DTA (18) and Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines (19).
A search of the PubMed and EMBASE databases was carried out from inception to 04/02/2023 using the terms “penicillin” AND “basophil” AND “allergy” with no limits. Duplicate results were automatically removed by the EndNote X9 reference manager, and the remaining titles and abstracts were blindly and independently reviewed by two authors using rayyan.ai software. Inclusion criteria were predefined as original, retrospective or prospective studies evaluating the performance characteristics of the basophil activation test for identifying penicillin allergy in adults (age >18). Exclusion criteria included case reports and studies with insufficient key information. Manuscript authors were contacted through private communication to avoid duplication of results where multiple papers used similar cohorts, and where information was missing for key findings (true positive, true negative, false positive, false negative, SI threshold, minimum number of basophils used). These raw data were used to calculate sensitivity and specificity, the primary outcomes of this work.
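For readers less familiar with the derivation, sensitivity and specificity follow directly from the four extracted counts. The sketch below shows the arithmetic in Python; the function names and the example counts are hypothetical and are not taken from any included study.

```python
def sensitivity(tp, fn):
    """Proportion of truly allergic patients with a positive BAT."""
    if tp + fn == 0:
        raise ValueError("no allergic cases: sensitivity is undefined")
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-allergic controls with a negative BAT."""
    if tn + fp == 0:
        raise ValueError("no controls: specificity is undefined")
    return tn / (tn + fp)


# Hypothetical 2x2 counts for a single study (illustration only).
tp, fn, tn, fp = 12, 18, 27, 3
sens = sensitivity(tp, fn)  # 12 of 30 allergic patients test positive
spec = specificity(tn, fp)  # 27 of 30 controls test negative
```

With these made-up counts the sensitivity is 0.40 and the specificity is 0.90.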
Bivariate diagnostic random-effects meta-analysis and heterogeneity analysis were undertaken in RStudio (R version 4.2.0) using the mada (meta-analysis of diagnostic accuracy) package (version 0.5.11). This allows the bivariate model of Reitsma et al (20, 21) to be fitted and generates sensitivity and specificity values with 95% confidence intervals (CI) and a heterogeneity value. Restricted maximum likelihood (REML) was used for calculating the variance components. Figures were generated using the meta, mada and metafor packages. Summary receiver operating characteristic (SROC) curves were used to summarise studies with multiple different positive thresholds, and forest plots showing summary points for sensitivity and specificity were generated for studies that used the same positive thresholds (22). No covariates or predictors were used as we did not have access to individual participant data for all included studies.
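The bivariate approach is generally described as modelling each study's sensitivity and false-positive rate jointly on the logit scale. The Python sketch below shows only that per-study transformation, not the model fit itself, which was carried out in R as stated above. The 0.5 continuity correction added to every cell, and the function names, are assumptions for illustration and are not confirmed settings of this analysis.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_pair(tp, fp, fn, tn, correction=0.5):
    """Return (logit sensitivity, logit false-positive rate) for one study.

    Assumption: the continuity correction is added to all four cells so that
    studies with a zero cell still yield finite values.
    """
    tp_c, fp_c, fn_c, tn_c = (x + correction for x in (tp, fp, fn, tn))
    sens = tp_c / (tp_c + fn_c)
    fpr = fp_c / (fp_c + tn_c)
    return logit(sens), logit(fpr)


# Hypothetical study with no false positives (illustration only).
pair = logit_pair(tp=5, fp=0, fn=5, tn=20)
```

A random-effects model then treats these logit pairs as draws from a bivariate normal distribution across studies, which is what allows the correlation between sensitivity and specificity to be estimated.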
Publication bias analysis was undertaken using the method outlined by Deeks et al (23), which is the method recommended for meta-analysis of diagnostic test accuracy in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (24).
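As commonly described, the Deeks approach regresses the log diagnostic odds ratio (lnDOR) of each study on the inverse square root of its effective sample size (ESS), weighting by ESS; a slope clearly different from zero suggests funnel plot asymmetry. The Python sketch below computes only the weighted regression slope and intercept. The function names, the 0.5 continuity correction and the example counts are assumptions for illustration, and no significance test is included.

```python
import math


def deeks_inputs(tp, fp, fn, tn, correction=0.5):
    """Return (1/sqrt(ESS), lnDOR, ESS) for one study."""
    tp_c, fp_c, fn_c, tn_c = (x + correction for x in (tp, fp, fn, tn))
    ln_dor = math.log((tp_c * tn_c) / (fp_c * fn_c))
    n_cases, n_controls = tp + fn, fp + tn
    # Effective sample size: 4 * n1 * n2 / (n1 + n2), from uncorrected counts.
    ess = 4.0 * n_cases * n_controls / (n_cases + n_controls)
    return 1.0 / math.sqrt(ess), ln_dor, ess


def weighted_line(x, y, w):
    """Weighted least-squares slope and intercept of y on x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical (tp, fp, fn, tn) counts for three studies (illustration only).
studies = [(10, 5, 10, 15), (20, 2, 30, 48), (6, 1, 14, 9)]
xs, ys, ws = zip(*(deeks_inputs(*s) for s in studies))
slope, intercept = weighted_line(xs, ys, ws)
```

In practice the slope would be tested against zero with an appropriate statistical package rather than inspected by eye.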

Results

Database searches found 288 results in total (Figure 1). The citation manager removed 27 duplicates, leaving 261 titles and abstracts, which were independently reviewed by two authors. This resulted in assessment of 58 full texts. Of six authors who were contacted to clarify key information, three responded. The final analysis included 22 publications with sufficient detail for risk of bias (RoB) assessment (Figure 2) (25-46). No amendments were made to the registered protocol.
Characteristics of all included publications are summarised in Online Repository (OLR) Table E1. These publications included results for a total of 935 penicillin allergy cases (median cases per study 28, range 2 – 158). The majority of cohorts were from Europe (n=20, 91%), with two (9%) from the US. Nearly all studies (95%; 20 of the 21 that included this information) were based in dedicated specialist Allergy Centres/Units. The time interval from the most recent reaction to BAT was reported in 19 (83%) studies, with a maximum for any one study of up to 540 months. Time from sample collection to sample processing was reported in only nine (41%) studies. Of these, one (11%) reported “immediate” analysis, one (11%) reported “<2 hours”, and six (67%) reported <24 hours. The definition of penicillin allergy was based on European allergy diagnostic criteria as outlined by the EAACI/European Network for Drug Allergy (ENDA) (12, 47) in eight (36%) studies. Clinical history and at least one of skin test results, sIgE or drug provocation tests were used in a further 11 studies (50%). History alone was used in three (14%).