Objective To externally validate the “2021 AAGL Endometriosis Classification” staging system.

Design Retrospective diagnostic accuracy study.

Setting Multicentre.

Population or Sample Two hundred and seventy-two patients with endometriosis (January 2016 to October 2021).

Methods Three independent observers analysed coded surgical data to assign an AAGL surgical stage (1 to 4) as the index test and a surgical complexity level (A to D) as the reference standard.

Main Outcome Measures The diagnostic accuracy of each AAGL stage in predicting the corresponding AAGL surgical complexity level. Receiver operating characteristic curves were used to assess how well the cut-points used in the AAGL staging system discriminate between surgical complexity levels.

Results 272 cases were analysed. Across the three observers, the ranges of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were, respectively: stage 1 to predict level A, 97.9-98.7%, 60.2-64.2%, 75.0-76.9% and 96.3-97.5%; stage 2 to predict level B, 26.1-30.4%, 93.2-95.6%, 26.3-35.3% and 92.9-93.6%; stage 3 to predict level C, 7.5-10.0%, 93.8-94.8%, 33.3-42.1% and 70.9-71.5%; stage 4 to predict level D, 90.-95.0%, 90.1-91.7%, 41.9-47.5% and 99.1-99.6%. Across the three observers, the area under the receiver operating characteristic curve (AUROC) was 0.75-0.88 for A vs B/C/D (cut-point 9), 0.81 for A/B vs C/D (cut-point 16) and 0.95-0.96 for A/B/C vs D (cut-point 22).

Conclusions This external validation study demonstrates that the AAGL Endometriosis Classification performs poorly overall for the prediction of surgical complexity. The results suggest that the system in its current form is not generalisable to all endometriosis patients and should be reviewed before its universal implementation.

Funding Nil.

Keywords Endometriosis, staging, laparoscopy.
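The accuracy measures reported in the Results (sensitivity, specificity, PPV, NPV and AUROC) can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function names, the toy scores and the toy complexity levels are hypothetical, and treating a score of 22 or more as stage 4 is an assumption based on the cut-point quoted in the abstract.

```python
from typing import Dict, Sequence


def diagnostic_accuracy(index_positive: Sequence[bool],
                        reference_positive: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from paired binary results.

    Assumes every cell combination needed for a denominator is non-empty;
    otherwise a ZeroDivisionError is raised.
    """
    if len(index_positive) != len(reference_positive):
        raise ValueError("index and reference must have the same length")
    tp = fp = fn = tn = 0
    for idx, ref in zip(index_positive, reference_positive):
        if idx and ref:
            tp += 1
        elif idx and not ref:
            fp += 1
        elif not idx and ref:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auroc(scores: Sequence[float], reference_positive: Sequence[bool]) -> float:
    """AUROC as the probability that a reference-positive case scores higher
    than a reference-negative case (ties count as one half)."""
    pos = [s for s, r in zip(scores, reference_positive) if r]
    neg = [s for s, r in zip(scores, reference_positive) if not r]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one score and one complexity level per case.
toy_scores = [3, 25, 20, 10, 24, 18, 40, 5]
toy_levels = ["A", "D", "D", "B", "C", "C", "D", "A"]

# Assumed rule for illustration only: score >= 22 is read as stage 4.
is_stage_4 = [s >= 22 for s in toy_scores]
is_level_d = [lvl == "D" for lvl in toy_levels]

metrics = diagnostic_accuracy(is_stage_4, is_level_d)
auc = auroc(toy_scores, is_level_d)
print(metrics, round(auc, 3))
```

In this sketch, the per-stage figures correspond to dichotomising both the index test (for example, stage 4 vs not stage 4) and the reference standard (level D vs not level D), while the AUROC uses the underlying score without fixing a threshold.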