Iyad Sultan and 8 more authors

Background: Stage at diagnosis is critical for prognosis in pediatric cancer and for comparisons across research studies. The Toronto Pediatric Cancer Stage Guidelines standardize staging across childhood malignancies. We developed a multi-agent framework for automated staging of pediatric cancers according to these guidelines.

Methods: We created a structured staging schema and implemented an extraction pipeline that orchestrates multiple agents. The system ingests multidisciplinary meeting notes, pathology reports, radiology findings, operative notes, and clinic documentation from the first 3 months after diagnosis. One agent identifies the cancer type and maps it to a Toronto diagnostic category. A second agent then applies the corresponding staging logic. Finally, a validation agent checks the assigned stage and its reasoning against summarized documentation. We tested the tool on 500 pediatric cancer cases from our institutional registry. Cases outside the Toronto schema were excluded, leaving 433 evaluable cases. Examples include acute myeloid leukemia and nasopharyngeal carcinoma, for which the guidelines define no stage. Each case was processed independently in two runs. The outputs were compared with a reference stage (ground truth) established by consensus of four pediatric oncologists.

Results: The automated stage matched the reference stage in 91.2% of outputs overall (both runs pooled). Per-run accuracy against the reference was 93.8% for the first run and 88.7% for the second. The two runs agreed on 89.8% of cases (Cohen’s κ = 0.785, p < 0.001). Accuracy dropped significantly when the validation agent flagged a stage and requested recalculation. Stages accepted on the first attempt were 97% accurate, compared with 77% for stages obtained on subsequent attempts.

Conclusion: We demonstrate the first automated staging system for pediatric cancers using the standardized Toronto criteria. The tool showed high accuracy, comparable to that of human experts, and excellent consistency between independent runs.
We also identified a measurable indicator, the number of calculation attempts, that can flag problematic cases for further human review.
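The classify, stage, and validate sequence with recalculation can be illustrated with a minimal sketch. In this sketch, each agent is a plain Python callable rather than a language-model call. The names `stage_case` and `StagingResult` are hypothetical. The accept-or-feedback protocol, where the validator returns `None` to accept and a feedback string otherwise, is also an assumption. So is the `max_attempts=3` cap. None of this is the authors' implementation. The sketch records how many attempts were needed, so that recalculated cases can be flagged for review.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical agent signatures; in a real system each would wrap an LLM call.
ClassifyAgent = Callable[[List[str]], str]                        # documents -> diagnostic category
StagingAgent = Callable[[str, List[str], Optional[str]], str]     # (category, documents, feedback) -> stage
ValidationAgent = Callable[[str, str, List[str]], Optional[str]]  # None = accept, str = feedback


@dataclass
class StagingResult:
    category: str
    stage: str
    attempts: int   # staging attempts used before returning
    accepted: bool  # True if the validation agent accepted the final stage
    feedback: List[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Any recalculation, or a stage that was never accepted, flags the case.
        return self.attempts > 1 or not self.accepted


def stage_case(documents: List[str], classify: ClassifyAgent, stage: StagingAgent,
               validate: ValidationAgent, max_attempts: int = 3) -> StagingResult:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    category = classify(documents)
    feedback: Optional[str] = None
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        proposed = stage(category, documents, feedback)
        feedback = validate(category, proposed, documents)
        if feedback is None:  # validator accepted this stage
            return StagingResult(category, proposed, attempt, True, history)
        history.append(feedback)  # validator requested a recalculation
    # Attempt budget exhausted: return the last proposal, marked as not accepted.
    return StagingResult(category, proposed, max_attempts, False, history)


# Toy stand-ins for the agents (illustrative only, no clinical logic).
def toy_classify(documents):
    return "CATEGORY_A"

def toy_stage(category, documents, feedback):
    return "Stage II" if feedback else "Stage I"

def toy_validate(category, proposed, documents):
    return None if proposed == "Stage II" else "stage not supported by the radiology summary"


result = stage_case(["pathology report ...", "radiology report ..."],
                    toy_classify, toy_stage, toy_validate)
print(result.stage, result.attempts, result.needs_review)  # Stage II 2 True
```

The validator's feedback is passed back into the staging agent, so a rejected first attempt can be corrected. The attempt count survives in the result as a simple review signal.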
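The evaluation reports four quantities: per-run accuracy, accuracy pooled over both runs, raw agreement between runs, and Cohen's κ. The sketch below computes all four with the standard library only. The function names and the toy stage labels are hypothetical, and the toy labels are not study data. κ here is the standard unweighted form, (p_o − p_e) / (1 − p_e). In that formula, p_e is the chance agreement implied by each run's label frequencies.

```python
from collections import Counter
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of cases whose predicted stage equals the reference stage."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohens_kappa(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa between two runs labelling the same cases."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(run_a)
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    # Chance agreement: sum over labels of the product of both runs' marginal proportions.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both runs use one identical label
    return (observed - expected) / (1 - expected)


# Illustrative toy data, not study data.
reference = ["I", "II", "III", "IV", "I", "I"]
run1 = ["I", "II", "III", "IV", "I", "II"]
run2 = ["I", "II", "II", "IV", "I", "I"]

print(accuracy(run1, reference))                     # per-run accuracy, run 1
print(accuracy(run2, reference))                     # per-run accuracy, run 2
print(accuracy(run1 + run2, reference + reference))  # pooled over both runs
print(accuracy(run1, run2))                          # raw inter-run agreement
print(cohens_kappa(run1, run2))                      # approximately 0.52
```

scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic. The p-value reported alongside κ requires a separate significance test, which is not shown here.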
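The central diagnostic finding is the accuracy gap between stages accepted on the first attempt and stages obtained after recalculation. A minimal way to compute that split is sketched below. It assumes that each staging output is stored together with its attempt count. The `CaseRun` record and `accuracy_by_attempts` function are hypothetical, and the records are toy values rather than study data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CaseRun:
    predicted: str
    reference: str
    attempts: int  # staging attempts before the stage was returned (1 = accepted at once)


def accuracy_by_attempts(runs: Iterable[CaseRun]) -> Dict[str, Tuple[Optional[float], int]]:
    """Accuracy and group size for first-attempt versus recalculated stages."""
    hits: Dict[str, List[bool]] = {"first_attempt": [], "recalculated": []}
    for run in runs:
        group = "first_attempt" if run.attempts == 1 else "recalculated"
        hits[group].append(run.predicted == run.reference)
    # None marks an empty group, where accuracy is undefined.
    return {g: (sum(v) / len(v) if v else None, len(v)) for g, v in hits.items()}


# Illustrative toy records, not study data.
runs = [
    CaseRun("I", "I", 1), CaseRun("II", "II", 1), CaseRun("III", "III", 1),
    CaseRun("IV", "IV", 1), CaseRun("II", "III", 2), CaseRun("I", "I", 3),
]
print(accuracy_by_attempts(runs))
# {'first_attempt': (1.0, 4), 'recalculated': (0.5, 2)}
```

Grouping every count above one under "recalculated" mirrors the abstract's contrast between first and subsequent attempts. Group sizes are returned alongside the accuracies so that small groups can be read with appropriate caution.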