Purpose Real-world evidence (RWE) is increasingly important in evaluating the efficacy and safety of new systemic anticancer treatments. However, because many RWE sources capture only part of the healthcare continuum, dataset linkage is necessary to improve data richness. Linkage quality must be assessed to prevent information bias arising from incomplete data linkage.
Methods We evaluated diagnosis concordance for lung cancer, melanoma, and renal cell carcinoma in patients recorded in a reference dataset, the UK Clinical Practice Research Datalink Aurum (CPRD Aurum). These records were matched with data from three linked datasets in the United Kingdom: the Hospital Episode Statistics Admitted Patient Care (HES APC), National Cancer Registration and Analysis Service (NCRAS), and Systemic Anti-Cancer Therapy (SACT) datasets. Concordance was evaluated for both the cancer diagnosis and the date of diagnosis. Clinical determinants of non-concordance were also investigated to assess representativeness.
Results In total, 119,396 patients with lung cancer, melanoma, or renal cell carcinoma were identified. Concordance of cancer diagnosis records with CPRD Aurum was relatively high (all >81%) for lung cancer and renal cell carcinoma in HES APC and NCRAS. For melanoma, however, substantial under-registration was observed in all datasets. SACT concordance was poor in 2011 but had improved significantly by 2014, as reporting became mandatory for NHS trusts. Still, because patients were registered in SACT only when they received systemic treatment, patient numbers in SACT were significantly lower than in the other datasets. For lung cancer and renal cell carcinoma, patients aged over 80 years were poorly registered. For melanoma, current smokers were under-registered.
Conclusions Creating a rich real-world dataset for systemic anticancer treatments based on CPRD and SACT data is feasible, but improved registration of melanoma and of outpatient oncology treatments is still needed to improve the comprehensiveness and representativeness of the data.