P. vivax genomic data summary
Based on a literature search including manuscripts published before
October 2022, we identified 1311 high-quality publicly shared P.
vivax genomes. Raw sequencing data were downloaded and all genomes were
combined, including in-house sequenced P. vivax genomes (n=163
samples originating from Peru, Brazil, and Vietnam, and from imported
cases in Belgium among travellers and migrants).
A total of 1474 high-quality P. vivax genomes (Supplementary
Table 1), coming from 36 countries in Asia (n=878), Americas (n=399),
and Africa (n=197), and collected between 2000 and 2019, were retained
after removing samples with less than 50% of the genome covered at
least 5-fold (Figure 1). Among the retained isolates, the median
sequencing coverage over the PvP01 reference genome was 26-fold (range
1-763). After alignment and variant calling, a total of 2,435,842
high-quality genetic variants were identified (1,983,976 SNPs and
451,866 indels), of which 1,836,935 were located in the core genome
region (1,477,945 SNPs and 358,990 indels).
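The retention criterion described above (at least 50% of the genome covered at least 5-fold) can be illustrated with a minimal sketch. It assumes per-position read depths are already available as a list of integers for each sample; the function names, default arguments, and toy depth values are hypothetical and are not taken from the pipeline used in this study.

```python
def breadth_at_depth(depths, min_depth=5):
    """Return the fraction of positions covered at least `min_depth`-fold."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def retain_sample(depths, min_depth=5, min_breadth=0.5):
    """Keep a sample unless less than `min_breadth` of positions reach `min_depth`."""
    return breadth_at_depth(depths, min_depth) >= min_breadth


# Toy per-position depths (illustrative values only, not real data).
toy_depths = {
    "sample_A": [30, 28, 0, 12, 40, 6],  # 5 of 6 positions at >= 5-fold
    "sample_B": [1, 0, 7, 2, 0, 3],      # 1 of 6 positions at >= 5-fold
}

retained = [name for name, depths in toy_depths.items() if retain_sample(depths)]
print(retained)
```

In this toy example only sample_A meets the threshold; in practice the depth vector would span the whole reference genome rather than six positions.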
To facilitate the analysis, the included genomes were grouped into
regional populations following previously published classifications:
Africa (AFR; isolates from all countries in sub-Saharan Africa, and from
returning travellers with a history of travel to these countries);
Eastern South East Asia (ESEA; isolates from Cambodia, Laos, Thailand,
Vietnam, and the China-Myanmar border); Latin America (LAM; isolates
from Mexico, Central America, and South America); Middle South East Asia
(MSEA; isolates from Malaysia and the Philippines); Oceania (OCE;
isolates from the island of New Guinea, i.e. Papua New Guinea and part
of Indonesia); and Western Asia (WAS; isolates from Afghanistan,
Bangladesh, India, Iran, Pakistan, and Sri Lanka).
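The regional grouping above can be expressed as a simple lookup table. The sketch below is illustrative only: it lists just the locations named explicitly in the text, the names `REGION_BY_COUNTRY` and `assign_region` are hypothetical, and mapping Indonesia to OCE assumes the Indonesian isolates come from the island of New Guinea. Sub-Saharan African countries and most Latin American countries are not enumerated in the text, so they would need to be added before real use.

```python
# Country-to-population lookup built only from locations named in the text.
REGION_BY_COUNTRY = {
    # Eastern South East Asia
    "Cambodia": "ESEA", "Laos": "ESEA", "Thailand": "ESEA",
    "Vietnam": "ESEA", "China-Myanmar border": "ESEA",
    # Latin America (only the countries mentioned by name)
    "Mexico": "LAM", "Peru": "LAM", "Brazil": "LAM",
    # Middle South East Asia
    "Malaysia": "MSEA", "Philippines": "MSEA",
    # Oceania (assumption: Indonesian isolates are from the island of New Guinea)
    "Papua New Guinea": "OCE", "Indonesia": "OCE",
    # Western Asia
    "Afghanistan": "WAS", "Bangladesh": "WAS", "India": "WAS",
    "Iran": "WAS", "Pakistan": "WAS", "Sri Lanka": "WAS",
}


def assign_region(country):
    """Return the population code for a country, or 'UNASSIGNED' if not listed."""
    return REGION_BY_COUNTRY.get(country, "UNASSIGNED")


for country in ("Vietnam", "India", "Peru"):
    print(country, assign_region(country))
```

Returning an explicit "UNASSIGNED" label, rather than raising an error, makes it easy to spot isolates whose origin is missing from the table.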