Inter-rater analysis
We conducted inter-rater reliability (IRR) tests to determine the level of agreement between raters. For every question in each domain of the four eligible SCD in pregnancy CPGs, we calculated the percent agreement among the four raters, as well as the percent agreement for the first overall assessment (OA1) of the AGREE II Instrument.
We also used the intra-class correlation coefficient (ICC) to measure the consistency of ratings for data gathered or arranged in clusters, including the second overall assessment (OA2, 'recommend this CPG for use'). The ICC is one of the most widely used IRR approaches when there are more than two raters, which was the case in our study. An ICC close to one indicates that ratings from the same cluster are highly similar, whereas an ICC close to zero indicates that they are not. Because the raters and ratings were not consistent across assessments, we fitted a one-way random-effects ANOVA model in SPSS Statistics, version 21. The variability of the numerical data across groups or clusters also favoured the ICC, which allowed us to assess reproducibility, that is, how closely the raters resembled each other on a given trait or characteristic.
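As an illustration of the agreement statistics described above, the sketch below computes a pairwise percent agreement and a one-way random-effects ICC, here ICC(1,1), for a small matrix of ratings. The ratings, item count, and the pairwise definition of percent agreement are illustrative assumptions, not the study data or the exact SPSS procedure used.

```python
import numpy as np
from itertools import combinations

# Hypothetical example: 4 appraisers scoring 10 AGREE II items on the 1-7 scale.
# In the study, the inputs would be the appraisers' scores for each eligible CPG.
rng = np.random.default_rng(0)
scores = rng.integers(1, 8, size=(10, 4))  # rows = items (targets), columns = raters

def percent_agreement(ratings):
    """Mean pairwise percent agreement across items (one common definition)."""
    n_items, n_raters = ratings.shape
    pair_agreement = [
        np.mean(ratings[:, i] == ratings[:, j])
        for i, j in combinations(range(n_raters), 2)
    ]
    return float(np.mean(pair_agreement))

def icc_one_way_random(ratings):
    """ICC(1,1): one-way random-effects, single-rater intra-class correlation."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    item_means = ratings.mean(axis=1)
    # Between-items and within-items mean squares from the one-way ANOVA table
    ms_between = k * np.sum((item_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - item_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"Pairwise % agreement: {percent_agreement(scores):.2%}")
print(f"ICC(1,1):             {icc_one_way_random(scores):.3f}")
```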
To evaluate agreement between two ordinal scale classifications, we used the weighted Kappa statistic, as the data came from an ordered scale. We applied linear weights because the difference between the first and second categories had the same importance as the difference between the second and third categories, and so on. Agreement was quantified by the Kappa (K) statistic,18,19 where K equals 1 when there is perfect agreement between the classification systems, 0 when agreement is no better than chance, and negative when agreement is worse than chance. The K value can be interpreted as shown in Table 1.20
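A minimal sketch of the linearly weighted Kappa calculation is given below, using scikit-learn's cohen_kappa_score. The two rating vectors are hypothetical stand-ins for the ordinal classifications compared in the study.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings from two classification systems
# (e.g., two appraisers' scores on an ordered 1-7 scale).
rater_a = [5, 6, 4, 7, 5, 3, 6, 6, 2, 5]
rater_b = [5, 5, 4, 6, 6, 3, 7, 6, 3, 5]

# Linear weights penalise adjacent-category disagreements equally,
# matching the weighting scheme described above.
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")
print(f"Linearly weighted Kappa: {kappa:.3f}")
```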