Inter-rater analysis
We conducted inter-rater reliability (IRR) assessments to determine the
level of agreement between raters. We calculated percent agreement for
every question in each domain of the four eligible SCD in pregnancy
CPGs to assess the level of agreement among the four raters, as well as
the percent agreement for the first overall assessment (OA1) of the
AGREE II Instrument.
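As an illustration, percent agreement for a single item can be computed as the proportion of rater pairs that gave identical scores; the study's exact operational definition is not reproduced here, and the scores below are hypothetical.

```python
from itertools import combinations

def percent_agreement(ratings):
    """Percent of rater pairs that gave identical scores for one item."""
    pairs = list(combinations(ratings, 2))
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Hypothetical AGREE II item scored on the 1-7 scale by four raters.
item_scores = [6, 6, 5, 6]
print(f"Percent agreement: {percent_agreement(item_scores):.1f}%")  # 50.0%
```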
Moreover, we used the intra-class correlation coefficient (ICC) to
measure the consistency of ratings for data that were gathered as, or
arranged into, clusters, including the second overall assessment (OA2,
‘recommend this CPG for use’). The ICC is one of the most widely used
IRR approaches when there are more than two raters, and we used it
because we had more than two. An ICC near one indicates high
resemblance between ratings from the same cluster, whereas a value near
zero indicates that ratings from the same cluster are not alike. We
computed the ICC with a one-way random-effects ANOVA model in SPSS
Statistics, version 21, because the raters were not consistent across
ratings. The clustered, numerical nature of the data motivated the use
of the ICC, which allowed us to assess reproducibility as well as how
closely ratings within a cluster resembled one another with respect to
a given trait or characteristic.
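For reference, the one-way random-effects ICC (ICC(1)) can be derived from the between-subject and within-subject mean squares of a one-way ANOVA, which is the model underlying the SPSS “One-Way Random” option. The sketch below reproduces that calculation on made-up ratings; the data and variable names are illustrative and are not the study's scores.

```python
import numpy as np

def icc_one_way_random(ratings):
    """ICC(1), one-way random-effects model.
    `ratings` is an (n_subjects x n_raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    subject_means = ratings.mean(axis=1)
    # Between-subjects and within-subjects mean squares from one-way ANOVA.
    ss_between = k * np.sum((subject_means - grand_mean) ** 2)
    ss_within = np.sum((ratings - subject_means[:, None]) ** 2)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical AGREE II item scores (rows = items, columns = four raters).
scores = [[6, 6, 5, 6],
          [3, 4, 3, 3],
          [7, 6, 7, 7],
          [2, 2, 3, 2]]
print(f"ICC(1) = {icc_one_way_random(scores):.3f}")
```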
We also evaluated the agreement between two ordinal scale
classifications; because these data came from an ordered scale, we used
the weighted Kappa statistic.
We used linear weights because the difference between the first and
second category had the same importance as the difference between the
second and third category, and so on. Agreement was quantified by the
Kappa (K) statistic,18,19 where K equals 1 when there is perfect
agreement between the classification systems; K equals 0 when there is
no agreement better than chance; and K is negative when agreement is
worse than chance. The K value can be interpreted as shown in Table
1.20
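As a minimal sketch of the linearly weighted Kappa between two ordinal classifications, the example below uses scikit-learn's cohen_kappa_score; the rating vectors are invented for demonstration and are not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal classifications of the same items by two raters
# (e.g., overall assessments on an ordered scale).
rater_a = [7, 6, 5, 6, 3, 2, 7, 4]
rater_b = [6, 6, 5, 5, 3, 3, 7, 4]

# weights="linear" applies linear weighting, so each one-category step of
# disagreement adds the same penalty.
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")
print(f"Linearly weighted Kappa = {kappa:.3f}")
```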