Methods
Ethical approval was obtained (University of Sheffield 022646). Four
consultants working within the neurology department of Sheffield
Teaching Hospitals NHS Foundation Trust with 0, 6, 16, and 23 years’
experience participated in the study.
The first 200 consecutive cases seen by the first consultant immediately
after qualification were anonymized to include only the history,
examination findings, and any investigation results already available at
the time of initial assessment. As this study was exploratory, a sample
size calculation was not performed; instead, the sample size was based on
previous studies3. To assess the validity of the cases sampled, and to
place subsequent results into clinical context, cases were divided into
disease groups mapped to the UK Neurology Curriculum5, based on the
primary presenting complaint.
Each case was included in only one disease group. In cases where
multiple symptoms were present, the symptom deemed most relevant to the
clinical question was chosen. The number of major topics in the UK
Neurology Curriculum5 covered by each disease group
was compared to the percentage of cases in that group observed in
practice.
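Purely as an illustrative sketch (not part of the study analysis), the tabulation described above could be expressed as follows in Python; the group labels, topic counts, and column names are assumed for the example.

import pandas as pd

# Assumed, illustrative inputs: one row per case with its assigned disease
# group, and the number of major curriculum topics each group covers.
cases = pd.DataFrame({"disease_group": ["headache", "headache", "seizure", "movement disorder"]})
topics_per_group = {"headache": 2, "seizure": 3, "movement disorder": 4}

# Percentage of cases observed in each disease group, alongside topic counts.
pct_of_cases = cases["disease_group"].value_counts(normalize=True).mul(100).round(1)
comparison = pd.DataFrame({"pct_of_cases": pct_of_cases,
                           "curriculum_topics": pd.Series(topics_per_group)})
print(comparison)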
Consultants were asked to assess each of the real-world cases, using a
standardised questionnaire (Appendix 1), as if they had seen the patient
in clinic themselves. Briefly, they were asked to provide a single
primary diagnosis, a differential diagnosis if relevant, and to rate
their diagnostic certainty on a 1-10 Likert scale (1=Completely
uncertain, 10=Completely certain). They were asked whether they would
arrange any investigations and, if so, to specify them. Diagnostic
peer-agreement was defined for each case on an ordinal scale of 4 to 1,
as follows: 4=concordant diagnoses between all four consultants;
3=concordance between three consultants; 2=concordance between two
consultants; 1=complete discordance. Similarly, for each case, each
individual consultant was assigned a diagnostic agreement score: 3=all
three colleagues agree with their diagnosis; 2=two other consultants
agree; 1=one other consultant agrees; 0=no other consultants agree.
Essentially identical diagnoses with minor differences in wording,
judged clinically unimportant by the two principal authors (CA, TJ),
were categorized as agreement. At the end of the study, the importance
of any variance in diagnoses for each case was assessed by the
consultant in charge of clinical care on a six-point Likert scale,
taking into account subsequent investigation results and clinical
follow-up data when available, and making an overall judgement as
follows: 1=Discordant diagnoses could result in a very important
difference in management, with potentially severe consequences, e.g. in
a patient with headache, migraine versus glioblastoma; 2=A difference in
management could result, with potentially serious consequences, e.g.
epilepsy versus dissociative non-epileptic attacks; 3=A difference in
management could result, but with less serious potential consequences,
e.g. migraine versus tension headache; 4=A difference in management
could result, but with less immediate management consequences, e.g.
Parkinson's disease dementia versus Lewy body dementia; 5=A difference
in wording but no meaningful difference in management, e.g. tremor
secondary to basal ganglia haemorrhage versus vascular parkinsonism;
6=Complete
concordance. Percentages in each category were reported. Categories 1
and 2 were combined to define cases with important potential differences
in clinical management. Data were reported for the cohort as a whole and
by disease group.
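For clarity, a minimal sketch of the two agreement measures defined above is given below (illustrative Python, not the study code), assuming that clinically equivalent diagnoses have already been mapped to a single label as described.

from collections import Counter

def peer_agreement(diagnoses):
    # Case-level score 4-1: size of the largest group of concordant diagnoses
    # among the four consultants (1 = complete discordance).
    return max(Counter(diagnoses).values())

def individual_agreement(diagnoses, rater_index):
    # Rater-level score 3-0: number of the other three consultants whose
    # diagnosis matches this consultant's diagnosis.
    own = diagnoses[rater_index]
    return sum(1 for i, d in enumerate(diagnoses) if i != rater_index and d == own)

# Hypothetical case with three concordant diagnoses and one discordant one.
case = ["migraine", "migraine", "tension headache", "migraine"]
print(peer_agreement(case))           # 3 (three consultants concordant)
print(individual_agreement(case, 2))  # 0 (no colleague agrees with rater 2)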
Mean diagnostic certainty was calculated, with standard deviations, for
each consultant and each disease group. Logistic regression models were
applied using IBM SPSS Statistics for Windows, version 25 (IBM Corp.,
Armonk, NY, USA) to assess associations between diagnostic certainty,
agreement and whether investigations were requested. The following
metrics were reported for each individual rater: years’ experience;
number and percentage of cases in which investigations were arranged; mean
investigations arranged per case; mean and median diagnostic agreement
(with standard deviations); and Spearman’s correlation coefficients
between diagnostic agreement and diagnostic certainty (as a surrogate
measure of accuracy of judgement of case difficulty). Ordinal logistic
regression models were performed post-hoc to further explore
associations between diagnostic agreement and diagnostic certainty, with
diagnostic agreement entered as the dependent variable. In this model, a
negative regression coefficient indicates that an increase in the
independent variable (diagnostic certainty) is associated with an
increase in the dependent variable (diagnostic agreement). Binary
logistic regression models were performed to explore associations
between the decision to arrange investigations and diagnostic certainty,
with decision to investigate (yes/no) specified as the categorical
dependent variable. In this model, a negative regression coefficient
indicates that an increase in the independent variable (diagnostic
certainty) is associated with lower odds of arranging investigations
(i.e. dependent variable score=0 rather than 1). Odds ratios with
associated 95% confidence intervals and p values were reported.
P<0.05 was considered statistically significant.
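The analyses were run in SPSS; purely as an illustrative sketch, equivalent models could be specified as follows in Python using scipy and statsmodels. The data frame and its column names (certainty, agreement, investigated) are invented for the example and are not study data.

import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Invented example data: one row per case for a single rater.
df = pd.DataFrame({
    "certainty": [9, 4, 7, 2, 8, 5, 10, 3, 6, 6],    # 1-10 Likert rating
    "agreement": [3, 1, 2, 1, 3, 2, 3, 0, 1, 3],     # individual agreement score 0-3
    "investigated": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],  # 1 = investigations arranged
})

# Spearman's correlation between diagnostic agreement and diagnostic certainty.
rho, p_value = spearmanr(df["agreement"], df["certainty"])

# Post-hoc ordinal logistic regression: agreement (ordinal DV) on certainty.
ordinal_fit = OrderedModel(df["agreement"], df[["certainty"]],
                           distr="logit").fit(method="bfgs", disp=False)

# Binary logistic regression: decision to investigate (yes/no) on certainty.
logit_fit = sm.Logit(df["investigated"], sm.add_constant(df[["certainty"]])).fit(disp=False)
odds_ratios = np.exp(logit_fit.params)
ci_95 = np.exp(logit_fit.conf_int())  # 95% confidence intervals on the odds-ratio scale

print(rho, p_value)
print(ordinal_fit.params)
print(odds_ratios, ci_95, logit_fit.pvalues, sep="\n")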