Methods
Ethical approval was obtained (University of Sheffield 022646). Four consultants working within the neurology department of Sheffield Teaching Hospitals NHS Foundation Trust, with 0, 6, 16, and 23 years' experience respectively, participated in the study.
The first 200 consecutive cases seen by the first consultant immediately after qualification were anonymised to include only the history, examination findings, and any investigation results already available at the time of initial assessment. As this study was exploratory, a formal sample size calculation was not performed; the sample size was instead informed by previous studies3. To assess the validity of the cases sampled, and to place subsequent results into clinical context, cases were divided into disease groups mapped to the UK national neurology curriculum5, based on the primary presenting complaint. Each case was included in only one disease group. Where multiple symptoms were present, the symptom deemed most relevant to the clinical question was chosen. The number of major topics in the UK neurology curriculum5 covered by each disease group was compared with the percentage of cases in that group observed in practice.
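As a minimal illustration of this tabulation step (not part of the study protocol), the following Python sketch compares the percentage of observed cases in each disease group with the number of curriculum topics mapped to that group; the data frame, the column names, and the example values are hypothetical.

```python
import pandas as pd

# Hypothetical case list: one row per case, each assigned to a single
# disease group based on the primary presenting complaint.
cases = pd.DataFrame({"disease_group": ["headache", "headache", "epilepsy",
                                        "movement disorder", "headache"]})

# Hypothetical mapping from disease group to the number of major topics
# it covers in the UK neurology curriculum.
curriculum_topics = pd.Series({"headache": 4, "epilepsy": 6,
                               "movement disorder": 8})

# Percentage of observed cases per disease group, alongside the topic counts.
observed_pct = cases["disease_group"].value_counts(normalize=True) * 100
summary = pd.DataFrame({"percent_of_cases": observed_pct,
                        "curriculum_topics": curriculum_topics})
print(summary)
```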
Using a standardised questionnaire (Appendix 1), consultants were asked to assess each of the real-world cases as if they had seen the patient in clinic themselves. Briefly, they were asked to provide a single primary diagnosis, a differential diagnosis where relevant, and to rate their diagnostic certainty on a 10-point Likert scale (1 = completely uncertain, 10 = completely certain). They were asked whether they would arrange any investigations and, if so, to specify them. Diagnostic peer agreement was defined for each case on an ordinal scale of 4 to 1 as follows: 4 = concordant diagnoses between all four consultants; 3 = concordance between three consultants; 2 = concordance between two consultants; 1 = complete discordance. Similarly, for each case, each individual consultant was assigned a diagnostic agreement score: 3 = all three colleagues agreed with their diagnosis; 2 = two other consultants agreed; 1 = one other consultant agreed; 0 = no other consultants agreed.
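To make the scoring scheme concrete, the sketch below (Python; illustrative only, not the study's analysis code) computes the case-level peer-agreement score and each consultant's individual agreement score from four harmonised diagnosis labels per case. The function names and the example diagnoses are hypothetical.

```python
from collections import Counter

def case_peer_agreement(diagnoses):
    """Case-level peer agreement: 4 = all four consultants concordant,
    3 = three concordant, 2 = two concordant, 1 = complete discordance."""
    assert len(diagnoses) == 4
    return max(Counter(diagnoses).values())

def individual_agreement(diagnoses, consultant_index):
    """Number of the other three consultants whose diagnosis matches that
    of the consultant at `consultant_index` (score 0-3)."""
    own = diagnoses[consultant_index]
    others = diagnoses[:consultant_index] + diagnoses[consultant_index + 1:]
    return sum(d == own for d in others)

# Example: two consultants agree on migraine; the other two each differ.
case = ["migraine", "migraine", "tension headache", "cluster headache"]
print(case_peer_agreement(case))                          # -> 2
print([individual_agreement(case, i) for i in range(4)])  # -> [1, 1, 0, 0]
```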
Essentially identical diagnoses with minor differences in wording, judged clinically unimportant by the two principal authors (CA, TJ), were categorised as agreement. At the end of the study, the importance of any variance in diagnoses for each case was assessed by the consultant in charge of clinical care on a six-point Likert scale, taking into account subsequent investigation results and clinical follow-up data when available, and making an overall judgement as follows: 1 = discordant diagnoses could result in a very important difference in management with potentially severe consequences, e.g. in a patient with headache, migraine versus glioblastoma; 2 = a difference in management could result, with potentially serious consequences, e.g. epilepsy versus dissociative non-epileptic attacks; 3 = a difference in management could result, but with less serious potential consequences, e.g. migraine versus tension headache; 4 = a difference in management could result, but with less immediate management consequences, e.g. Parkinson's disease dementia versus Lewy body dementia; 5 = a difference in wording but no meaningful difference in management, e.g. tremor secondary to basal ganglia haemorrhage versus vascular parkinsonism; 6 = complete concordance. Percentages in each category were reported. Categories 1 and 2 were combined to define cases with important potential differences in clinical management. Data were reported for the cohort as a whole and by disease group.
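As a brief sketch of how these summaries could be tabulated (illustrative only; the data frame, column names, and example ratings are hypothetical), the following Python snippet reports the percentage of cases in each importance category and combines categories 1 and 2, overall and by disease group.

```python
import pandas as pd

# Hypothetical per-case importance ratings (1-6) with their disease groups.
ratings = pd.DataFrame({
    "disease_group": ["headache", "epilepsy", "headache", "movement disorder"],
    "importance": [6, 2, 5, 1],
})

# Percentage of cases in each of the six categories for the whole cohort.
overall_pct = ratings["importance"].value_counts(normalize=True).sort_index() * 100

# Categories 1 and 2 combined: cases with important potential differences
# in clinical management, overall and by disease group.
ratings["important_difference"] = ratings["importance"].isin([1, 2])
cohort_pct = ratings["important_difference"].mean() * 100
by_group_pct = ratings.groupby("disease_group")["important_difference"].mean() * 100
```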
Mean diagnostic certainty was calculated, with standard deviations, for each consultant and each disease group. Logistic regression models were applied using IBM SPSS Statistics for Windows, version 25 (IBM Corp., Armonk, NY, USA) to assess associations between diagnostic certainty, diagnostic agreement, and whether investigations were requested. The following metrics were reported for each individual rater: years' experience; number and percentage of cases in which investigations were arranged; mean number of investigations arranged per case; mean (with standard deviation) and median diagnostic agreement; and Spearman's correlation coefficient between diagnostic agreement and diagnostic certainty (as a surrogate measure of how accurately each consultant judged case difficulty). Ordinal logistic regression models were performed post hoc to further explore associations between diagnostic agreement and diagnostic certainty, with diagnostic agreement entered as the dependent variable. In this model, a positive regression coefficient indicates that an increase in the independent variable (diagnostic certainty) is associated with an increase in the dependent variable (diagnostic agreement). Binary logistic regression models were performed to explore associations between the decision to arrange investigations and diagnostic certainty, with the decision to investigate (yes/no) specified as the categorical dependent variable. In this model, a negative regression coefficient indicates that an increase in the independent variable (diagnostic certainty) is associated with lower odds of arranging investigations (i.e. a dependent variable score of 0 rather than 1). Odds ratios with associated 95% confidence intervals and p values were reported. P<0.05 was considered statistically significant.
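The analysis itself was performed in SPSS; as a rough, non-authoritative sketch of analogous models in Python (statsmodels and scipy), the snippet below fits an ordinal logistic regression of diagnostic agreement on diagnostic certainty, a binary logistic regression of the decision to investigate on diagnostic certainty (with odds ratios and 95% confidence intervals), and computes a Spearman correlation. The data frame, its column names, and the toy values are hypothetical and serve only to show the model specifications.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import spearmanr
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical long-format data: one row per consultant per case, with
# diagnostic certainty (1-10), individual diagnostic agreement (0-3) and
# whether investigations were arranged (1 = yes, 0 = no). Toy values only.
df = pd.DataFrame({
    "certainty":    [9, 8, 3, 6, 10, 2, 7, 5],
    "agreement":    [3, 3, 0, 2, 3, 1, 2, 1],
    "investigated": [0, 0, 1, 0, 0, 1, 1, 1],
})

# Spearman's correlation between diagnostic agreement and diagnostic certainty.
rho, p_value = spearmanr(df["agreement"], df["certainty"])

# Ordinal logistic regression: agreement (ordered 0-3) on certainty.
agreement = pd.Series(pd.Categorical(df["agreement"],
                                     categories=[0, 1, 2, 3], ordered=True))
ordinal_res = OrderedModel(agreement, df[["certainty"]],
                           distr="logit").fit(method="bfgs", disp=False)
ordinal_or = np.exp(ordinal_res.params["certainty"])  # odds ratio per unit certainty

# Binary logistic regression: decision to investigate (yes/no) on certainty,
# reporting odds ratios with 95% confidence intervals.
logit_res = sm.Logit(df["investigated"],
                     sm.add_constant(df[["certainty"]])).fit(disp=False)
odds_ratios = np.exp(logit_res.params)
conf_int_95 = np.exp(logit_res.conf_int())
```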