What factors influence cellular pathologists’ confidence in case reporting?

Predictor variables

The overall reporting experience of the 16 pathologists was between 3 and 35 years. In terms of routine DP reporting experience, five pathologists had no experience, with the rest having up to 5 years of experience (Fig. 1).

Fig. 1 Pathologists’ reporting experiences

Of the cases, 71.5% were considered routine, 8.1% moderately difficult and 20.4% difficult to report. Of the 16,187 diagnoses with corresponding confidence scores, 89% showed complete agreement (CA) with the ground truth (GT), whilst in 4.8% there was a clinically important difference (CID) between the proffered diagnosis and the GT diagnosis. There was complete agreement between LM and DP diagnoses for 91% of the diagnoses and a CID in 4%.

Diagnostic confidence

Figure 2 shows the confidence scores for LM (top row) and DP (bottom row), for a range of variables (1 = lowest confidence score, 7 = highest confidence score). Overall, we see high confidence, with most diagnoses given one of the two highest scores. Confidence was slightly higher for LM diagnoses than for DP. As expected, for LM diagnoses, there was no relationship between diagnostic confidence and DP reporting experience (top right graph).

Fig. 2 Percentage of different diagnostic confidence scores for LM (top row) and DP (bottom row) diagnoses in different categories of the potential predictor variables. Each score has an assigned colour, with dark blue corresponding to the highest diagnostic confidence score of 7. CA, complete agreement; CUD, clinically unimportant difference; CID, clinically important difference; yrs, years

Similar trends were observed within the DP and LM modalities in terms of the relationship between confidence and the predictors. Confidence was lower for difficult to report cases than for routine and moderately difficult cases. Confidence was also noticeably lower when there was a CID between a pathologist’s diagnosis and GT or when there was a CID between the LM and DP diagnoses (i.e. in cases where inter- and intra-observer diagnostic discrepancies existed). Compared with the least experienced pathologists (3–10 years), pathologists with moderate experience (10.5–20 years) were less confident, whilst the difference from the most experienced pathologists (21–35 years) was very small.

Across all data, all variables investigated were found to be significantly predictive of diagnostic confidence (Table 1). Diagnostic confidence was lower for DP reporting than LM reporting (rate ratio 1.09 (95% CI 1.01–1.18), p = 0.035). Diagnostic confidence was highest for routine cases and lowest for difficult to report cases (p < 0.001). Compared to when there was CA between a report’s diagnosis and GT, diagnostic confidence was lower when there was a CID (p = 0.002) or a CUD (p < 0.001), with confidence lowest for the latter. Compared to when a pathologist’s LM and DP diagnoses were in CA, diagnostic confidence was significantly lower when there was a CUD between the LM and DP diagnoses but not significantly lower for a CID, indicating that pathologists can still be confident when they give a different diagnosis for a case they have reported previously (i.e. in instances of intra-observer variability on multiple assessments of the case). As reported previously, there is high LM-DP intra-observer agreement for the cases in this study [7].

Table 1 Assessing predictors for diagnostic confidence using random effects generalised Poisson models

Confidence was lowest for breast diagnoses, followed by renal diagnoses, but the difference between the two was not significant (p = 0.380). The adjusted analysis gives the effect of a predictor after adjusting for the other factors. As all renal cases were considered difficult to report, and difficult to report cases have low diagnostic confidence scores, it is not surprising that renal was not the lowest confidence speciality once difficulty was adjusted for. Confidence was highest for GI diagnoses.

Findings were similar when LM and DP data were analysed separately. The only noticeable difference was within the DP-only analysis, where pathologists’ overall reporting experience failed to reach significance (p = 0.083). Pathologists’ DP reporting experience was also found to be non-predictive of confidence (p = 0.78). This may be attributed to most pathologists having relatively few years of routine DP reporting experience, but also to the lack of correlation between years of reporting experience and the ability to diagnose certain lesions using either diagnostic modality.

Lowest confidence cases

There were 35 diagnoses where the pathologist had rated their diagnostic confidence as 1–3. This was split across 31 cases (in some instances, the pathologist gave a low score on both LM and DP). Of these 31, 14 were breast cases (2.3% of 608), 3 were GI (0.5% of 607), 6 were skin (1.0% of 609), and 8 were renal (4.0% of 200).
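The per-speciality percentages above can be reproduced from the quoted counts. The following is a minimal sketch that uses only the figures given in the text; the variable names are illustrative and are not taken from the study.

```python
# Low confidence cases (scores 1-3) and total cases per speciality,
# as quoted in the text. Names are illustrative only.
low_confidence = {"breast": 14, "GI": 3, "skin": 6, "renal": 8}
total_cases = {"breast": 608, "GI": 607, "skin": 609, "renal": 200}

# Percentage of each speciality's cases that received a low confidence score.
low_pct = {
    name: round(100 * low_confidence[name] / total_cases[name], 1)
    for name in low_confidence
}

# The four specialities together should account for all low confidence cases.
total_low = sum(low_confidence.values())

for name, pct in low_pct.items():
    print(f"{name}: {low_confidence[name]}/{total_cases[name]} = {pct}%")
print(f"total low confidence cases: {total_low}")
```

Renal has the highest proportion of low confidence cases despite its small absolute count, because it has far fewer cases than the other three specialities.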

Only six of the 16 pathologists contributed to the 35 low confidence diagnosis scores, despite the cases being split across all four specialities. It could be postulated that this is due to these pathologists having less experience; however, the results demonstrate a non-monotonic relationship between experience and confidence. Therefore, the fact that this relatively small group of pathologists contributed all the low confidence cases may be due to other factors, including individual variation in self-scoring.

Low confidence and case quality

Diagnostic confidence can be affected by the quality of glass slides or WSI. In these low confidence cases, several pathologists commented on the quality. For the LM low confidence cases, quality concerns included marginal biopsies, poor IHC, and a faded section. With DP cases, similar quality concerns were raised, but additional quality issues due to digitisation were also reported, including scanned slides that were out of focus, of poor quality or of insufficient resolution. This suggests that overcoming these quality control issues, as could be done in practice, would increase reporting confidence.

Low confidence and case difficulty

The majority of the 31 low confidence cases (58.1%) were classified as difficult to report, in keeping with the notion that confidence falls in complex cases (Table 2).

Table 2 Details of the 31 low confidence cases. This lists all cases where a low confidence score was given (scores 1–3), along with the difficulty level, the confidence score by the same pathologist in the other diagnostic modality and the ground truth diagnosis (diagnostic confidence is given as 1–7 in which 1 is the lowest)

The GT diagnoses for these cases include some that are known to be difficult diagnostic areas within each speciality. For example, in the breast, there were rarer diagnoses including encysted papillary carcinoma and a lymphoid neoplasm, as well as B3 lesions, which are a known area of diagnostic complexity. In the skin, there were also challenging areas including melanocytic lesions such as Spitz naevus and lentigo maligna and inflammatory skin conditions such as erythema multiforme and a psoriasiform drug reaction. However, there were also some surprising diagnoses that are commonly reported. These include tubular adenoma with low-grade dysplasia, squamous cell carcinoma and fibroadenoma, for which no comments on poor quality were made. In several cases, pathologists commented that they would like to do further work before diagnosing such a case, including IHC and reviewing the case with colleagues. This would be done in practice and so is likely to improve confidence beyond what was reported.

Low confidence and diagnostic discrepancy

In one-third (33.3%) of DP and half (52.9%) of LM low confidence diagnoses, there was a CID compared to GT, which is substantially higher than the overall values of 4.7% for LM and 4.8% for DP (as reported in Supplementary Table 1), indicating that there were several instances where pathologists were uncertain about their diagnosis, which subsequently corresponded to a diagnostic error. In these cases, the low confidence is most likely based on the awareness of making a difficult diagnostic judgement call.
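To make the size of this gap concrete, the CID rate among low confidence diagnoses can be expressed as a multiple of the overall CID rate. This is a rough sketch using only the percentages quoted above; the ratios are computed from rounded percentages, so they are approximate, and the variable names are illustrative.

```python
# CID rates versus GT (percent of diagnoses), as quoted in the text.
cid_low_confidence = {"LM": 52.9, "DP": 33.3}  # among low confidence diagnoses
cid_overall = {"LM": 4.7, "DP": 4.8}           # among all diagnoses

# How many times higher the CID rate is in the low confidence group.
fold_increase = {
    modality: round(cid_low_confidence[modality] / cid_overall[modality], 1)
    for modality in cid_overall
}

for modality, fold in fold_increase.items():
    print(f"{modality}: about {fold}x the overall CID rate")
```

On these figures, a low confidence score is associated with a CID rate roughly 7 times (DP) to 11 times (LM) the overall rate.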

Diagnoses with high confidence but clinically important differences

There were a total of 514 diagnoses with a CID compared to the GT and a confidence score of 7. Although this is a small number of the total 16,187 diagnoses (3.2%), across all study diagnoses, there was a total of 765 CID diagnoses, highlighting that the majority of these clinically important incorrect diagnoses actually had a confidence score of 7.
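The two proportions in this paragraph use different denominators, which is easy to miss. The short sketch below uses only the counts quoted in the text (variable names are illustrative) and computes both.

```python
# Counts quoted in the text.
total_diagnoses = 16187    # all diagnoses with a confidence score
cid_diagnoses = 765        # diagnoses with a CID versus GT
cid_top_confidence = 514   # CID diagnoses given the top confidence score of 7

# Share of all diagnoses that were a CID reported with top confidence.
share_of_all = round(100 * cid_top_confidence / total_diagnoses, 1)

# Share of CID diagnoses that were reported with top confidence.
share_of_cid = round(100 * cid_top_confidence / cid_diagnoses, 1)

print(f"{share_of_all}% of all diagnoses")
print(f"{share_of_cid}% of CID diagnoses")
```

The first figure matches the 3.2% quoted above; the second, roughly two-thirds, is what supports the statement that the majority of CID diagnoses carried a confidence score of 7.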

Of these 514 cases, 251 (48.8%) were made on LM and 263 (51.2%) were on DP, with 174 occurring in the breast (3.5% of breast diagnoses), 250 GI (5.1%), 87 skin (1.8%) and 3 renal (0.2%). The types of CID errors between the diagnosis and GT were classified as above.
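As a consistency check, the modality and speciality breakdowns should each sum to the 514 high confidence CID diagnoses. A minimal sketch using only the counts quoted above follows; the names are illustrative.

```python
# Breakdown of the 514 high confidence CID diagnoses, as quoted in the text.
by_modality = {"LM": 251, "DP": 263}
by_speciality = {"breast": 174, "GI": 250, "skin": 87, "renal": 3}
expected_total = 514

# Both breakdowns should account for every diagnosis exactly once.
modality_total = sum(by_modality.values())
speciality_total = sum(by_speciality.values())

# Share of the 514 diagnoses made on each modality.
modality_pct = {
    name: round(100 * count / expected_total, 1)
    for name, count in by_modality.items()
}

print(modality_total, speciality_total, modality_pct)
```

Both breakdowns sum to 514, and the near-even LM/DP split shows that high confidence errors were not concentrated in one modality.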

In some breast and skin cases, multiple errors were attributed to a single diagnosis, e.g. the diagnosis contained both a grading and IHC error, meaning there were 531 error types attributed to these 514 diagnoses. Table 3 shows the spread of different error types seen across each speciality, showing that for the breast, skin and GI, the main error type was diagnostic errors, which include tumour typing errors.

Table 3 Comparison of the different types of high confidence errors across specialities

Types of high confidence diagnostic errors by speciality

Breast diagnostic errors

Supplementary Table 2 shows a subcategorisation of the different types of breast errors. Excluding errors related to tumour subtyping (in which a malignant diagnosis was given but there were differences in the tumour type), the three most common types of breast diagnostic errors were B3 versus B2 (25.6%), B2 versus B3 (11.6%) and B1 versus B2 (9.9%). Another common source of errors was the presence or absence of atypia in B3 lesions.

Perhaps the most concerning error is a case where the GT diagnosis was B5b, but the pathologist’s diagnosis was B2. This happened in two instances: a single pathologist missed the same lesion, a lobular carcinoma, on both LM and DP.

Also important are the two instances where low-grade lymphoma was missed in a lymph node, which was instead called normal. In practice, this case should have been seen by a lymphoma colleague or had a basic panel of IHC performed, but if the pathologist was highly confident in their diagnosis of a reactive lymph node, this may not have been instigated.

GI diagnostic errors

Missed high-grade dysplasia (the study pathologist reported low-grade dysplasia when the GT was high-grade dysplasia) was the most common error and accounted for 15.4% of the GI diagnostic errors (Supplementary Table 3). Overcalling dysplasia (where the GT was low-grade, but the pathologist reported high-grade) was also fairly common, accounting for 6.5% of GI diagnostic errors.

Another very common error was the differentiation between sessile serrated lesions (SSL) and hyperplastic polyps. In 14.0% of cases, the GT was SSL but the study pathologist called hyperplastic polyp, and in 6.1% of cases, the reverse was true (GT was hyperplastic, but study pathologist called SSL). This is also known to be an area of diagnostic complexity between pathologists.

Finally, a common diagnostic error was missing microorganisms. Missed microorganisms included Helicobacter pylori, Spirochetosis and Candida.

Skin diagnostic errors

The most common diagnostic discrepancies within skin pathology concerned the subtyping of basal cell carcinomas (BCC) (Supplementary Table 4). These included both missing a high-risk subtype (e.g. infiltrative) and overcalling a high-risk subtype, and together accounted for almost 40% of skin diagnostic errors, although this is likely due to the high frequency of BCC in the data. There were a few cases of benign versus malignant melanocytic lesions (for example, Spitz naevus versus melanoma and benign naevus versus lentigo maligna), which is known to be a difficult area of skin pathology.

Renal diagnostic error

There was a single renal diagnostic error, with a GT diagnosis of no evidence of rejection, but a given diagnosis of borderline changes of T-cell-mediated rejection.
