Highly performing automatic detection of structural chromosomal abnormalities using Siamese architecture

Chromosomes contain the genetic information of every cell. Chromosomal abnormalities fall into two categories: numerical abnormalities (gain or loss of whole chromosomes) and structural chromosomal abnormalities (SCA), such as deletions, translocations, and inversions. Many recurrent chromosomal abnormalities have strong diagnostic or prognostic value in cancers, especially hematological malignancies, and in inherited genetic diseases. Detecting chromosomal abnormalities is therefore crucial for the therapeutic management and follow-up of many diseases [1].

Human cells have 23 pairs of chromosomes: 22 pairs of autosomes (numbered from 1 to 22) and one pair of sex chromosomes (XX in females, XY in males). In genetic laboratories, karyotyping consists of preparing and staining chromosomes (R- or G-banding), imaging them by microscopy, and analyzing the images. Karyotyping produces an ordered image of the 23 pairs of chromosomes, known as a karyogram. An example of a karyogram is presented in supplemental Figure S1. Each pair of chromosomes has a specific banding pattern that results from chromatin condensation and molecular composition. In a normal pair, both chromosomes display identical banding (see supplemental Figure S1 and Fig. 1). This specific banding pattern is schematically represented by an ideogram, which can be considered a template. The ideogram of chromosome 5 is shown in Fig. 1a.

Karyotype analysis relies primarily on visual inspection of karyogram images by medical experts. This analysis requires a high level of expertise and is tedious and time-consuming. Automating the detection of chromosomal abnormalities has therefore become an urgent need, to help cytogeneticists provide quick and reliable diagnoses of various genetic diseases, make prognostic evaluations, and guide treatment.

Previous work in computer vision has already addressed several chromosome-related tasks: segmentation [2], [3], classification [4], [5], [6], [7], chromosome generation [8], trisomy detection [9], translocation detection [10], [11], and structural abnormality detection [12].

Wang et al. [11] described an automatic image-processing pipeline for detecting the translocation between chromosomes 9 and 22, written t(9;22), which gives rise to the Philadelphia chromosome. This highly recurrent aberration is found in chronic myeloid leukemia. The authors compared chromosome 22 with its reference template to detect structural changes. Their approach involved several image-processing steps to extract chromosome 22, followed by adaptive matching to the template and computation of the similarity between the candidate and the reference template of chromosome 22. For the same abnormality, Pravalphruekul et al. [10] proposed to isolate chromosomes 9 and 22 and then train a CNN model to detect the t(9;22) translocation.
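
The template-comparison idea can be illustrated with a minimal sketch that reduces a chromosome to a one-dimensional banding profile (mean intensity along its long axis) and scores it against a reference profile with zero-lag normalized cross-correlation. This is not the method of Wang et al. [11]: the helper names `banding_profile` and `profile_similarity`, the assumption that chromosomes are already straightened and vertically oriented, the linear resampling, and the synthetic striped images are all hypothetical choices made for illustration.

```python
import numpy as np

def banding_profile(chrom_img: np.ndarray, n_points: int = 100) -> np.ndarray:
    """Hypothetical helper: mean intensity of each row of a straightened,
    vertically oriented chromosome image, linearly resampled to n_points
    so that chromosomes of different lengths can be compared."""
    profile = chrom_img.astype(float).mean(axis=1)
    x_old = np.linspace(0.0, 1.0, num=profile.size)
    x_new = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(x_new, x_old, profile)

def profile_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag normalized cross-correlation (Pearson r), in [-1, 1]."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

# Synthetic reference: 40 alternating dark/light bands, 10 pixels wide.
template = np.repeat(np.tile([0.2, 0.8], 20)[:, None], 10, axis=1)
normal = template.copy()
truncated = template[:28]  # distal 12 rows missing: a crude stand-in for a deletion

ref = banding_profile(template)
sim_normal = profile_similarity(ref, banding_profile(normal))
sim_del = profile_similarity(ref, banding_profile(truncated))
print(f"normal: {sim_normal:.3f}, truncated: {sim_del:.3f}")
```

Stretching the truncated chromosome back to the template length misaligns its bands, which lowers the correlation; a real pipeline would need the more robust, adaptive matching that the cited work describes.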

Developing a computer-assisted technique that automatically detects all types of structural chromosomal abnormalities (SCA) remains a real challenge. Apart from the t(9;22) translocation, research in this area is, to date, very limited: the only study in this context is that of Cox et al. [12]. Their dataset contained 13 recurrent abnormalities (deletions, inversions, isochromosomes, and translocations) and two non-recurrent chromosomal abnormalities. Using two CNN models (VGG and ResNet), the authors accurately classified recurrent abnormalities such as deletion del(5)(q22q35), deletion del(7q), and inversion inv(16), achieving a mean F1-score of 94.03% on their test set.
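
For reference, the per-class F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). Assuming that "mean F1-score" denotes the unweighted (macro) average over classes, which the text does not state explicitly, it can be computed as in the sketch below; the label names and predictions are hypothetical.

```python
def f1_for_class(y_true: list, y_pred: list, label: str) -> float:
    """F1 = 2PR / (P + R) for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)

# Hypothetical predictions for four chromosome pairs.
y_true = ["normal", "normal", "del(5q)", "del(5q)"]
y_pred = ["normal", "del(5q)", "del(5q)", "del(5q)"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Here F1 is 2/3 for "normal" and 4/5 for "del(5q)", so the macro average is 11/15.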

Our aim was to automatically detect any SCA without having to train a model for each specific abnormality. Our method is inspired by the practice of medical experts, which consists of comparing the banding of a chromosome (i) with its template ideogram and (ii) with the banding pattern of the other chromosome of the pair. To perform this comparison automatically, we used a Siamese architecture, which is designed to assess the correspondence between two or more images. Our contribution was to build a Siamese model that takes the images of both chromosomes of a pair as inputs. We prepared the images, applied data augmentation, and trained the model with the contrastive loss function. In addition, we studied the margin parameter of this contrastive loss to choose its optimal value. At test time, we measured the similarity between the images of the two chromosomes of each pair.
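
To show what the margin controls, here is a minimal NumPy sketch of the classic contrastive loss (Hadsell, Chopra and LeCun, 2006) on the Euclidean distance d between the two embeddings of a pair: similar pairs are penalized by d²/2, dissimilar pairs by max(0, m − d)²/2. The label convention (y = 0 for a similar pair, y = 1 for a dissimilar pair), the default margin, and the toy embeddings are assumptions; the exact formulation used in this work may differ.

```python
import numpy as np

def contrastive_loss(emb_a: np.ndarray, emb_b: np.ndarray,
                     y: np.ndarray, margin: float = 1.0) -> float:
    """Mean contrastive loss over a batch of embedding pairs.

    Assumed convention: y = 0 for similar pairs, y = 1 for dissimilar pairs.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)           # per-pair Euclidean distance
    similar_term = (1 - y) * 0.5 * d ** 2                # pull similar pairs together
    dissimilar_term = y * 0.5 * np.maximum(0.0, margin - d) ** 2  # push apart up to the margin
    return float(np.mean(similar_term + dissimilar_term))

# Two toy pairs, both at distance 0.5: the first similar, the second dissimilar.
emb_a = np.zeros((2, 2))
emb_b = np.array([[0.3, 0.4], [0.3, 0.4]])
y = np.array([0.0, 1.0])

print(contrastive_loss(emb_a, emb_b, y, margin=2.0))  # dissimilar pair still penalized
print(contrastive_loss(emb_a, emb_b, y, margin=0.5))  # dissimilar pair already beyond the margin
```

With margin 2.0 the dissimilar pair contributes 0.5 × 1.5² = 1.125, giving a mean of 0.625; with margin 0.5 it contributes nothing and only the similar pair's 0.125 remains. A larger margin thus forces abnormal pairs further apart in embedding space, which is why the margin value is worth tuning.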

As a proof of concept, we first focused on the deletion of the long arm of chromosome 5 (written del(5q)) because of its clinical value and high frequency. This abnormality is observed in acute myeloid leukemia and in 10-15% of myelodysplastic syndromes. Fig. 1 illustrates normal pairs of chromosome 5 and pairs carrying a del(5q) deletion. The size of the deletion varies from one patient to another. Identifying the del(5q) abnormality is crucial because it guides patient treatment and follow-up.

In summary, our contribution is the development of an efficient Siamese model for the automated identification of the del(5q) deletion in pairs of chromosomes. We prepared a dataset with two labels: normal pairs of chromosome 5 and pairs of chromosome 5 with a del(5q) deletion. We trained the Siamese architecture with widely used CNN models under different data augmentation and loss-function settings. Moreover, we studied the potential of the Siamese architecture to identify other types of abnormalities, including some that are harder to detect, such as inversion of chromosome 3. This work is a proof of concept demonstrating the feasibility of using a Siamese architecture to accurately identify multiple structural chromosomal abnormalities.
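
The two dataset labels map naturally onto a pair-level decision: a normal pair should be embedded close together, a del(5q) pair far apart. The sketch below shows the weight-sharing idea and a distance-threshold rule, under loud assumptions: `W` is an untrained random projection standing in for a trained CNN backbone, `embed` and `classify_pair` are hypothetical names, and the threshold would in practice be chosen on validation data rather than fixed by hand.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical placeholder for a trained CNN backbone: one untrained linear
# projection from a flattened 8x8 image to an 8-dimensional embedding.
W = rng.normal(size=(64, 8))

def embed(img: np.ndarray) -> np.ndarray:
    """Both branches of the Siamese model use this function, i.e. the same weights W."""
    return img.reshape(-1) @ W

def classify_pair(img_a: np.ndarray, img_b: np.ndarray, threshold: float) -> str:
    """Label a chromosome pair from the distance between its two embeddings."""
    distance = float(np.linalg.norm(embed(img_a) - embed(img_b)))
    return "del(5q)" if distance > threshold else "normal"

# Synthetic example: a uniform 8x8 "chromosome" and a copy whose bottom two
# rows (a crude stand-in for lost long-arm material) are blanked out.
chrom = np.ones((8, 8))
deleted = chrom.copy()
deleted[6:, :] = 0.0

print(classify_pair(chrom, chrom, threshold=1.0))  # identical pair -> normal
print(classify_pair(chrom, deleted, threshold=1.0))
```

Because both inputs pass through the same function, identical chromosomes always map to the same embedding (distance 0); only training with a loss such as the contrastive loss makes the distance between the two members of a pair meaningful for del(5q).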
