Molecular basis of mucopolysaccharidosis type II (Hunter syndrome): first review and classification of published IDS gene variants

IDS variant spectrum

A PubMed and Google literature search was performed and, overall, 255 articles were collected; among these, 230 articles meeting our criteria were evaluated in depth (Fig. 1). Overall, 2852 individuals suspected of or diagnosed with Hunter syndrome were collected; among them, 24 female Hunter patients were described (Supplementary File - Table S2). In addition, 19 neonates who tested positive by NBS and whose diagnosis was further confirmed by second-tier tests were collected. Among all patients, 80 related subjects belonging to 29 families were reported; more specifically, 19 pairs of siblings, 4 groups of 3 siblings, 1 family with 16 members, 1 family with 4 members, 2 families with 3 members and 2 families with 2 members were reported. Thus, the total number of families evaluated is 2798.

Fig. 1

Flowchart reporting the collection and filtering of the articles analyzed as well as the number of subjects included in the study

A total of 2133 families carried point variants, while 424 carried large deletions/duplications or complex rearrangements. For the remaining 241 subjects/families evaluated, either no variants were detected or reported, or the genetic analyses were not carried out. Patients with a normal Southern blot pattern for whom no further analyses were described were also included in this group. In total, 779 unique point variants were detected, whereas for large deletions/duplications and complex rearrangements it was not feasible to establish the uniqueness of all genetic alterations because, for most of them, the boundaries were not defined at nucleotide level. Therefore, these alterations were grouped by pattern similarity, as described in the related section below.

Once collected, the IDS variants reported in the articles were checked for their annotation. Annotation check of point variants was performed by Name Checker and Variant Validator, revealing several misreported variants that, when possible, were corrected accordingly. However, for some variants it was not possible to obtain a consistent annotation, given the erroneous or incomplete annotation reported in the original article, mainly due to the lack of guidelines for the correct annotation at the time of publication. In several cases, variants reported in a textual form were interpreted and properly annotated.

The analysis of the distribution by variant type of the 2798 families is reported in Fig. 2.

Fig. 2

Distribution of families (n = 2798) per variant type

Most families (1129; 40.4%) carried missense variants, followed by large deletions-insertions and complex rearrangements (424 pts; 15.2%), small frameshift deletions/insertions (360 pts; 12.9%), nonsense (295; 10.5%), synonymous (126 pts; 4.5%), splicing (125 pts; 4.5%), small inframe deletions/insertions (42 pts; 1.5%), intronic, 5’ and 3’ UTR-located variants (31 pts; 1.1%) and startloss (1 pt; 0.04%). In addition, 24 patients (0.9%) carried two or more variants in cis. Figure 2 also includes 241 families (8.6%) for which variants were not reported, not found or not recoverable from the text of the publication, since some of these papers were mainly clinical works, mostly reporting the phenotypic features of the patients. More than half of these cases were described before 1994, when molecular diagnosis was mainly conducted by Southern blot analysis. Furthermore, in about one hundred cases, although published in the following years, variants were either not found or, if found, not reported because they were not novel [37] and were therefore likely considered of no interest to readers. Finally, in a few cases only the polymorphism c.438C>T [p.(Thr146Thr)] was identified/reported, but no pathogenic variants were described.
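The percentages quoted above follow directly from the family counts. As a quick consistency check, the short Python sketch below recomputes them; the shortened category labels and the function name `percentages` are illustrative choices made here, not part of the original analysis.

```python
# Family counts per variant type, copied from the paragraph above.
# The shortened category labels are illustrative, not the authors' own.
family_counts = {
    "missense": 1129,
    "large deletions-insertions/complex rearrangements": 424,
    "small frameshift del/ins": 360,
    "nonsense": 295,
    "synonymous": 126,
    "splicing": 125,
    "small inframe del/ins": 42,
    "intronic/5'UTR/3'UTR": 31,
    "startloss": 1,
    "two or more variants in cis": 24,
    "no variant reported/found": 241,
}


def percentages(counts):
    """Return the grand total and each category's share in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = percentages(family_counts)
print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to the 2798 families shown in Fig. 2. At one decimal place the single startloss family rounds to 0.0%, which is why the text quotes it to two decimals.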

The 19 neonates detected by NBS and confirmed positive by a second-tier test showed a spectrum of 10 variants, one of which is an IDS-IDSP1 recombination associated with inversion [38]. Of the other 9, four had been previously described in Hunter patients (c.817C>T, c.1025A>G, c.311A>T and c.1400C>T). The other 5 variants were identified only in NBS cases (c.142C>T, c.1405C>G, c.779C>G, c.254C>T and c.1007-1666_c.1180+2113delinsTT). An extremely low enzyme activity was reported for all of these 9 subjects, although some of them were still asymptomatic at the time of the second evaluation [39]. For three of them, however (carrying the variants c.142C>T, c.1405C>G and c.779C>G), clinical confirmation had already been obtained at the time of first description, with a diagnosis of a mild form [40]. The detection of absent or extremely low IDS enzyme activity in NBS, confirmed by a second-tier test, should provide enough certainty about the Hunter diagnosis, considering that only a few pseudodeficiency cases have been described so far for the disease [39].

IDS point variants

On the whole, we collected 779 unique point variants (Supplementary File - Table S3), distributed as reported in Fig. 3: 42.4% missense, 32.7% small frameshift deletion/insertions, 11.3% nonsense, 6.5% splicing, 4.0% small inframe deletions/insertions, 2.3% intronic or involving 5’UTR or 3’UTR, 0.6% synonymous, 0.1% startloss.

Fig. 3

Distribution of IDS gene unique point variants (n = 779) per variant type

As depicted in Fig. 4, although IDS exonic point variants occurred throughout the length of the IDS gene, several hotspot codons are apparent: codon 468 (6 different variants, overall frequency n = 170 families), codon 374 (1 variant, n = 120 families), codon 88 (6 variants, n = 90 families) and codon 333 (4 variants, n = 68 families) (Table 1). Moreover, considering single exons, the most affected is exon 7, with 0.35 point variants per base pair, followed by exons 3 and 4 (0.31 and 0.21 variants/bp, respectively); the least affected is exon 6, with 0.04 variants/bp.

Fig. 4

Schematic representation of the distribution of exonic point variants on the IDS protein

Table 1 IDS point variants located in the hotspot codons

Regarding the distribution of all unique point variants by frequency range, more than half (57.4%) were reported in only one family and 36.1% were detected in 2 to 5 families. Only the remaining 6.5% of unique point variants were reported with a frequency ranging from 6 to 118 families (Fig. 5).

Fig. 5

Distribution of IDS gene unique point variants per range of frequencies

As shown in Table 2, the ten most frequently reported point variants comprised seven missense, two nonsense and one synonymous variant and, overall, they were detected in 20.9% of all families. The most frequent point variant was the synonymous variant c.1122C>T [p.(Gly374Gly)], which was described in 118 families worldwide, followed by c.1403G>A [p.(Arg468Gln)], detected in 87 families, and by c.1402C>T [p.(Arg468Trp)], reported in 73 families. c.998C>T [p.(Ser333Leu)] and c.1327C>T [p.(Arg443*)] were reported in 64 and 54 families, respectively.
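Under HGVS coding-DNA numbering, position c.1 is the A of the ATG start codon, so the codon affected by an exonic substitution can be derived from its c. position. The sketch below is a minimal illustration of that arithmetic; the helper name `codon_of` is hypothetical, and it applies only to positions inside the coding sequence (not to intronic or UTR offsets).

```python
def codon_of(cdna_position: int) -> int:
    """Map a 1-based coding-DNA position (c. numbering) to its codon number."""
    if cdna_position < 1:
        raise ValueError("only positions within the coding sequence are supported")
    return (cdna_position - 1) // 3 + 1


# c. positions of variants named in the text
for pos in (1122, 1403, 1402, 998, 1327):
    print(f"c.{pos} -> codon {codon_of(pos)}")
```

This places c.1402 and c.1403 in codon 468 and c.1122 in codon 374, consistent with the protein-level names given above.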

Table 2 First ten most frequently described point variants by reported ethnicity/country

Large deletions/duplications and complex rearrangements involving IDS gene

Overall, we collected 187 patients carrying large deletions/duplications and 238 patients carrying complex rearrangements, for a total of 425 cases, 14.9% of all described patients. However, these two classes of variants could not be analyzed in the same detail as point variants since, for most of them, the nucleotide boundaries were not precisely described and, in many cases, not even investigated. In addition, over the last 30 years the annotation used to describe variants has changed considerably, which may now make the variants described in the original papers difficult to interpret.

As for large deletions/duplications, they were more frequently located in exons 6, 4 and 5 respectively, although this class of genetic alterations affects all exons, less the promoter region. Due to the undefined boundaries reported in most cases, their uniqueness could not be unequivocally determined.

Likewise, for complex rearrangements it was not feasible to establish uniqueness because, for most of them, the boundaries of the rearranged sequences were not defined at nucleotide level. Thus, for the sake of simplicity, we grouped complex rearrangements into 6 subgroups according to the pattern similarity that could be inferred from the article reporting each variant (Table 3). The first 4 groups include different types of recombinational events likely caused by the high degree of sequence similarity between the IDS gene and its pseudogene IDSP1: on the whole, we collected 180 patients carrying this type of rearrangement. This is a peculiarity of this gene, since its pseudogene has been annotated as a low-copy repeat (LCR) in the human genome [41].

Table 3 Complex rearrangements classification according to pattern similarity

Type 5 and type 6 groups include rearrangements other than the recombinational type: 41 patients with complex rearrangements likely not caused by recombinational mechanisms and not further specified, and only 14 patients in which the rearrangement was well-characterized (at exon or at nucleotide level).

Thus, given the data reported above, if we consider all types of variants, including point variants, large deletions/duplications and complex rearrangements, the most reported variant is the recombination between IDS gene and its pseudogene IDSP1 associated with inversion, with 130 patients carrying this rearrangement. Since this could not be considered a unique variant, as most cases were not characterized at single nucleotide level, the reported ethnicities for this variant were separately described in Table 4.

Table 4 Alleles carrying the IDS-IDSP1 recombination combined with inversion divided by reported ethnicity/country. Only countries/ethnicities with ≥ 3 detected alleles are listed. The other countries/ethnicities are included in the category “all other countries”

Geographic distribution of IDS variants

Patients’ geographical information was available for 2231 individuals (78.8%), with subjects originating from all continents. Genetic heterogeneity was apparent among all populations. However, if we consider the ratio between the number of unique variants reported in a specific population and the total number of alleles described for the same population, individuals from Mexico, India, Korea and Argentina were the most heterogeneous, whereas those from Russia and Brazil were the least heterogeneous. China and Russia were the most highly represented, contributing 16.3% and 10.2% of all alleles, respectively, followed by Brazil (8.4%), Japan (6.8%) and India (5.8%) (Table 5).
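The heterogeneity measure used above is simply the number of unique variants divided by the number of alleles reported for a population. A minimal sketch of that ratio follows; the function name, the population labels and the counts are placeholders invented for illustration and do not correspond to any data in this review.

```python
def heterogeneity_ratio(unique_variants: int, total_alleles: int) -> float:
    """Unique variants per reported allele; higher values mean fewer recurrent variants."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return unique_variants / total_alleles


# Placeholder counts (hypothetical, NOT taken from Table 5).
populations = {
    "population_A": (45, 50),   # (unique variants, total alleles)
    "population_B": (60, 200),
}

# Rank populations from most to least heterogeneous.
ranked = sorted(
    populations,
    key=lambda name: heterogeneity_ratio(*populations[name]),
    reverse=True,
)
print(ranked)
```

Because the ratio is normalized by the number of alleles, a population can contribute many alleles overall and still rank as one of the least heterogeneous.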

Table 5 Most common variants for the 5 most frequent nationalities/ethnic backgrounds reported. For each country, only variants with ≥ 5 detected alleles are listed. The other variants are included in the category “all other variants”

Hunter disease in females

As MPS II is an X-linked inherited disease, it principally affects male children, and only 24 female patients (Supplementary File - Table S2) have been described so far in the literature. Because of the clinical severity of most patients, coupled with early death, male patients generally do not have children. Therefore, Hunter females very rarely originate from the combination of two mutated alleles, each inherited from one of the two parents. They mostly result from two different events: the presence of a pathogenic variant (de novo or inherited from the mother) and the selective inactivation (skewed X-inactivation) of the X chromosome inherited from the father, leading to the expression of the mutated allele in a heterozygous background. Among the 24 Hunter females reported here, in only one case (subject 7) was the same missense variant inherited from both parents, of French Gypsy origin [42]; this is so far the only reported case of a homozygous MPS II patient. The other 23 subjects are all carriers of only one mutated allele. Except for one case (subject 20), in which the girl was symptomatic because she had only one X chromosome, being a Turner patient [43], all the other cases likely became symptomatic because of imbalanced X-chromosome inactivation. Overall, in the 24 female patients, 20 different variants were identified: 7 missense, 1 nonsense affecting 2 subjects, 3 frameshift, 3 events of recombination between the IDS gene and the IDSP1 pseudogene, 4 large deletions and 2 balanced reciprocal translocations. Eleven patients had inherited the variant, 8 represented de novo cases and for 2 cases inheritance was uncertain; for the 3 remaining subjects, inheritance from the parents was not assessed.

According to Cook (2019) [44], interchromosomal recombinational events between the X chromosome and an autosome may somehow unbalance skewed inactivation, favoring the expression of the recombined chromosomes, thus guaranteeing a balanced situation. Consequently, if any of the genes involved in the recombination event carries a pathogenic variant, the female subject will show the related X-linked disease. In the cohort of females reported in Table S2, this is likely the case for subject 1, carrying a balanced reciprocal translocation between chromosomes X and 5, and for subject 18, carrying a de novo balanced reciprocal X;9 translocation.

Genotype-phenotype correlation and in vitro functional studies

Given its X-linked inheritance, genotype-phenotype correlation in MPS II might be expected to be fairly straightforward, as only a single allele has to be correlated with the reported phenotype. However, from a genetic point of view, MPS II is also a very heterogeneous disorder: only 2.7% of unique variants were reported with a frequency of 8 to 118 families (data reported above), and more than 50% of the variants were described in only 1 family. This great heterogeneity makes genotype-phenotype correlation very challenging and, in most cases, simply unfeasible.

Moreover, the criteria used to define clinical phenotypes in the literature are not homogeneous and, in most cases, are not reported, especially in the oldest publications. For the 21 unique point variants with a frequency of 8 or more families, we collected and analyzed the reported phenotype (Fig. 6a). We found that for 12 variants the severe phenotype was the predominant category. These variants include 8 missense, 2 nonsense, 1 frameshift and 1 inframe deletion. Notably, among the missense variants, c.262C>T and c.263G>A affect the Arg88 residue, located in the catalytic core of the protein [20]. For 6 variants (5 missense and 1 synonymous) the mild phenotype was more frequent than the other categories. Finally, for 3 variants (2 nonsense and 1 missense), the most frequent category was the “unknown/unreported” phenotype.

Figure 6b reports these 21 variants analyzed taking into consideration only cases associated with a clinical phenotype (73.9% of families), thus excluding the not reported/unknown phenotypes.

Five variants are associated with severe/neuronopathic phenotypes in all reported cases, 8 variants in ≥ 80% of the cases. Only one variant is associated with attenuated phenotypes in all reported cases and 2 variants in ≥ 80% of the cases. The remaining 5 variants are described in literature with different proportions of discordant phenotypes: severe, intermediate and attenuated.

Overall, 68.3% of families were reported with a severe clinical phenotype, about two-thirds of all reported phenotypes, as commonly described for the disease [3, 45].

On the other hand, all families reported with the variant c.230C>A (19 families) present an attenuated phenotype, as do 4 out of 5 families carrying the variant c.1037C>T and 15 out of the 18 cases reported with the variant c.187A>G, for a total of 38 cases. The remaining five variants (c.1122C>T, c.1327C>T, c.253G>A, c.1019G>A, c.22C>T) are described in the literature with different proportions of discordant phenotypes: severe, intermediate and attenuated (Fig. 6b).

However, these analyses should be interpreted with caution because, for most of the reported phenotypes, the criteria used to define them are not known, especially in the oldest publications.

Fig. 6

Genotype/phenotype correlation for point variants with frequency above or equal to 8

For 13 of these variants, data from in vitro expression analyses were also available and, in most cases, they were consistent with the reported phenotype. In vitro protein expression and immunoblotting analyses were conducted in several cell types, mainly COS-1, COS-7, human fibroblasts, CHO and HEK293 cells. Data for the most frequently reported of these variants are described below.

The most frequent point variant, c.1122C>T, is synonymous in terms of amino acid coding [p.(Gly374Gly)], but it activates a cryptic 5’ splice site inside exon 8, generating a shorter transcript that lacks the last 60 bp of the exon [45]. The phenotype is known for 78 out of 118 patients, while for 40 of them it was not described in the original paper. We chose to include among the “unknown phenotypes” also 4 patients whose severity had been based on cardio-pulmonary function, which is not a common criterion for discriminating between attenuated and severe patients, since most, if not all, of them have cardio-pulmonary involvement. Forty-seven of the 78 patients for whom a clinical phenotype was reported (60.3%) showed an attenuated/mild phenotype. According to Matos et al. (2015), this is due to the presence of some residual IDS activity, since the correctly spliced form is commonly detected in patients together with the mutant splicing form, as previously reported by the same authors [45, 46].

Variant c.1403G>A [p.(Arg468Gln)], described so far in 87 families, was also analyzed in vitro: IDS enzyme activity, evaluated both in COS cells [47] and in patient’s fibroblasts following transient transfection [48], was found to be very low. Immunoblotting analysis, conducted in 1995 by Sukegawa et al. and more recently, in 2012, by Charoenwattanasatien et al., also revealed an altered processing of the IDS protein leading to defective cleavage to the mature form [48, 49]. Seventy-five of the 87 families carrying this variant were reported with a phenotype, 74 of which (98.7%) were associated with a severe clinical form.

Extremely low to absent enzyme activities were shown in vitro following expression of the variants c.1402C>T [p.(Arg468Trp)], c.1327C>T [p.(Arg443*)], c.262C>T [p.(Arg88Cys)], c.253G>A [p.(Ala85Thr)], c.257C>T [p.(Pro86Leu)] and others. In some of them, in vitro studies also highlighted an altered post-translational processing of the protein. For 2 of these variants, c.1327C>T and c.253G>A, a discordant phenotype (from attenuated to severe) was reported, while patients carrying the other 3 variants commonly presented with a severe clinical form.

No in vitro expression data were available for eight of the variants shown in Fig. 6. However, for most of them a clear correlation with the phenotype could be observed. The severe phenotype was reported in 100% of the identified cases carrying the variants c.596_599del and c.1165C>T, and in most cases carrying the variants c.998C>T and c.514C>T. An attenuated phenotype was reported in all cases carrying the c.230C>A variant, and in most cases reported with the variants c.187A>G and c.1037C>T.

Regarding large deletions/duplications and complex rearrangements, the genotype-phenotype correlation showed, as expected, that most patients carrying these types of variants have a severe phenotype. More precisely, 72.7% of patients carrying large deletions/duplications show a severe phenotype, 3.7% a mild phenotype and 1.6% an intermediate phenotype, while for 21.9% the phenotype is unknown or not reported. Similarly, a severe phenotype is reported for 57.8% of patients carrying complex rearrangements, a mild phenotype for 3% and an intermediate phenotype for 0.8%, while for 38.4% it is unknown or not reported (Fig. 7a). Furthermore, if we consider only patients for whom a clinical phenotype was reported (78.1% for large deletions/duplications and 61.6% for complex rearrangements), 93.2% and 93.8% of these cases, respectively, present with a severe/neuronopathic phenotype (Fig. 7b).
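The second set of percentages is obtained by restricting the denominator to patients with a reported phenotype. The sketch below redoes that calculation from the rounded percentages quoted above, so its output can differ from the published 93.2% and 93.8% in the last decimal; the function name `share_among_reported` is illustrative.

```python
def share_among_reported(severe_pct: float, reported_pct: float) -> float:
    """Percentage of severe cases among patients with a reported phenotype."""
    if not 0 < reported_pct <= 100:
        raise ValueError("reported_pct must be in (0, 100]")
    return 100 * severe_pct / reported_pct


# Rounded inputs quoted in the text:
# (severe % of all patients, % of patients with a reported phenotype)
groups = {
    "large deletions/duplications": (72.7, 78.1),
    "complex rearrangements": (57.8, 61.6),
}

for name, (severe, reported) in groups.items():
    value = share_among_reported(severe, reported)
    print(f"{name}: {value:.1f}% severe among reported")
```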

Fig. 7

Genotype/phenotype correlation for large deletions/duplications and complex rearrangements

ACMG classification of IDS variants

The ACMG/AMP classification of the 779 IDS point variants collected in this study is reported in Supplementary Table S3. For most variants, enough evidence was available to classify them according to their pathogenicity. Indeed, most variants were classified as “pathogenic” (490; 62.9%) or “likely pathogenic” (276; 35.4%); only 13 point variants (1.7%) were classified as variants of “uncertain significance.” A total of 768 classified variants and their associated pathogenicity evidence were submitted to ClinVar, where they can be retrieved under the following accession numbers: SCV005088913-SCV005089669 and SCV000929879.1-SCV000929889.1. Ten variants were not submitted because their ambiguous annotations were not accepted by ClinVar, and one variant because the URL reference citation was not allowed.
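As a bookkeeping check on the submission, the two accession ranges can be counted, assuming each range is inclusive and contiguous (an assumption made here, not stated in the text). The helpers `scv_number` and `count_accessions` are hypothetical; they only parse the numeric part of an SCV identifier and ignore any version suffix.

```python
import re


def scv_number(accession: str) -> int:
    """Extract the numeric part of an accession such as 'SCV000929879.1'."""
    match = re.fullmatch(r"SCV(\d+)(?:\.\d+)?", accession)
    if match is None:
        raise ValueError(f"not an SCV accession: {accession!r}")
    return int(match.group(1))


def count_accessions(first: str, last: str) -> int:
    """Number of accessions in a range, ASSUMING it is inclusive and contiguous."""
    return scv_number(last) - scv_number(first) + 1


ranges = [
    ("SCV005088913", "SCV005089669"),
    ("SCV000929879.1", "SCV000929889.1"),
]
submitted = sum(count_accessions(first, last) for first, last in ranges)
not_submitted = 10 + 1  # ambiguous annotations + disallowed URL citation
print(submitted, submitted + not_submitted)
```

Under that assumption the two ranges cover 768 accessions, which together with the 11 unsubmitted variants gives the 779 classified point variants.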

Genotype and response to ERT

The response to ERT, and more specifically the variation between patients in each efficacy outcome, may reflect differences in the age at start of therapy as well as other factors, such as the influence of the patient’s genotype [50]. Only a few studies have evaluated the potential correlation between ERT efficacy and the genotype of MPS II patients. Barbier et al. studied 36 attenuated patients above 5 years of age and found that subjects carrying nonsense or frameshift variants were likely more prone to develop antibodies and infusion-related reactions, and to experience a limited urinary GAG response, than those with missense variants [51]. An extension of this study to 27 severe patients and to a larger number of efficacy outcomes [52] showed that, after one year of treatment, patients with complete deletions/large rearrangements (CD/LR) and with frameshift/splice site variants (FS/SSM) had a smaller decrease in liver size than those carrying missense variants (MS). Moreover, the average spleen volume was similar in the CD/LR and MS genotype groups, but it was significantly higher in the FS/SSM group than in the other groups. Regarding urinary GAG levels, in this second study too patients with CD/LR showed a less pronounced reduction following ERT than patients with MS. In addition, patients with the CD/LR genotype were more likely to develop antibodies to idursulfase than patients with the MS genotype, while the FS/SSM group fell between the CD/LR and the MS groups [52]. Concerning antibodies specifically, these studies may suggest a limited production of antibodies where the protein, although altered and non-functional, is produced and therefore recognized as self, as in patients carrying a missense variant. Conversely, the total absence of the protein in subjects carrying severe gene alterations that do not allow its production, such as complete deletions, large rearrangements or frameshift variants, may cause an elevated immune response against the recombinant IDS, since the protein is completely unknown to the immune system and thus represents a non-self antigen [51].

However, further studies on larger cohorts of patients are needed to confirm the potential correlations suggested by these studies.

Open issues

Since the characterization of the region distal to the IDS gene and the discovery of its pseudogene IDSP1, many rearrangements due to intrachromosomal recombinational events between the homologous regions of IDS and IDSP1 have been reported. However, the relatively high number of patients with no variants detected (241; 8.6%) reported in this study, together with our direct experience, raises the suspicion that some recombinational events are still being missed by standard molecular analysis. Indeed, because these recombinations are balanced rearrangements, they are not detected by simple exon sequencing and require additional approaches, such as the rapid RFLP analysis protocol set up by Lualdi et al. [53]. This should be a pivotal aspect to consider when approaching the diagnosis of a Hunter patient. In addition, some undetected variants might be deep intronic variants that would be revealed only through second-level molecular analyses. Finally, the molecular analysis of the IDS gene should not neglect the promoter region, where a 128 bp deletion was detected in a few mild patients [54, 55].

An additional problem is that variants already published are often under-reported in subsequent publications, being considered of limited or no interest to readers. Such an approach introduces a bias in the overall variant analysis, and it is a problem common to several genetic diseases. It leads to an underestimate of the frequency of each variant in the general population, as well as in specific ethnic groups. Previously described variants identified in new subjects/families should at least be communicated to open databases, together with some essential demographic, geographical and clinical information. This would certainly change the overall molecular scenario.

Finally, to reduce the risk of variant misreporting, all IDS variants should be annotated according to the most recent HGVS nomenclature (Version 20.05), thus allowing an unambiguous and consistent description. To this end, several in silico tools are available to support variant annotation (e.g. Mutalyzer Name Checker, Variant Validator).
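Some transcription problems are purely typographic, such as stray spaces inside a substitution ("c.1403G > A"). The sketch below shows a purely syntactic clean-up for simple coding-DNA substitutions. It is an illustrative pre-check written for this example, not the Mutalyzer or Variant Validator interface, and it does not validate a variant against a reference sequence.

```python
import re

# Simple substitutions only, e.g. "c.1403G > A"; other variant types are left untouched.
_SUBSTITUTION = re.compile(r"c\.\s*(\d+)\s*([ACGT])\s*>\s*([ACGT])")


def normalize_substitution(text: str) -> str:
    """Remove stray whitespace inside simple c. substitutions found in text."""
    return _SUBSTITUTION.sub(r"c.\1\2>\3", text)


print(normalize_substitution("c.1122 C > T"))
print(normalize_substitution("c.1403G > A and c.998 C > T"))
```

A description cleaned in this way still has to be checked with a dedicated annotation tool, since correct spacing says nothing about whether the stated reference base or position is right.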

Future perspectives

As presented in this review, IDS is a gene prone to mutation, with more than half of the variants described in only one family (‘private mutations’). Furthermore, the presence of the pseudogene, which shares long sequences with the gene, favors recombination events; these produce additional genomic changes, often generating genetic alterations that are difficult to define properly because their boundaries are undefined at nucleotide level. According to our records, 180 patients carrying variants due to intrachromosomal recombination events have been described so far. A deeper genomic investigation of these variants, as well as of other complex rearrangements or gross gene alterations, should be performed in the light of the diagnostic approaches that have become available and more widely used in the last 10–15 years, such as whole-genome or whole-exome sequencing, depending on the question to be answered.

Updating and correctly classifying the variants of complex genes such as IDS is very difficult, but also extremely useful: it helps to describe the complexity of the gene and provides an updated summary to be used for diagnostic purposes and in genetic counseling.

Since Hunter syndrome is one of the most common LSDs, accounting in some countries for almost half of all MPS cases [56], its inclusion in newborn screening programs, followed by a second-tier test for confirmation, would be desirable, allowing early identification of patients, most of whom still suffer from delayed diagnosis.

An increased reporting of the identified IDS variants in public databases should be encouraged, favoring the exchange of information between different laboratories and reference centres, progressively helping a correct and timely molecular diagnosis, and preventing misdiagnoses.
