Integration of long-read sequencing, DNA methylation and gene expression reveals heterogeneity in Y chromosome segment lengths in phenotypic males with 46,XX testicular disorder/difference of sex development

Participants

The study included 46,XX DSD (n = 11) with a mean age of 44.2 [SD: ±13.0] years. For comparisons we included: male controls (46,XY, n = 26; mean age: 42.8 [SD: ±12.0]), female controls (46,XX, n = 32; mean age: 44.1 [SD: ±12.4]), and males with KS (47,XXY, n = 22; mean age: 40.2 [SD: ±7.9]) (Fig. 1A). There was no significant age difference between 46,XX DSD and 46,XY (p = 0.8), 46,XX (p = 0.9) or 47,XXY (p = 0.6).

Clinical characterization

46,XX DSD had significantly lower height and weight than 46,XY, but there was no difference in body mass index (BMI), nor in hip- and waist circumference. Total body and total abdominal lean mass were also significantly lower in 46,XX DSD compared to 46,XY, whereas there was no difference in total body and total abdominal fat mass. Testicular height, width, and length were all considerably reduced in 46,XX DSD compared to 46,XY, resulting in a significantly lower testis volume (Table 1).

Table 1 Anthropometrics, blood pressure, body composition, bone mineral density, and testicular size in males 46,XX DSD and controlsY chromosome segmental heterogeneity in 46,XX DSD identified by long-read DNA sequencing

While previous studies have identified approximate breakpoints, the exact genomic content of the translocated Y chromosome segment and thereby genetic mechanisms underlying male sex differentiation among 46,XX DSD have not been established [30,31,32,33]. Therefore, we initially focused on evaluating the genetic material of Y-chromosomal origin in each 46,XX DSD individual with the application of long-read DNA sequencing (Oxford Nanopore). The read density of all reads mapping to the Y chromosome revealed heterogeneity (Fig. 2A). In some cases, breakpoints could be identified as multiple reads aligned to both the Y chromosome and X chromosome (split reads) (Supplemental Fig. 1B, C, Table 2). One individual with 46,XX DSD had reads restricted to the pseudoautosomal region 1 (PAR1), a 2.7 Mb region homologous between the sex chromosomes located distal on the p-arms, thus lacking Y-specific material, and thereby the SRY gene (referred to as “SRY negative”, n = 1). For this individual, no other genetic cause for the male sex development was identified. SRY was present in all other 46,XX DSD in the cohort (SRY positive) (n = 10). Two individuals with 46,XX DSD had reads limited to the distal part of Yp (referred to as “short Yp” 1 and 2, n = 2), with breakpoints on the Y chromosome situated distal of ZFY (short Yp 1: chrY:2,823 kb; short Yp2: chrY:2,782 kb) while breakpoints on the X chromosome were either proximal of CD99P1 (short Yp 1: chrX:2,191 kb) or within CD99 (short Yp 2: chrX:2,349 kb), both within PAR1 (Fig. 2C). One individual with 46,XX DSD had reads spanning about 75% of Yp (referred to as “medium Yp”, n = 1), with breakpoints identified distal of PRKY on the Y chromosome (chrY:7,275 kb) and between VCX2 and VCX3B on the X chromosome (chrX:8,018 kb) (Fig. 2B), which also have Y-linked homologues. The remaining 46,XX DSD had reads spanning most of Yp (referred to as “long Yp” 1–7, n = 7) (Fig. 2; Table 2). Repetitive DNA sequences in the proximal part of Yp, especially around FAM197Y- and TSPY-genes, complicated the alignment, and thereby identification of breakpoints in these cases. However, data strongly indicated that all had breakpoints close to the centromere, distal of the TSPY-genes (Supplemental Fig. 2). Thus, 46,XX DSD could be sub-categorized into “SRY negative”, “short Yp”, “medium Yp”, and “long Yp” based on the presence and length of the translocated Yp segments, with breakpoints situated outside of the most proximal part of Yp in SRY-positive individuals.

Fig. 2figure 2

Y-chromosomal heterogeneity in 46, XX DSD identified by long-read DNA sequencing. Based on read densities across the Y (A) and X (B) chromosomes for 11 46,XX DSD, three representative female controls (46,XX; orange) and three representative male controls (46,XY; blue), we identified Y-chromosomal heterogeneity among 46,XX DSD within our cohort. The 11 46,XX DSD were separated into four groups according to Y-chromosomal segment length; SRY negative (n = 1; white), short Yp (n = 2; light grey), medium Yp (n = 1; dark grey) and long Yp (n = 7; black) (A). No obvious differences were observed from the read density across the X chromosome, except for a decreased read density in the distal part of Xp for the 46,XX DSD individual with a medium Yp segment (B). A schematic illustration of the X-Y translocations is presented in (C). As expected, the SRY negative 46,XX DSD individual only had Y-chromosomal reads spanning PAR1, similar to observations for females. The 46,XX DSD with short Yp segments had breakpoints after ZFY (chrY: 2,782 kb; chrY: 2,823 kb) on the Y chromosome, continuing upstream of CD99P1 (chrX: 2,191 kb) and within CD99 (chrX: 2,349 kb) on the X chromosome. The 46,XX DSD individual with a medium Yp segment had a breakpoint after PRKY (chrY: 7,275 kb) on the Y chromosome, continuing upstream of VCX3B (chrX: 8,018 kb) on the X chromosome, where read density increased in (B). Identification of breakpoints for the seven 46,XX DSD with longer Yp segments was not possible, however, they had reads spanning most of Yp all the way up to the centromere. When no knowledge of X-linked breakpoints was observed the boxes were marked with “?”

A characteristic loss of Y chromosome material two thirds into Yp were observed for all “long Yp” individuals (Supplemental Fig. 1A-B) (discussed in further detail below). We did not find similar patterns of lost read density on the X chromosome in regions with X-Y homologues (Fig. 2B).

Table 2 Subgroups, X-Y breakpoints, PRKY-status and PRKY/PRKX/ARSD expression of 46,XX DSDY-chromosomal methylation and gene expression patterns

To support our findings, we employed gene expression profiling (RNAseq) and DNA methylation analysis of approximately 850,000 CpG sites (Infinium MethylationEPIC). CpG sites are regions in the DNA sequence comprising cytosine followed by a guanine nucleotide in the 5’ to 3’ direction, across the genome. These analyses compared 46,XX DSD to 46,XX, 46,XY, and 47,XXY.

When evaluating CpG sites located on the Y chromosome, we found an average of approximately 130 Y-linked CpG sites for 46,XX (unspecific binding) and approximately 530 CpG sites for both 46,XY and 47,XXY, while 46,XX DSD had a heterogenic in-between number of CpG sites, corresponding to the length of the translocated Yp segment (Fig. 3A). Clustering based on the Y-chromosomal methylome also revealed that 46,XX DSD clustered according to the amount of translocated Y chromosome – some in a separate group (long Yp, medium Yp) and others in closer proximity to 46,XX (SRY negative, short Yp) (Fig. 3B).

Fig. 3figure 3

Y-chromosomal methylation and gene expression patterns. DNA methylation and gene expression data from the Y chromosome for all included groups. The number of Y-linked CPG-sites detected from DNA methylation data were denoted for each group. 46,XX DSD had in-between numbers of CpG sites between the numbers for 46,XX and 46,XY/47,XXY, with the CpG counts for each 46,XX DSD reflecting their Y chromosome segment lengths (A). Multi-dimensional scaling analysis of CpG sites from the Y chromosome showed the same in-between pattern of 46,XX DSD – depending on Y chromosome segment length (B). Principal component analysis of gene expression data from the Y chromosome showed that all 46,XX DSD with Y chromosome material clustered by themselves, except for the SRY negative 46,XX DSD clustering with 46,XX (C). Normalized counts of differentially expressed Y-chromosomal genes, sorted by position on the Y chromosome and separated into Yp and Yq, showed that ZFY, RPS4Y1 and LINC00278 expression was present in all 46,XX DSD with Y chromosome material. In the medium and long Y groups a variable expression of PRKY was observed (D). Colors indicate karyotype (46,XX: orange; 46,XY: blue; 47,XXY: green) and Y chromosome segment length for 46,XX DSD (SRY neg.: white; short Yp: light grey; medium Yp: dark grey; long Yp: black)

Y-chromosomal gene expression showed that 46,XX DSD clustered separately from all other groups, except for the one SRY negative 46,XX DSD, who clustered with 46,XX (Fig. 3C). Among 46,XX DSD, four Y-chromosomal differentially expressed genes (DEGs) were upregulated compared to 46,XX; ZFY, PRKY, RPS4Y1 and LINC00278, and 13 DEGs were downregulated compared to 46,XY and 47,XXY (Table 3, Supplementary File 1). When annotating the location of these genes to Yp and Yq, we identified a marked upregulation of Yp-genes compared to 46,XX and an absence of Yq-genes (Fig. 3D), supporting our finding that Y-chromosomal sequences in 46,XX DSD were restricted to Yp.

Thus, both the Y-chromosomal methylome and transcriptome supported our finding of Y-chromosomal segment heterogeneity among 46,XX DSD.

Table 3 Differentially expressed genes (DEGs)Loss of Y chromosome segment spanning X-Y homologous regions

As mentioned above, a specific part of Yp was missing in all “long Yp” 46,XX DSD. This region contained three genes with homologues on the X chromosome; AMELY, TBLY1 and PRKY. We found a complete loss of both AMELY and TBLY1, but a variable partial loss of PRKY (1 x PRKY+, 2 x PRKY-, 4 x PRKY+/-) (Table 2, Supplemental Fig. 1C). The X-linked homologues to these genes, AMELX, TBLX1 and PRKX, had not been lost, but we found that 46,XX DSD had either mono- or biallelic partial PRKX deletions (Supplemental Fig. 3). Due to the variable PRKY patterns, we made an investigation of this region in respect to gene expression. According to their PRKY-patterns, some 46,XX DSD lacked PRKY expression, while others had levels close to those of 46,XY and 47,XXY (Fig. 4A; Table 2). Similar variation was observed for the X-linked homologue, PRKX (Fig. 4B). In general, we found no difference in the combined expression of PRKY and PRKX compared to 46,XX (Fig. 4C), however, we observed a higher expression of PRKX in 46,XX DSD lacking PRKY and vice versa, suggesting a compensatory mechanism and/or dysfunctional recombination between PRKY and PRKX (Fig. 4D). Based on this, we further sub-divided the 46,XX DSD into “PRKY high” and “PRKY low“ (Table 2, discussed in further detail below).

Fig. 4figure 4

Variable expression patterns in the X-Y homologous gene pair PRKY/PRKX. Normalized expression of X-Y homologous gene pair PRKY (A) and PRKX (B), illustrating a variable expression pattern in 46,XX DSD. The summed gene expression of PRKY and PRKX showed expression levels close to those of 46,XX (C). PRKY expression vs. PRKX expression indicated that high PRKY expression led to lower PRKX expression in 46,XX DSD (D). Colors indicate karyotype (46,XX: orange; 46,XY: blue; 47,XXY: green) and Y chromosome segment length for 46,XX DSD (SRY neg.: white; short Yp: light grey; medium Yp: dark grey; long Yp: black)

In conclusion, 46,XX DSD with longer Y-segments had a loss of specific Y-linked genes with X-linked homologues: AMELY and TBLY1. In addition, we observed a variable DNA loss and expression of PRKY, seemingly balanced by a variable expression of PRKX.

The parentage of PAR1

Next, we aimed to identify the parentage of PAR1 on the X-Y translocated chromosome. First, a de novo assembly of the sex chromosomes was made for each individual with 46,XX DSD. Second, sex-chromosomal gene sequences were aligned to the contigs arising from the de novo assembly. The majority (7/8) had PAR1 genes and the beginning of XG and the XG pseudogene XGY2 localized in the same contig. The continuation of the XG gene and subsequent X-linked genes were found in another contig, and the continuation of XGY2 and the subsequent Y genes in a third (Supplementary Fig. 4). This indicated that PAR1 of the X chromosome, containing Y material in 46,XX DSD, mainly originates from the paternal Y, as previously shown [34].

X-chromosomal and autosomal methylation and gene expression

As expected, cluster analysis based on the X-chromosomal methylome and the X-chromosomal transcriptome showed that individuals clustered according to the number of X chromosomes; 46,XX DSD, 47,XXY, and 46,XX clustered together, with 46,XY in a separate cluster (Fig. 5A, B)(Tables 4 and 5; Supplementary Files 2, 3). Differential methylation analysis of individual CpG sites (DMPs, deltaM > |1|, padj < 0.05) identified a single DMP that was specific for 46,XX DSD, situated in the promoter region of the transcription factor SOX3. We then identified differentially expressed genes (DEGs) (p adj < 0.05) (Table 3, Supplementary File 1). Only one X-chromosomal gene, ARSD, was differentially expressed in 46,XX DSD compared to both 46,XX and 46,XY. Expression of this gene was markedly downregulated in the “long Yp” 46,XX DSD compared to both control groups (Fig. 5C). This was in line with a previous study where the breakpoint of the X chromosome was found in the vicinity of the ARSE gene [30], located just downstream of ARSD.

Fig. 5figure 5

Autosomal and X-chromosomal methylation and gene expression patterns in 46,XX DSD. Multi-dimensional scaling analysis of DNA methylation (A) and principal component analysis of gene expression (B) from the X chromosome comparing 46,XX DSD, 46,XX, 46,XY and 47,XXY. Expression of the ARSD gene located on the X chromosome, which was differentially expressed in 46,XX DSD compared to both 46,XX and 46,XY. Here we observe a marked decrease in 46,XX DSD with medium or long Yp segments (C).The overall Log2FoldChange in X-chromosomal gene expression of all expressed escape genes, PAR1 genes and inactivated genes between 46,XX DSD and 46,XX, 46,XY and 47,XXY, as well as 46,XX and 46,XY (D). The individual gene expression changes between 46,XX DSD and 46,XX (red), 46,XX DSD and 46,XY (blue) and 46,XX and 46,XY (green) for genes expressed from PAR1 and PAR2 (E), X chromosome Inactivation (XCI) genes (F) and differentially expressed genes (padj < 0.05) in either 46,XX DSD vs. 46,XX or 46,XX DSD vs. 46,XY from the X chromosome (G). For panels E-G, the order of the genes was annotated according to chromosomal location (p to q arm). Multi-dimensional scaling analysis of DNA methylation (H) and principal component analysis of gene expression (I) from autosomes comparing 46,XX DSD, 46,XX, 46,XY and 47,XXY. Colors indicate karyotype (46,XX, orange; 46,XY, blue; 47,XXY, green) and Y chromosome segment length for 46,XX DSD (SRY neg.: white; short Yp: light grey; medium Yp: dark grey; long Yp: black)

Table 4 Differentially methylated positions (DMPs)Table 5 Differentially methylated regions (DMRs)

We annotated X-chromosomal genes as escape, inactivated, and PAR, and compared changes in overall expression. As expected, escape genes had a higher level of expression in 46,XX DSD compared to 46,XY (p = 2.2e-6) and in 46,XX compared to 46,XY (p = 1.4e-8) (Fig. 5D). The overall expression of PAR1 genes was decreased in 46,XX DSD compared to both 46,XX (p = 0.03) and 46,XY (p = 0.004). This suggests that the expression of PAR1 genes derived from a Y chromosome may decrease when translocated to an inactive X chromosome, or that reduced expression could result from partial spreading of X-inactivation as described by Tukiainen et al. [35]. The same applied to inactive genes, where 46,XX DSD had a slightly decreased expression compared to both 46,XX (p = 0.01) and 46,XY (p = 0.02). The decreased expression of PAR1 genes in 46,XX DSD was clearly observed when plotting the individual expressional changes for all expressed PAR1 genes (Fig. 5E). The most notable expression changes were observed near the end of the p-arm of the X chromosome; however, this may simply reflect variability in the expression of PAR1 genes. An upregulation of XIST in 46,XX DSD compared to 46,XY was also observed (Fig. 5F) and X-linked DEGs were generally upregulated in this contrast as well (Fig. 5G).

We further investigated autosomal DNA methylation and gene expression. Cluster analysis of autosomal DNA methylation showed clustering of 46,XX DSD with 47,XXY, and of 46,XX with 46,XY (Fig. 5H). Furthermore, autosomal hypermethylation was observed in 46,XX DSD (Table 4, Supplementary File 2), supporting the finding of the clustering analyses (Fig. 5A). Forty-nine autosomal DMPs were unique to 46,XX DSD when compared to both 46,XX and 46,XY. Extending the analysis to the regional level (Table 5, Supplementary File 3), four DMRs were specific to 46,XX DSD, located in the genes GFOD2, RASAL2, UNC5D and EBF4.

Cluster analysis of autosomal gene expression showed clustering of 46,XX DSD with 46,XX, and 47,XXY with 46,XY (Fig. 5I), the latter in accordance with previous reports [34]. Three autosomal DEGs were unique for 46,XX DSD compared to both 46,XX and 46,XY; LFNG, KLHL24, and C7orf26 (INTS15).

Finally, X-chromosomal and autosomal DEGs were used as input for pathway enrichment analysis. This showed enrichment within DSDs (Turner syndrome, gonadal dysfunction, spermatogenic failure, and 46,XX DSD) and biological processes related to DNA methylation (Supplemental Fig. 5).

Characterization of 46,XX DSD subgroups related to PRKY

To identify gene networks associated with 46,XX DSD, their subgroups and clinical traits, we performed WGCNA on gene expression data from 46,XX DSD and 46,XY. Here, co-expressed genes were separated into groups of so-called modules. Each module was assigned a unique color and correlated to karyotype and expression of PRKY and PRKX. We found that the lightcyan module was positively correlated to 46,XX DSD (0.93, p = 1e-09) and negatively to 46,XY (-0.93, p = 1e-09) (Supplemental Fig. 6), meaning that the majority of the genes within this module was overexpressed in 46,XX DSD. This module had a strong negative correlation to expression of PRKY (-0.85, p = 1e-06), but not to expression of PRKX (0.17, p = 0.5). When correlating the aforementioned module to clinical traits, we found an association to height (-0.57, p = 0.007), lean mass (-0.48, p = 0.03) and testicular measurements (width (-0.89, p = 8e-08), length (-0.89, p = 7e-08), height (-0.86, p = 5e-07), volume (-0.85, p = 1e-06)). Grouping 46,XX DSD by PRKY expression (low, 0-170; high, 739–1405 normalized counts), low expression was highly correlated to the lightcyan module as well (0.76, p = 7e-05). In contrast, high PRKY expression was not correlated to this module (0.28, p = 0.2). This could indicate that low PRKY expression in 46,XX DSD is associated with lower testis measures, height and lean mass, compared to 46,XX DSD with higher PRKY expression. Further studies are needed to validate this.

留言 (0)

沒有登入
gif