Using multiplexed functional data to reduce variant classification inequities in underrepresented populations

Cohorts

Genomic data from 245,394 individuals enrolled in the All of Us v7 cohort were analyzed. Findings were validated using two independent datasets from the Genome Aggregation Database (gnomAD), specifically 123,709 exomes from gnomAD v2.1.1 and 51,535 genomes from gnomAD v3.1.2 (excluding individuals from gnomAD v2). To facilitate comparative analyses, individuals were stratified into two major superpopulation groups: European-like and non-European-like genetic ancestries. The final sample sizes were as follows: non-European-like vs. European-like groups: 122,322 vs. 123,072 (All of Us v7); 59,106 vs. 64,603 (gnomAD v2.1.1); and 25,547 vs. 25,988 (gnomAD v3.1.2). These cohorts and stratification were used for both statistical tests as well as variant reclassification. Further information regarding participant enrollment and sample collection and study origination can be found on the All of Us [9, 25,26,27] or gnomAD [28] website, respectively. Variants for each gene are available on the All of Us Public Data Browser [27] version 7 or gnomAD [28]. The clinical variant classifications in this study found in All of Us or gnomAD were originally sourced from ClinVar [29, 30]. Variant calls, allele counts, population descriptors, and variant classifications were used as prescribed by All of Us or gnomAD.

Genetic ancestry

We use descriptors from All of Us and gnomAD for consistency, as we cannot reclassify individuals due to the public, deidentified nature of the databases. Full details are available in the respective website documentation and publications of gnomAD [28] and All of Us [9, 25,26,27]. All databases assign a single genetic ancestry to each individual based on projection to principal components built using reference populations [9, 25, 26]. We have appended “-like” to the labels to explicitly reflect that they primarily capture genetic similarity to reference groups used by the original publications to train their classifiers [31]. We acknowledge their imperfect and incomplete nature as descriptors of continuous human diversity. The non-European-like group encompassed individuals with genetic ancestries from the “African/African American,” “Latino/Admixed American,” “East Asian,” “South Asian,” and “Other” groups as prescribed by the genetic ancestry calculation done by All of Us or gnomAD. In all cases, individuals are assigned to a single genetic ancestry first by projection into a principal component space built from established population genetics resources. Principal component loadings for each individual are then input into a random forest classifier, and the genetic ancestry label is assigned on the basis of the output from the classifier. Given the nature of random forest classifiers, this approach will struggle to assign a label to admixed individuals and to individuals whose genetic ancestry is poorly represented in the reference samples. These individuals, therefore, make up a significant fraction of the “Other” group, which is openly acknowledged by both All of Us and gnomAD.

Gene lists and calculating allele prevalence

Gene lists for medical specialties that commonly use genetic testing were compiled from genes known to be tested on next-generation sequencing tests of Invitae, Ambry Genetics, and Baylor Genetics. The ACMG78 gene list represents the 78 genes from the secondary findings list curated per the American College of Medical Genetics and Genomics Secondary Findings v3.2 standard. The GenCC gene list [32] represents all 4640 curated known clinical disease genes as of June 2023. The “Cancer” gene list represents 209 genes implicated in hereditary cancers and cancer syndromes across every major organ system. The “Cardiac” gene list represents 306 genes implicated in arrhythmias, cardiomyopathies, RASopathies, congenital heart diseases, lipidemias, and aortopathies. The “Hematology” gene list represents 240 genes implicated in benign and malignant blood disorders such as inherited platelet disorders and thrombocytopenias, anemias, enzymopathies, red blood cell membrane disorders, telomere disorders, bone marrow failure, and more. The “Newborn screening” gene list represents 1755 genes implicated in inherited metabolic disorders. The “Carrier Screening” gene list represents 568 genes commonly examined to understand if there is an increased risk of having a child affected with a genetic condition. The “Endocrinology” gene list represents 321 genes implicated in disorders of sex development, obesity, thyroid and parathyroid conditions, bone mineralization disorders, and glucose metabolism. The “Immunology” gene list represents 572 genes implicated in primary immunodeficiency, telomere biology disorders, antibody deficiencies, autoinflammatory syndromes, B and T cell deficiencies, phagocytic defects, hereditary angioedema, complement deficiencies, and congenital diarrhea. The “Nephrology” gene list represents 565 genes implicated in ciliopathies, nephrolithiasis, progressive renal disease, rare clinical syndromes with renal manifestations, atypical hemolytic uremic syndrome, and thrombotic microangiopathies. The “Neurology” gene list represents 1374 genes implicated in neuropathies, movement disorders, neurodegenerative disorders, neurovascular disorders, epilepsy disorders, seizure disorders, neurodevelopmental disorders, and neuromuscular disorders. The “Ophthalmology” gene list represents 514 implicated in blindness are rare disorders affecting vision, the eye, and/or the retina. The DDG2P gene list representing the curated list of 2307 genes reported to be associated with developmental disorders from the DECIPHER project was accessed in June 2023. The “SGE” gene list represents the 694 genes that are both essential in HAP1 cells and found in the GenCC gene list. The “VAMPseq” gene list represents the 394 genes that are both high priority for VAMPseq and found in the GenCC gene list. The high priority VAMPseq genes were selected because their proteins are not secreted extracellularly, thermostable, have previously been shown to be GFP tagged, and are monomeric. The “MAVERegistry” list was determined based on the 110 genes as of August 2023 that are either “Under Investigation” or in the “MAVE Data Collection” phases on the MAVERegistry [33]. When appropriate, the same gene may be found in more than one gene list (for example, BRCA2 would be found in the Oncology, GenCC, ACMG78, SGE, and MAVERegistry lists). Overall, all gene lists and corresponding ENSG terms used in this study are available in Additional file 1: Table S1. Allele prevalence was calculated by summing allele counts for variants of each clinical classification for examined genetic ancestries and dividing this sum by the number of individuals in the genetic ancestry group(s).

Clinical significance classifications for variants

From gnomAD, allele prevalence for the individuals of European-like genetic ancestry was calculated from the “European-like (non-Finnish)” group. Due to the high degrees of consanguinity in the Finnish and Ashkenazi Jewish populations, these two populations were not included in our analysis. Allele counts, frequencies, population descriptors, ClinVar clinical significance calls, and number of individuals sequenced in each population were used as prescribed by All of Us [34] or gnomAD as of June 2023. As only approximately 2% of short variants (variants affecting less than 50 base pairs) are not assigned “one star” review status, we did not filter for review status or any other metric of clinical variant classification quality to prevent accidentally biasing against individual or smaller labs working with underrepresented communities. Further, for each variant in the All of Us v7 where the full set of unique submitted clinical classifications needed to be reconciled to just one clinical variant classification call, we took the most conservative approach per the aggregation of clinical variant germline classification approach used by ClinVar [35]. All clinical significance calls were mapped to one of six categories: “Pathogenic or Likely Pathogenic,” “Benign or Likely Benign,” “Variant of Uncertain Significance,” “Conflicting Interpretations,” “Not Included,” or “No Designation” based on their current ClinVar clinical significance designation as specified in All of Us or gnomAD. Trends were pinpointed if shown to be consistent across all three databases. Due to differences in extraction of ClinVar data between gnomAD and All of Us, there are systematic database level differences that potentially are unaccounted for. In these instances, the GenCC (The Gene Curation Coalition) list of all curated clinical genes being the biggest and most comprehensive list is used as the main indicator of a trend. gnomAD version 2.1.1 and version 3.1.2, non-v2 (removes individuals overlapping between v2 and v3) were treated as two independent population databases [25, 26].

Two orthogonal statistical methods

Two orthogonal statistical methods were used to assess variant classification disparities. First, at the gene-level using a matched pairs, Wilcoxon signed-rank test with Bonferroni correction resulting in a p value, estimate of statistical power, and rank biserial coefficient with 95% confidence interval to quantify the magnitude of the differences using pre-established thresholds [36]. The matched pairs were the same gene’s allele prevalence between ancestry groups. Second, unique variants (not allele counts) for each clinical classification were counted across a gene list that were exclusive to each superpopulation group. If alleles for a unique variant were found in both superpopulation groups that unique variant was excluded from the counts. Then, a chi-square test for independence with Bonferroni correction was conducted. We ensured that the number of individuals of European-like genetic ancestry and non-European-like genetic ancestry was approximately equal in each database to prevent biased statistical analysis due to differences in group sizes. This is important because both orthogonal statistical methods are based on allele counts within genes or groups of genes (the matched pairs nature of the Wilcoxon test compares non-European-like allele prevalence to European-like allele prevalence and the chi-square test on unique variants).

Wilcoxon matched-pairs signed-rank test

We employed a matched pairs signed rank Wilcoxon test, with the matched pairs based on the gene itself and its allele prevalence between individuals of non-European-like versus European-like genetic ancestry. This gene-by-gene comparison mitigates any other confounders such as gene length, coverage during sequencing, and other gene-specific intricacies that are canceled out by comparing the allele prevalence within the non-European-like group to the allele prevalence in the European-like group within each gene. The ranking aspect of the test is crucial, as it does not presuppose a uniform trend of larger allele prevalence in the non-European-like group compared to the European-like group across all genes for every clinical significance allele type. By ranking the genes prior to the statistical test, we incorporate genes that have a higher number of alleles in Europeans into our analysis, ensuring a complete survey of the allele prevalence in all genes in the statistical test. The difference in allele prevalence and difference in unique VUS between the non-European-like group and European-like group was also used to rank the genes with the greatest VUS disparity between non-Europeans vs. Europeans.

While the p value informs us whether or not there is a difference, we then calculated the rank biserial coefficient (r) with a 95% confidence interval to quantify the magnitude of the statistically significant differences. This calculation was performed using Python-wrapped R code, employing the ggwithinstats function from the ggstatsplot library and the effectsize library, with settings based on thresholds outlined by Funder and Ozer [36]. The resultant coefficient categories are based on the magnitude of r < 0.05—tiny; 0.05 ≤ r < 0.1—very small; 0.1 ≤ r < 0.2—small; 0.2 ≤ r < 0.3—medium; 0.3 ≤ r < 0.4—large; and r ≥ 0.4—very large. Additionally, we evaluated the statistical power of each Wilcoxon matched-pairs signed-rank using a simulation-based approach. The simulation iterates 50,000 times to generate matched pair samples under a normal distribution, with the first sample being the control and the second sample being offset by the defined effect size. Each iteration performs the Wilcoxon signed-rank test to assess the significance of the observed effect based on the Bonferroni-corrected alpha. The proportion of 50,000 iterations yielding significant results was the estimate of statistical power, reflecting the test’s ability to correctly reject the null hypothesis for a specified effect size and sample size.

We also assessed the overlap in variants between the non-European-like and European-like groups. This involved calculating the number of variants present in both groups, as well as the number of variants unique to each group, and expressing these as percentage contributions. In contrast to the below orthogonal statistical method, all variants, including those shared between groups, were retained for the Wilcoxon matched-pairs signed-rank test to ensure that any unique variant’s prevalence in both populations was duly considered in assessing potential differences.

Chi-square test for independence

Furthermore, we employed a chi-square test for independence to investigate the presence of unique variants in each population group. In contrast to the above orthogonal statistical method, variants found in both groups were removed for the chi-square test for independence to examine prevalence differences of variants found exclusively in the European-like versus non-European-like genetic ancestry groups with an accompanying power estimate. Instead of the gene-by-gene approach, this approach allowed us to systematically assess three population databases, seeking to determine whether there is a consistent higher count of unique variants (not allele count) across different medical specialties and gene groups. Further, it helps to satisfy the requirement of independence of observations for the chi-square test as there are no relationships between the counts in the individual medical specialty groups and no pairing of the data between the super populations.

It is worth noting that neither the Wilcoxon test nor the chi-square test for independence necessitates an underlying distribution that approximates normality. Visual inspection of variant prevalence in the GenCC data revealed that the distributions of variants best resembles a chi-square distribution. Thus, the chi-square test, based on the chi-square data distribution, is particularly suitable for modeling our data.

For the analysis of ClinVar high confidence variants, 413,016 variants that were short variants (< 50 bp resulting in SNVs and indels) not haplotype entries and had multiple submitters in agreement on the clinical classification (2 stars or higher) were downloaded from ClinVar in September 2024 and annotated using gnomAD allele frequencies. In the same way as above, variants found in both individuals of European-like and non-European-like genetic ancestry were removed to examine the counts of different variants found exclusively in the European-like versus non-European-like genetic ancestry groups.

Bonferroni corrections

To counteract the potential for type I errors due to multiple comparisons, we apply a stringent Bonferroni correction to each statistical test. For testing the difference in allele prevalence of all coding variants of a particular clinical significance type across specialties, there are 14 specialties × 3 databases × 5 clinical significance groups = 210 total tests. For testing the difference in allele prevalence of all coding variants without missense variants of a particular clinical significance type across specialties, there are also 14 specialties × 3 databases × 3 clinical significance groups = 126 total tests. For testing the difference in allele prevalence of variant types for different clinical classifications for the GenCC curated genes list, there are 11 variant types × 3 databases × 5 clinical significance categories = 165 statistical tests. For testing the difference in allele prevalence of coding variants of a particular clinical significance type across population distributions for the GenCC curated genes list, there are 1 specialty × 3 databases × 5 clinical significance groups × 5 pairwise population comparisons = 75 total tests. For testing the difference in allele prevalence of noncoding variants of a particular clinical significance type across specialties, there are 14 specialties × 2 databases × 5 clinical significance groups = 140 total tests. For testing the difference in unique variants of a particular clinical significance found only in one population group via chi-square testing, there are 3 databases × 5 clinical significance groups = 15 total tests. Of particular note, because the three research questions are independent of each other (e.g., no nested hypotheses, no repeated measures, no sequential testing) and the underlying data distributions for each statistical test are very different for the three questions, each group of tests received its own Bonferroni correction.

Variant reclassification and essential code analysis

We developed an automated pipeline to reclassify VUS in BRCA1, TP53, and PTEN found in gnomAD and All of Us. These three genes were selected because all three have clinically calibrated MAVE data and Clinical Genome Resource’s (ClinGen) Variant Curation Expert Panel (VCEP) guidelines [18, 24, 37,38,39,40,41,42,43]. Our pipeline follows the gene-specific criteria of the corresponding VCEP (TP53 v1, BRCA1 v1, PTEN v2) as closely as possible except for the functional data evidence code (PS3/BS3) where MAVE data was used. Initially, each variant was annotated using the 2015 ACMG (American College of Medical Genetics and Genomics) evidence codes through the Intervar API. During this process, we ensured that the correct reference genomes were used for the different databases (All of Us and gnomAD v3.1.2 utilized GRCh38; whereas gnomAD 2.1.1 utilized GRCh37). Following this initial annotation, each variant was further annotated with functional scores from MAVE data. The clinical curation and clinical strength assignment as per the ClinGen recommendations in Brnich et al. [44] for or against pathogenicity or benignity of each of these MAVE datasets utilized in this study were previously published in Fayer et al. [18]. In brief, for BRCA1 variants, if a variant was categorized as FUNC (functional), it was assigned BS3 evidence and no PS3 evidence, whereas if it was categorized as LOF (loss of function), the variant was assigned PS3 evidence and no BS3 evidence. Variants categorized as INT (intermediate) were left unannotated. For the BRCA1 combining criteria, ≥ 1 criteria of strong benign evidence was enough to reclassify the VUS as Likely Benign. For TP53, we used the output of the Naïve Bayes classifier that synthesized data from four different TP53 MAVEs in Fayer et al. If the classifier predicted a variant to be “Functionally abnormal,” the variant was assigned PS3 evidence and no BS3 evidence. If a variant was predicted to be “Functionally normal,” BS3_moderate evidence was used with no PS3 evidence. For PTEN, two assays measuring activity and abundance were used. If the abundance was categorized as “wt-like” or “possibly wt-like,” BS3_Supporting evidence was used. Furthermore, if the cumulative score was less than or equal to − 1.11, BS3_moderate evidence was used. All other evidence codes and combining criteria were adhered to as closely as possible based on the ClinGen gene-specific recommendations for BRCA1, TP53, and PTEN, respectively (Additional file 2: Fig. S42). The ClinGen VCEPs are highly regarded as the gold standard for gene-specific variant curation and are developed after extensive evaluation of the evidence by clinical and scientific experts for the particular gene to classify genomic variants on a spectrum from pathogenic to benign using the 2015 ACMG/AMP Variant Interpretation Guidelines as a backbone [43]. Reclassification of variants from gnomAD or All of Us focused only on variants originally classified as VUS.

We comprehensively reanalyzed the set of BRCA1, PTEN, and TP53 VUS previously reclassified by Fayer et al. [18] (Supplemental Tables 7, 10, 11 in Fayer et al.) to benchmark our automated pipeline. The automated pipeline uses VCEP recommendations as of Fall 2023; however, the Fayer et al. VUS dataset was analyzed by hand with a mix of VCEP and ACMG/AMP 2015 recommendations prior to 2021. Using this dataset, we sought to establish a robust benchmark for the automated variant classification pipeline built for this project to ensure clinical variant classifications ascertained by the automated pipeline were concordant with the Fayer et al. reclassifications where MAVE data was also used for variant classification. We defined a concordant classification as a final clinical classification on the same side of pathogenicity as was found in the Fayer et al. dataset (the groups being Benign or Likely Benign versus Pathogenic or Likely Pathogenic versus remaining a VUS). Further, we used this dataset to follow-up on the essential code analysis with allele frequencies from gnomAD v4. We annotated all possible variants in the Fayer dataset with allele frequencies from gnomAD v4 (not using All of Us v7 nor gnomAD v3 nor gnomAD v2 to prevent accidental double-dipping).

To assess evidence code essentiality, we sequentially removed each code from the final set of codes for a reclassified VUS and observed if removal led to reversion of the reclassified variant back to VUS. To ensure reproducibility, transparency, and increased throughput, all the procedures for annotating variants and assigning evidence codes were codified using Python. All code has been made freely available and is linked in the “Availability of data and materials” Sect [45].

留言 (0)

沒有登入
gif