Immunohistochemistry (IHC) allows intuitive visual identification and localization of a target protein at both the cellular and subcellular levels. It is routinely used in diagnostic pathology and is an important tool not only in healthcare but also in research. The Human Protein Atlas project has performed high-throughput IHC to map the human proteome in tissues and cells [2]. However, the semi-quantitative nature of IHC and the narrow range of its staining intensity limit its use. We therefore investigated whether the immunohistochemical and RNA-seq datasets of the Human Protein Atlas kidney-specific proteome are discordant, and compared IHC-reported protein expression levels with kidney proteomics data.
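One way to quantify such a disparity is a rank correlation between ordinal IHC levels and transcript abundance. The sketch below is a hypothetical illustration, not necessarily the method used in this study: it assumes the four HPA-style levels (Not detected, Low, Medium, High) are mapped to scores 0 to 3, that RNA expression is given in TPM per gene, and it uses Spearman's rank correlation with average ranks for ties. The function names and toy values are illustrative.

```python
# Assumed ordinal encoding of HPA-style IHC staining levels (illustrative only).
IHC_SCORE = {"Not detected": 0, "Low": 1, "Medium": 2, "High": 3}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def ihc_rna_spearman(pairs):
    """Spearman correlation for (ihc_level, rna_tpm) pairs of one cell type."""
    scores = [IHC_SCORE[level] for level, _ in pairs]
    tpm = [value for _, value in pairs]
    return pearson(average_ranks(scores), average_ranks(tpm))


# Toy, made-up values for illustration only.
toy_tubular = [("High", 120.0), ("Medium", 35.0), ("Low", 8.0), ("Not detected", 0.5)]
print(ihc_rna_spearman(toy_tubular))  # 1.0 (perfectly monotone toy data)
```

Computing the coefficient separately for glomerular and tubular pairs would give one value per compartment, which is the kind of comparison discussed next.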
We observed better concordance between kidney mRNA and tubular IHC data than between kidney mRNA and glomerular IHC data, which could be due to the RNA-seq samples consisting of 60–80% tubules. Kidney biopsies are mostly composed of cortical tissue, which primarily contains tubular cell populations. Cross-checking the Human Protein Atlas’ IHC data against external mass spectrometry data, we found that some proteins not detected by IHC are expressed and detected in other datasets. In contrast, the Human Protein Atlas’ RNA-seq data appear to be more accurate, as most of the transcripts it reports as undetected also have very low expression in a validation dataset. This is expected, as the Human Protein Atlas RNA-seq data are largely consistent with other human transcriptome datasets, such as GTEx and FANTOM5 [12].
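Both cross-checks reduce to simple set operations. In this minimal sketch, the input dictionaries, the placeholder gene names, the "Not detected" label and the 1 TPM threshold for "very low" expression are all assumptions chosen for illustration; genes missing from the validation dataset are treated as 0 TPM.

```python
def undetected_by_ihc_but_in_ms(ihc_levels, ms_detected):
    """Genes called 'Not detected' by IHC that an external MS dataset detects.

    ihc_levels: dict of gene -> IHC level (hypothetical input format)
    ms_detected: iterable of genes detected by mass spectrometry
    """
    not_detected = {gene for gene, level in ihc_levels.items() if level == "Not detected"}
    return sorted(not_detected & set(ms_detected))


def fraction_low_in_validation(undetected_genes, validation_tpm, threshold=1.0):
    """Share of RNA-undetected genes whose validation TPM is below `threshold`.

    Genes absent from the validation dataset are treated as 0 TPM (an assumption).
    """
    if not undetected_genes:
        return 0.0
    low = [g for g in undetected_genes if validation_tpm.get(g, 0.0) < threshold]
    return len(low) / len(undetected_genes)


# Made-up placeholder genes, not real results.
ihc = {"GENE_A": "Not detected", "GENE_B": "High", "GENE_C": "Not detected"}
ms = ["GENE_A", "GENE_B", "GENE_D"]
print(undetected_by_ihc_but_in_ms(ihc, ms))  # ['GENE_A']
print(fraction_low_in_validation(["GENE_X", "GENE_Y"], {"GENE_X": 0.2, "GENE_Y": 5.0}))  # 0.5
```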
Of particular interest are the mRNA-protein pairs with extremely discordant expression. In the first scenario, where no mRNA is detected but protein expression is high, the discordance could be due to non-specific staining by a promiscuous polyclonal antibody or to a failure of the IHC protocol, especially in the blocking step. The antibody could also have low binding affinity, leading to dissociation during processing steps [13]. The epitope could be located in a cellular compartment inaccessible to reagents, or tissue artifacts could be present, leading to false-positive staining due to protein leakage [13]. In the kidney, tubular cells reabsorb proteins, which could therefore bind antibodies non-specifically. For example, IHC indicates that H4C1 is highly expressed at the protein level even though its transcript is practically undetected (Fig. 2c); the Human Protein Atlas explains that this could be due to the antibody targeting proteins from more than one gene. In the second scenario, the transcript is detected but IHC detects no protein. This could be due to a weak or overly diluted antibody or to a failure of the antigen retrieval protocol [13]. It could also reflect a difference between the locations of the transcript and the protein, as is inferred for COL6A1 (Fig. 2a). These examples illustrate how both biological and technical factors can produce such differences between transcript and protein abundance.
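The two extreme scenarios can be written as a simple classification rule. This sketch is illustrative only: the `GenePair` structure, the 1 TPM detection floor and the choice of "High" versus "Not detected" as the extremes are assumptions, not thresholds taken from this study.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    gene: str
    rna_tpm: float   # transcript abundance in TPM
    ihc_level: str   # "Not detected", "Low", "Medium" or "High"


def classify_discordance(pair, rna_floor=1.0):
    """Label extremely discordant mRNA-protein pairs (thresholds are illustrative)."""
    rna_detected = pair.rna_tpm >= rna_floor
    if not rna_detected and pair.ihc_level == "High":
        # Scenario 1: e.g. non-specific staining or reabsorbed protein.
        return "protein_without_mrna"
    if rna_detected and pair.ihc_level == "Not detected":
        # Scenario 2: e.g. weak antibody or failed antigen retrieval.
        return "mrna_without_protein"
    return "not_extreme"


# Placeholder genes with made-up values.
for p in [GenePair("GENE_A", 0.1, "High"), GenePair("GENE_B", 50.0, "Not detected")]:
    print(p.gene, classify_discordance(p))
```

The rule only flags candidates; deciding whether a flagged pair reflects biology or a technical artifact still requires the kind of case-by-case reasoning described above.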
In this study, we focused only on the glomerular and tubular data and excluded the other annotated cell types, which are a minority. Even so, comparing mRNA and IHC data remains difficult because of potential variability between patient tissue samples and differences in sample preparation. This is a limitation of both this study and the Human Protein Atlas: the IHC and RNA-seq data come from different samples, which prevents a comparison on the same biosamples. Although considered histologically normal, the tissue samples were not obtained from healthy individuals, and the donors may have underlying diseases that affect kidney molecular processes as well as protein and transcript expression. Additionally, the Human Protein Atlas states in a disclaimer on its website that an antibody may fail to bind its target because of differences in protein conformation and target accessibility. Protein denaturation and concentration, as well as sample complexity, are among the factors that may influence off-target binding and lead to false results [2]. The Human Protein Atlas mitigates this issue by assigning a reliability score to its data, which is also why we excluded all proteins with “Uncertain” reliability from this analysis.
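A filtering step of this kind might look as follows. The sketch assumes a tab-separated export with columns named `Gene name`, `Tissue`, `Cell type`, `Level` and `Reliability`, with kidney compartments labelled `cells in glomeruli` and `cells in tubules`; these names are assumptions modelled on the HPA's downloadable normal tissue data and should be checked against the actual file.

```python
import csv
import io

# Assumed labels for the two compartments analysed here.
KEEP_CELL_TYPES = {"cells in glomeruli", "cells in tubules"}


def filter_kidney_ihc(tsv_text):
    """Keep glomerular/tubular kidney rows whose reliability is not 'Uncertain'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if row["Tissue"] == "kidney"
        and row["Cell type"] in KEEP_CELL_TYPES
        and row["Reliability"] != "Uncertain"
    ]


# Toy TSV with placeholder genes (not real HPA entries).
toy_tsv = (
    "Gene name\tTissue\tCell type\tLevel\tReliability\n"
    "GENE_A\tkidney\tcells in tubules\tHigh\tEnhanced\n"
    "GENE_B\tkidney\tcells in glomeruli\tLow\tUncertain\n"
    "GENE_C\tkidney\tcells in glomeruli\tMedium\tApproved\n"
    "GENE_D\tliver\thepatocytes\tMedium\tSupported\n"
)
kept = filter_kidney_ihc(toy_tsv)
print([row["Gene name"] for row in kept])  # ['GENE_A', 'GENE_C']
```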
Despite the depth of characterization housed in the Human Protein Atlas, immunohistochemistry has inherent limitations. Besides depending on the antibodies used, it is only semi-quantitative and is manually interpreted by a pathologist, which introduces interobserver variability. Reproducibility also hinges on laboratory protocols. The most substantial hurdle is the narrow linear range of staining intensity, which leads to rapid assay saturation [14]. For example, the chromogen DAB has a range of only about one order of magnitude and is therefore poorly suited to assessing markers across the full dynamic range of biological protein expression, which spans approximately 8 orders of magnitude [4].
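As a toy illustration of why a roughly one-decade range saturates, the sketch below maps abundances onto four ordinal staining bins (mirroring a Not detected/Low/Medium/High scale) spread over a single order of magnitude and clips everything outside that window. The window position, bin count and log-linear mapping are assumptions for illustration, not a model of actual DAB chemistry.

```python
import math


def toy_stain_score(abundance, floor=1.0, decades=1.0, levels=4):
    """Toy model of a saturating stain: map abundance (arbitrary units) onto
    `levels` ordinal bins spread log-linearly over `decades` orders of magnitude
    above `floor`. Values below the window score 0; values above it saturate."""
    position = (math.log10(abundance) - math.log10(floor)) / decades
    position = min(max(position, 0.0), 1.0)  # clip to the linear window
    return min(int(position * levels), levels - 1)


abundances = [1, 2, 5, 10, 1e3, 1e8]
print([toy_stain_score(a) for a in abundances])  # [0, 1, 2, 3, 3, 3]
```

Under these assumptions, everything from 10 to 10^8 units receives the same top score, so seven of the eight decades become indistinguishable; widening `decades` spreads the window but makes each of the four bins coarser.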
Immunohistochemistry provides spatio-temporal expression information at the cellular or subcellular level, but at the expense of quantitative measurement [15]. Mass spectrometry methods can measure proteins across a wide dynamic range and are an important complement to IHC, also providing isoform-specific information [4, 16]. However, mass spectrometry has limited sensitivity and is biased toward more precise detection of abundant proteins [16]. The ongoing development of high-throughput single-cell spatial proteomics methods offers one potential route to accurate protein quantitation.
We also consider possible biological reasons for discordant mRNA and protein levels beyond the technical limitations of IHC (e.g., antibody binding properties, interobserver variability in interpretation, narrow range of staining intensity). Other studies have reported that discordance between mRNA and protein levels can arise from regulatory elements that play diverse roles in translation [4,5,6]. Several mRNA elements, such as codon usage, start codon context, and secondary structures, among others, have been reported to affect translation and mRNA stability [4]. Transcript abundance also spans about four orders of magnitude, compared with about eight for proteins, which helps explain the higher coverage of RNA-seq relative to mass spectrometry. In addition, the number of protein molecules produced per mRNA molecule is much higher for abundant transcripts. It has been postulated that genes encoding abundant proteins have higher mRNA levels and also encode regulatory elements that confer high translation efficiency and protein stability. Technical factors contribute as well: small proteins are difficult to measure, and mass spectrometry sample preparation may affect measured protein concentrations [4].
The Human Protein Atlas, with more than 10 million high-resolution IHC images in its portal, is a powerful tool for scientists studying protein and transcript expression [15]. However, immunohistochemistry, while powerful for in situ protein detection at the single-cell level, has limitations that can greatly affect the biological interpretation of results. Our study shows discordance between mRNA and protein abundance in the kidney dataset of the Human Protein Atlas. We then validated the Human Protein Atlas IHC data against external mass spectrometry-based proteomics datasets and showed that more than 500 proteins undetected by IHC are measured by mass spectrometry. We therefore recommend that scientists exercise caution in treating the Human Protein Atlas’ IHC data as the ‘ground truth’ for protein expression, as doing so could lead to misinterpretation of research results.