A standardized nomenclature for mammalian histone genes

As mentioned above, the agreed unified nomenclature for histone protein variants was published by 42 experts from the histone field [12]. The HGNC and MGNC were already aware of problems with the existing gene nomenclature, so both committees agreed that they would work towards revising histone gene symbols—with the input of histone researchers—to be as close to the unified protein nomenclature as possible, whilst fulfilling the requirements of standardized mammalian gene nomenclature. For example, one feature of the histone protein nomenclature is the use of periods as separators in symbols to indicate protein variants and/or proteins that represent phylogenetic branch points. Approved gene symbols cannot include periods, primarily as this could cause problems in data processing. It was agreed that where periods exist between a letter and number in protein variant symbols, these are left out of the new gene symbols entirely, e.g., H2AZ1 is the symbol for the gene encoding the H2A.Z.1 variant. Where a separator is needed between two consecutive numbers, a hyphen is used for HGNC and VGNC gene symbols, and the letter ‘f’ in mouse gene symbols, e.g., genes encoding the H1.0 variant are H1-0 in the HGNC and VGNC databases and H1f0 for mouse. Hyphens are avoided in mouse gene symbols because punctuation is reserved for specific usage in mouse allele nomenclature.
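
The separator rules above amount to a small string transformation. The sketch below is a hypothetical helper (`protein_to_gene_symbol` is not part of any HGNC, VGNC or MGI tool). It assumes the input is a Strasbourg-style variant symbol and that mouse symbols capitalize only their first letter. It handles only the period and casing rules, so exceptions such as MACROH2A1/MacroH2a1 and paralog suffixes such as the 'A' in H3-3A still have to be applied by a curator.

```python
import re


def protein_to_gene_symbol(variant: str, species: str = "human") -> str:
    """Hypothetical helper: derive a gene-symbol stem from a variant symbol.

    Assumed rules (from the text): a period between two digits becomes a
    hyphen (HGNC/VGNC) or the letter 'f' (mouse); every other period is
    dropped. Mouse symbols are assumed to capitalize only the first letter.
    """
    separator = "f" if species == "mouse" else "-"
    # Replace a period only when it sits between two digits, e.g. H1.0 -> H1-0.
    symbol = re.sub(r"(?<=\d)\.(?=\d)", separator, variant)
    # Drop all remaining periods, e.g. H2A.Z.1 -> H2AZ1.
    symbol = symbol.replace(".", "")
    if species == "mouse":
        # Assumed mouse casing: initial capital, remaining characters lowercase.
        symbol = symbol[0] + symbol[1:].lower()
    return symbol


for variant in ["H1.0", "H2A.Z.1", "H2A.X", "H3.4"]:
    print(variant, protein_to_gene_symbol(variant), protein_to_gene_symbol(variant, "mouse"))
# H1.0 H1-0 H1f0
# H2A.Z.1 H2AZ1 H2az1
# H2A.X H2AX H2ax
# H3.4 H3-4 H3f4
```

The regular expression uses lookarounds so that only the period itself is replaced, leaving the flanking digits in place.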

The new gene nomenclature uses symbols that begin with the letter H followed by a numeral, or a numeral and letter, to indicate which major histone type they encode, e.g., “H2BC3” encodes an H2B type histone. Tissue of expression is not used as a characteristic in the revised histone gene names because of variable reporting, the discovery of expression in additional tissues in new datasets, and possible lack of conservation across species. Efforts have been made to create a nomenclature that makes sense across vertebrates where possible. The revised symbols no longer refer to individual clusters, meaning that the naming scheme can be extended to non-mammalian vertebrates, e.g., human HIST1H2AA (“histone cluster 1 H2A family member a”) has been renamed H2AC1 (“H2A clustered histone 1”). Replication-independent genes are named as closely as possible to the agreed-upon symbols for the protein variants.
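
Because the histone type and the "clustered" status are carried in the symbol itself, a symbol such as H2BC3 can be decoded mechanically. The following is a minimal, hypothetical parser (`parse_clustered_symbol` and `ClusteredSymbol` are illustrative names, not part of any nomenclature database). It assumes the human/VGNC form ROOT + "C" + number with an optional "P" for pseudogenes, and the mouse form with lowercase letters and an optional "-ps" suffix, following the pseudogene convention in the Fig. 1 legend. H1 genes (which do not use "C") and special cases such as H2BC12L are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional


class ClusteredSymbol(NamedTuple):
    histone_type: str  # "H2A", "H2B", "H3" or "H4"
    number: int        # paralog number; it does not encode position in the cluster
    pseudogene: bool


# Human/VGNC style, e.g. H2BC3 or H2AC10P.
_HUMAN = re.compile(r"^(H2A|H2B|H3|H4)C(\d+)(P?)$")
# Mouse style, e.g. H2ac18; pseudogenes are assumed to end in "-ps".
_MOUSE = re.compile(r"^(H2a|H2b|H3|H4)c(\d+)((?:-ps)?)$")


def parse_clustered_symbol(symbol: str) -> Optional[ClusteredSymbol]:
    """Return the decoded parts of a clustered core-histone symbol, or None."""
    for pattern in (_HUMAN, _MOUSE):
        match = pattern.match(symbol)
        if match:
            root, number, suffix = match.groups()
            return ClusteredSymbol(root.upper(), int(number), bool(suffix))
    return None


for s in ["H2BC3", "H2AC10P", "H2ac18", "H2BC12L", "H1-0"]:
    print(s, parse_clustered_symbol(s))
```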

Resolving replication-dependent histone gene nomenclature across mammalian species

Histone protein nomenclature does not capture the complexity of histone genes, where many paralogs encode identical or very similar proteins. The distinct H2A and H2B isoforms encoded by genes within the replication-dependent clusters contain many small variations that have not been shown to be functionally significant and are not well conserved across species. The two largest replication-dependent clusters encode two H3 isoforms, H3.1 and H3.2: all human H3 genes on cluster 1 encode H3.1, while some of the orthologous mouse genes on cluster 1 encode H3.2. For this reason, it is not possible to approve one gene symbol per replication-dependent protein isoform. However, because gene order within the mammalian replication-dependent clusters is remarkably conserved, it is possible to identify one-to-one orthologs for most of the genes and to give orthologs equivalent gene symbols (see Fig. 1).

Fig. 1

The three replication-dependent histone gene clusters in mammals. Gene symbols are shown across the top; the species and chromosomal location of each cluster are indicated at the side. Black = non-histone genes, pink = histone H1 genes, yellow = H2A genes, red = H2B genes, blue = H3 genes, green = H4 genes. Pseudogenes are indicated by a gray box around the gene (pseudogenes have the same symbol as their protein-coding orthologs but end with -ps for mouse and P for the other species). A paler shade indicates that the gene is present but currently unannotated and unnamed; a blank space indicates that the gene is missing entirely. Mouse H1 genes contain an ‘f’ in place of the hyphen, so each mouse H1f symbol is shown above the relevant mouse gene. A. The largest replication-dependent cluster, also known as HIST1. There are two large gaps in the cluster in all species. Conservation between species is remarkable, although there are some species-specific duplications, gene losses and in situ pseudogenizations. Mouse has an expansion at the end of the cluster; these genes are shown in the mouse gene symbol format. Note that all mouse symbols follow this format, but for simplicity only the uppercase format used for other mammalian genes is shown for the conserved genes. B. The second largest replication-dependent cluster, also known as HIST2; each species has at least 10 genes in this cluster, which contains genes for the four core histones but no histone H1 genes. The cluster contains a large inverted repeat, indicated by brackets. C. The third mammalian replication-dependent cluster, also known as HIST3. Note that H3-4 has an exceptional symbol due to the common usage of the H3.4 symbol for the protein encoded by this gene; the systematic H3C16 alias is shown in parentheses.

The previous replication-dependent histone gene nomenclature was curated for human and mouse only. Automated naming systems cannot resolve orthology well enough to assign gene symbols to most replication-dependent histone genes in other mammalian species. Therefore, VGNC curators have manually named the histone genes of chimpanzee, rhesus macaque, dog, cat, pig, horse and cattle to match their identified human orthologs, using a combination of conserved gene order and sequence similarity (Fig. 1 and Additional File 1). Note that gene numbering within the clusters is not intended to reflect gene order, so that symbols can be added for any additional paralogs identified in new species. For example, dog, cattle and horse all have the genes H4C19 and H2BC25, which are not present in human or mouse and therefore take higher numbers; as a result, the nearest 5’ H2B gene to H2BC25 is H2BC8 and the nearest 5’ H4 gene to H4C19 is H4C5 (Fig. 1A).

Comparing the histone clusters across mammalian species has resolved several former gene symbol differences between human and mouse orthologs. In mouse, the genes now named H2ac18 and H2ac19 were previously named Hist2h2aa1 and Hist2h2aa2, while their human orthologs, now consistently named H2AC18 and H2AC19, were previously named HIST2H2AA3 and HIST2H2AA4. The inverted repeat within histone cluster 2 was missing from the initial human reference genome, so only one copy of each human gene in this repeat was included in the initial round of naming. The primate inverted repeats include two pseudogenes (H2BC19P, previously HIST2H2BD, and H2BC20P, previously HIST2H2BC) that are not present in mouse. Aligning these genes with the additional species shows that in cat, pig, horse and cattle they encode an H2B protein and their mRNA ends in a stem loop (Fig. 1B); these are therefore functional genes and are named H2BC19 and H2BC20 in cat, pig, horse and cattle (the ‘P’ at the end of the human symbols indicates that the locus is a pseudogene). In all species, these H2BC genes are flanked by H2AC18 and H2AC19 (note that H2AC18 appears to be pseudogenized in cat and horse) in a conserved gene order and orientation. It is therefore clear that the H2BC19 and H2BC20 orthologs have been lost in mouse, but the flanking H2ac18 and H2ac19 genes remain and can be named in concordance with their orthologs in other mammals (Fig. 1B).
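
The role of conserved gene order in this reasoning can be illustrated with a toy alignment. The sketch below is an assumption-laden simplification, not the curators' procedure: each cluster is reduced to an ordered list of histone types, and Python's `difflib.SequenceMatcher` stands in for expert judgment. Sequence similarity, gene orientation and the inverted repeat are ignored, and the four-gene reference order is a simplified, hypothetical layout (two H2B genes flanked by two H2A genes), not the real cluster map. `transfer_symbols` is an illustrative name.

```python
from difflib import SequenceMatcher


def transfer_symbols(reference, query_types):
    """Assign reference symbols to query genes whose histone type aligns by order.

    reference: list of (symbol, histone_type) pairs in cluster order.
    query_types: histone types of the query cluster, in the same orientation.
    Returns one symbol per query gene, or None where nothing aligns.
    """
    ref_types = [histone_type for _, histone_type in reference]
    matcher = SequenceMatcher(a=ref_types, b=query_types, autojunk=False)
    assigned = [None] * len(query_types)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical types at the same relative order.
        for offset in range(block.size):
            assigned[block.b + offset] = reference[block.a + offset][0]
    return assigned


# Hypothetical, simplified order: two H2B genes flanked by two H2A genes.
reference = [("H2AC18", "H2A"), ("H2BC19", "H2B"), ("H2BC20", "H2B"), ("H2AC19", "H2A")]
mouse_like = ["H2A", "H2A"]  # the H2B pair has been lost
print(transfer_symbols(reference, mouse_like))  # ['H2AC18', 'H2AC19']
```

Matching on type alone is fragile when several genes of the same type sit side by side, which is one reason real orthology assignment also relies on sequence similarity.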

Pseudogenization of genes in situ is a feature of replication-dependent histone clusters; studying cluster organization across other mammals has enabled us to rename more mouse and human pseudogenes in line with their protein-coding orthologs. Human H2AC10P was not previously named as an ortholog of the mouse gene: its previous symbol was HIST1H2APS4, whereas the coding mouse ortholog (now H2ac10) was previously Hist1h2af. All other mammalian species studied here appear to have a coding copy of H2AC10 (Fig. 1A), making it clear that the human gene at this conserved position is a pseudogenized ortholog, which can accordingly be named H2AC10P. As with H2BC19P and H2BC20P, several other human pseudogenes that have no mouse ortholog have now been named relative to protein-coding orthologs in other species. For example, H1-12 is predicted to be coding in dog, horse and cattle (although the predicted H1.12 protein has not been studied), allowing the human pseudogenized ortholog to be named H1-12P (previously HIST1H1PS1); pig also carries a pseudogenized copy, which is likewise named H1-12P (Fig. 1A). H2BC2 and H2AC2 are predicted to be coding in dog, cat and pig, resulting in the renaming of human HIST1H2BPS1 and HIST1H2APS1 as H2BC2P and H2AC2P (Fig. 1A).

Full description of the revised gene nomenclature by histone type

H1 histone genes

The revised H1 histone gene nomenclature, along with previous symbols and protein variant symbols, is shown in Table 1 for human and Table 2 for mouse. All symbols begin with the root ‘H1’. The H1 genes were relatively simple to fit with the Strasbourg protein nomenclature, in which each H1 variant is distinguished by a separate number, because each H1 gene encodes a different histone variant. It has, therefore, been possible to approve gene symbols that are equivalent to the H1 protein symbols shown in Talbert et al. [12]. As mentioned above, HGNC and VGNC gene symbols include a hyphen in place of a period where two numbers need to be separated. The H1 nomenclature distinguishes replication-dependent from replication-independent genes by the presence of the words “cluster member” in the gene names, but “C” for cluster is not used in the gene symbols, so that H1 protein and gene symbols remain parallel. The term “cluster member” was chosen to appear at the end of replication-dependent histone H1 gene names, rather than “clustered”, which is used elsewhere, so that the names of all H1 genes can contain the type of H1 followed by the term ‘linker histone’, e.g., “H1.1 linker histone, cluster member”. It is now clear to the non-expert that H1-0 and H1-1 are both histone H1 genes. Mouse gene symbols include an ‘f’ instead of a hyphen (Table 2) because hyphens are reserved for mouse allele nomenclature, where this punctuation has a specific role. For example, in the allele Tg(tetO-H1f0)1Hzo, the hyphen separates the promoter from the expressed gene. Although slightly different, the mouse and human symbols are clearly equivalent, e.g., H1f1 and H1-1.

Table 1 Revised gene nomenclature for human histone H1 genes

Table 2 Revised gene nomenclature for mouse histone H1 genes

H2A histone genes

Replication-dependent H2A genes

The complete set of revised H2A histone genes is shown in Table 3 for human and Table 4 for mouse. Two general classes of H2A proteins were first identified by Fred Zweidler using Triton acid-urea gel electrophoresis [19], based on a characteristic change at position 51 of the protein sequence, where there is a leucine in H2A.1 and a methionine in H2A.2. H2A.1 and H2A.2 are each encoded by multiple replication-dependent H2A histone genes, and these variant designations were recommended in the unified histone nomenclature [12]. For the most part, H2A genes on the largest replication-dependent cluster, known as HIST1 (Fig. 1A), encode H2A.1 proteins, while those on the second largest cluster, known as HIST2 (Fig. 1B), encode H2A.2 proteins. The H2A gene on the smaller cluster known as HIST3 (Fig. 1C) encodes an H2A.1 protein with more amino acid changes elsewhere in the protein than the proteins encoded by other H2A.1-encoding genes. Human H2AC21 on cluster 2 encodes an H2A.1 protein, while the mouse ortholog, H2ac21, encodes an H2A.2 protein. Therefore, the revised gene nomenclature does not distinguish between H2A.1 and H2A.2, which allows orthologs to be named consistently across vertebrate species. Some alternative histone protein naming systems do not distinguish between H2A.1 and H2A.2 and instead refer to replication-dependent H2A as ‘canonical H2A’ [20]. During discussions with the wider histone community, advice was given to avoid the term ‘canonical’, as it can be interpreted in different ways by different researchers. It was during this feedback process that the suggestion was made to use ‘clustered’ to refer to H2A, H2B, H3 and H4 replication-dependent genes. The H2A histone genes on replication-dependent clusters have, therefore, been named with the root symbol ‘H2AC#’ (H2ac# in mouse) for ‘H2A clustered histone’.
Although the mammalian replication-dependent H2A genes encode multiple proteins, with small differences primarily at the C terminus [15], these variations are not conserved between mouse and human orthologs, suggesting that they are not functional, and they are therefore not reflected in the gene nomenclature. The H2AC1 and H2BC1 genes encode the H2A and H2B proteins with the largest number of amino acid changes; they were also initially reported as “sperm specific”, which likely accounts for their divergence from the other genes.
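
The H2A.1/H2A.2 distinction rests on a single diagnostic residue, which is easy to express as a lookup. The sketch below is hypothetical (`classify_h2a_by_residue_51` is an illustrative name) and assumes one-letter amino-acid codes numbered like the mature protein, i.e., with the initiator methionine already removed, so that residue 51 is `sequence[50]`. The two inputs are placeholders, not real H2A sequences; they mimic the H2AC21 case described above, where orthologous human and mouse genes fall into different classes.

```python
def classify_h2a_by_residue_51(sequence: str) -> str:
    """Hypothetical helper: place an H2A sequence in the H2A.1 or H2A.2 class.

    Assumes mature-protein numbering (no initiator methionine), so that
    residue 51 is sequence[50].
    """
    if len(sequence) < 51:
        raise ValueError("sequence must cover residue 51")
    residue_51 = sequence[50]
    # Leucine at position 51 marks H2A.1; methionine marks H2A.2.
    return {"L": "H2A.1", "M": "H2A.2"}.get(residue_51, "unclassified")


# Placeholder sequences (not real H2A sequences) differing only at residue 51.
human_like = "A" * 50 + "L" + "A" * 78
mouse_like = "A" * 50 + "M" + "A" * 78
print(classify_h2a_by_residue_51(human_like), classify_h2a_by_residue_51(mouse_like))
# H2A.1 H2A.2
```

Because a one-to-one ortholog pair can return different classes, a gene symbol built on this distinction would break cross-species consistency, which is why it is kept at the protein level only.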

Table 3 Revised gene nomenclature for human histone H2A genes

Table 4 Revised gene nomenclature for mouse histone H2A genes

Replication-independent H2A genes

For replication-independent H2A histones, the protein nomenclature has been followed as closely as possible. For example, the human genes encoding the H2A.Z variant, which is present in all eukaryotes [21], have been approved as H2AZ1 and H2AZ2 (full gene names “H2A.Z histone 1” and “H2A.Z histone 2”). The macroH2A variant was so named because it is almost three times as large as replication-dependent H2A histones [22]. This variant name is included in the Strasbourg nomenclature and accepted by the histone community. Therefore, an exception has been made and the corresponding gene symbols MACROH2A1 and MACROH2A2 (MacroH2a1 and MacroH2a2 for mouse) have been approved, even though these symbols do not begin with the root symbol ‘H2A’.

The Strasbourg nomenclature aims to “use letter suffixes for monophyletic clades” [12]. However, the experts who devised this nomenclature recognized that in some cases historical usage, and community support for such usage, should be taken into consideration. They therefore recommended that the histone community continue to use the H2A.X designation, even though this histone variant does not appear to form a clade separate from the replication-dependent H2A histones that have been assigned the root symbol H2AC. This recommendation has been followed and the gene named H2AX. This gene is not positioned within a replication-dependent cluster and is interesting because it encodes two distinct mRNAs, one ending in a stem loop and the other polyadenylated [8]. The stem-loop form of the mRNA is expressed in S phase of the cell cycle and the polyadenylated form outside of S phase. Unlike the genes in the clusters, the H2AX gene is not bound by NPAT.

The same principle has been followed for the gene named H2AJ, which also does not form a clade separate from the replication-dependent H2A histone genes; the encoded variant has been published as H2A.J [23] and has been referred to as replication independent [9]. In the species reported here, the gene produces only polyadenylated mRNA that lacks a stem loop; hence, it has been assigned the separate variant-type symbol H2AJ. Note that the H2AJ gene is adjacent to the H4C16 gene.

Short H2A replication-independent variants

Short histone H2A variants lack a C-terminal region compared to replication-dependent histones and all appear to be expressed primarily in the testis, although H2A.B is also expressed in brain [24]. These variants all derive from a single gene on the X chromosome of a common ancestor and have since diverged into four distinct clades known as H2A.P, H2A.Q, H2A.B, and H2A.L [25]. The H2A.P-encoding gene had the previously approved symbol HYPM for “huntingtin interacting protein M” and has now been renamed as H2AP for “H2A.P histone” (and from Hypm to H2ap in mouse). H2A.Q is the most recently discovered short H2A variant [25] and a functional protein has been predicted for many non-Euarchontoglires mammals. However, the only VGNC species with a supporting protein-coding gene annotation is dog; this gene has been named as H2AQ1, for “H2A.Q variant histone 1” (Additional File 1). In human the locus is pseudogenized at a conserved position on the X chromosome and has been named H2AQ1P for “H2A.Q variant histone 1, pseudogene”. Although the presence of an orthologous mouse pseudogene is suggested in [25], there is currently no annotated mouse gene.

Human and most other mammals have three paralogs that encode H2A.B histones. In humans, these duplicated paralogs neighbor the coagulation factor VIII-associated genes and are numbered consistently with them: H2AB1 is next to F8A1, H2AB2 is next to F8A2, and H2AB3 is next to F8A3. All three H2AB genes are highly similar in sequence; H2AB2 and H2AB3 encode identical proteins, and the protein encoded by H2AB1 differs by only one amino acid. In the literature these two proteins have sometimes been referred to as the variants H2A.B.1 (encoded by H2AB2 and H2AB3) and H2A.B.2 (encoded by H2AB1) [25], although many papers do not make this distinction and refer only to variant H2A.B [26,27,28]. Mouse has three paralogs named H2ab1, H2ab2 and H2ab3; the mouse H2A.B protein has been referred to as H2A.B.3 [29].

Mouse has an expansion of H2A.L-encoding genes, with fourteen H2al1 protein-coding genes (named H2al1a through to H2al1o), three H2al2 genes (H2al2a, H2al2b and H2al2c), which are the only H2al family members found outside the X chromosome, and one H2al3 gene. Although no H2A.L protein has been detected in human so far [25], human has an ortholog of mouse H2al3 with an intact open reading frame, which has therefore been named H2AL3. This gene is conserved in rhesus macaque, cattle, pig, horse and dog (Additional File 1). There is also a human H2AL1 family member, H2AL1Q, which has an intact open reading frame and so has the potential to encode a protein. There is a mouse pseudogene, H2al1q-ps, at a syntenic location, and this locus is predicted to be coding in dog, cat and cattle (Additional File 1). Finally, there is a human gene at a genomic position conserved with mouse H2al1m, but it is a pseudogene and has therefore been named H2AL1MP.

Histone H2B genes

Revised human H2B gene nomenclature is shown in Table 5; revised mouse H2B gene nomenclature is shown in Table 6. In accordance with the H2A genes on replication-dependent clusters described above, H2B genes on these clusters have been named with the root symbol H2BC# for ‘H2B clustered histone’ (H2bc# in mouse). Attempts have been made to follow the Strasbourg nomenclature as closely as possible for the H2B replication-independent genes.

Table 5 Revised gene nomenclature for human histone H2B genes

Table 6 Revised gene nomenclature for mouse histone H2B genes

H2B.W-encoding histone genes

The H2B.W variant symbol was proposed in the Strasbourg nomenclature [12] for the variant previously known as H2BFWT [30, 31] and TH2B-175 [31]. Human has two H2B.W-encoding paralogs, now named H2BW1 and H2BW2, and two pseudogenes (H2BW3P and H2BW4P), all located on the X chromosome between RAB9B and SLC25A3, while mouse has only one H2B.W-encoding gene (H2bw2), found in a syntenic location. Other mammals have between one and four H2BW paralogs, all located at the same conserved position on the X chromosome (Additional File 1).

H2B.L-encoding histone genes

Another mammalian H2B variant was first published as SubH2Bv [32], based on its location in the subacrosomal component of cattle spermatozoa. The homologous mouse variant was published as H2BL1 (originally denoting H2B-like 1) [33]. For the macroH2A variant mentioned above, an exception was made and the MACROH2A# gene symbols were approved because of the overwhelming usage of macroH2A in the scientific literature. In contrast, the SubH2Bv/H2BL variant has been described in relatively few publications. Following discussions between the HGNC and groups that have published on this variant, it was agreed to use H2BL# for the genes encoding this variant so that the root symbol H2B# is preserved. Therefore, the cattle gene is now named H2BL1 for “H2B.L histone” (Additional File 1), and the mouse ortholog has the equivalent symbol H2bl1. Human has a pseudogenized version of this gene, which is therefore named H2BL1P, “H2B.L histone variant 1, pseudogene”.

H2B.K-encoding histone genes

The H2BK1 gene was first discovered via gene annotation [34] and was independently identified in a recent study of H2B variants [35]. There is no mouse ortholog of this gene, but there are one-to-one orthologs in many other mammals (including all curated VGNC species; see Additional File 1), birds and fish. In human, some transcripts overlap both H2BK1 and the upstream gene ABCF2; this, combined with the lack of a mouse ortholog, meant that this histone gene had not previously been annotated. According to Hidden Markov Model classification using the ‘Analyze sequence’ tool of the HistoneDB 2.0 database [36], the encoded protein does not match a characterized histone variant (67% identity with the most similar protein encoded by the other H2B genes). Therefore, at the time of initial naming, it was decided to name this gene as encoding a new histone variant. The gene was originally approved as H2BE1 for “H2B.E variant histone 1”, but because H2B.E has been used several times in the literature for an isoform of mouse H2bc21, the nomenclature has been updated to avoid possible confusion. The variant identifier H2B.K was agreed with the authors of [35] ahead of their publication, and the gene has been updated with the corresponding gene symbol H2BK1 and name “H2B.K variant histone 1”.

H2B.N-encoding histone genes

The H2BN1 gene encodes the most recently described H2B variant, H2B.N [35]. Like H2BK1, the human H2BN1 gene has an exon overlapping a separate gene, in this case the long non-coding RNA gene MYO1D-DT, and has no protein-coding ortholog in mouse. Additionally, as noted in [35], the H2BN1 and H2BK1 genes both consist of two exons, and in both genes the intron splits the same part of the coding sequence for the histone fold domain; however, phylogenetic analysis in [35] does not support a common origin for the two variants. The H2BK1 gene is present in mammals, fish, birds and reptiles, whereas H2BN1 is found only in mammals. Note that neither the H2B.K nor the H2B.N protein has yet been detected experimentally.

H2BC12L histone gene

There is a human-specific duplication, on chromosome 21, of the H2BC12 gene from the chromosome 6 replication-dependent cluster. CAGE tag data [37] support expression of this gene, and as there are no frameshifts or deletions within the open reading frame, it is annotated as coding. Although the gene appears to be expressed, there is no direct evidence that a protein is produced. The encoded protein does not represent a new histone variant: it differs by only one amino acid from the H2B protein encoded by the parent gene H2BC12 and is classified as a “canonical” histone when its sequence is analyzed with the HistoneDB 2.0 database. Therefore, this gene has been named H2BC12L for “H2B clustered histone 12 like”.

Histone H3 genes

Replication-dependent H3 genes

Nomenclature for histone H3 genes is shown in Table 7 for human and Table 8 for mouse. Histone H3 genes on the major replication-dependent clusters are named with the root symbol ‘H3C#’ for ‘H3 clustered histone’ (H3c# in mouse). The Strasbourg nomenclature uses H3.1 and H3.2 for the histone proteins encoded on the larger replication-dependent clusters. However, as for H2A.1 and H2A.2 above, it has not been possible to reflect this in the gene nomenclature, as doing so would not allow consistent naming across orthologs. The H3.1 and H3.2 proteins are identical except that H3.1 has a cysteine at position 96, whereas H3.2 has a serine at this position. Again, an ortholog may encode an H3.1 protein in one species and an H3.2 protein in another, e.g., human H3C2 encodes an H3.1 protein while mouse H3c2 encodes an H3.2 protein. Therefore, the H3.1 vs H3.2 distinction is not reflected in the gene nomenclature.
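
The single-residue H3.1/H3.2 difference, and why it cannot anchor a gene symbol, can be sketched as follows. The helpers `classify_h3_isoform` and `orthologs_share_isoform` are hypothetical, and the code assumes mature-protein numbering (initiator methionine removed), so that position 96 is `sequence[95]`. The inputs are placeholder strings that mimic the human H3C2/mouse H3c2 situation described above; they are not real H3 sequences.

```python
# Diagnostic residue at position 96 (mature-protein numbering, assumed to
# exclude the initiator methionine): cysteine in H3.1, serine in H3.2.
H3_POSITION_96 = {"C": "H3.1", "S": "H3.2"}


def classify_h3_isoform(sequence: str) -> str:
    """Return 'H3.1', 'H3.2' or 'other' for a replication-dependent H3 sequence."""
    if len(sequence) < 96:
        raise ValueError("sequence must cover residue 96")
    return H3_POSITION_96.get(sequence[95], "other")


def orthologs_share_isoform(seq_species_a: str, seq_species_b: str) -> bool:
    """True when two orthologous genes encode the same H3 isoform class."""
    return classify_h3_isoform(seq_species_a) == classify_h3_isoform(seq_species_b)


# Placeholder sequences (not real H3 sequences) differing only at residue 96.
human_h3c2_like = "A" * 95 + "C" + "A" * 39
mouse_h3c2_like = "A" * 95 + "S" + "A" * 39
print(orthologs_share_isoform(human_h3c2_like, mouse_h3c2_like))  # False
```

A symbol that encoded the isoform would therefore give these two orthologs different names, which is exactly what the clustered-histone scheme avoids.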

Table 7 Revised gene nomenclature for human histone H3 genes

Table 8 Revised gene nomenclature for mouse histone H3 genes

Replication-independent H3 genes

The H3.3 replication-independent variant is found across metazoa [38], while the H3.4 variant is found only in mammals [39]. The H3.3 variant is encoded by two mammalian genes, which have been named H3-3A and H3-3B for “H3.3 histone A” and “H3.3 histone B” (H3f3a and H3f3b in mouse). The H3.4 variant is encoded by a gene previously named HIST3H3 in human and given the uninformative symbol Gm12260 in mouse. This variant is also commonly referred to as H3.1t [39] or H3t [40] because it was originally thought to be testis specific, but it has since been shown to be expressed at lower levels in other tissues [41]. The Strasbourg nomenclature recommended referring to it as histone H3.4, in line with the symbol first published for this variant [42]. The H3.4 protein has a conserved valine at position 25, which has been reported to affect binding of the N-terminal tail by the Tudor domains of PHF1 and PHF19 [41, 43]. Because the literature refers to this protein as an H3 variant, and given the ‘H3.4’ recommendation of the Strasbourg nomenclature, we have named this gene H3-4 in human and other VGNC species (H3f4 in mouse). The gene encodes an mRNA with a stem-loop structure and is adjacent to genes named with the H2AC# and H2BC# root symbols (H2AC25 and H2BC26). To reflect its position on a replication-dependent cluster, we have given this gene the full name “H3.4 histone, cluster member” and added the gene symbol alias “H3C16” (see Fig. 1C).

Following discussions with the histone community, the symbol CENPA has been retained for the H3-like histone encoding gene that is found at the nucleosome core of centromeric chromatin [44], but the symbol alias “cenH3” has been included for this gene.

Primate-specific predicted H3 variants

This section describes symbols and names for a number of primate histone H3 genes. Note that it is not trivial to decide whether histone duplications limited to individual species, or even orders, are protein coding or should be represented as pseudogenes. Other mammals may also have additional predicted protein-coding histone genes that have not yet been named because these species are not currently supported by manual annotation projects.

H3.5-encoding histone gene

The H3.5 variant is encoded by a hominid-specific, testis-expressed gene that likely arose as a duplication of the H3-3B gene via retrotransposition [45]. While the H3-3A and H3-3B genes encode the same H3.3 protein, the protein predicted from the duplicated gene is distinct. For this reason, common usage in the scientific literature and the variant identifier given in the Strasbourg nomenclature have been followed, and the gene has been named H3-5 for “H3.5 histone”.

H3.7-encoding histone gene

The H3.7 variant identified in [46] is encoded by a duplication of the H3C13 gene, located roughly 6 Mb upstream of the “cluster 2” replication-dependent histone genes. The predicted protein is most similar to an H3.2 variant, but H3.2 variants are characterized by a serine at residue 96, whereas the H3.7 variant has an arginine at residue 96, which is not seen in any other H3 histone protein. The nomenclature published in [46] has been followed and the gene named H3-7, with the gene name “H3.7 histone (putative)”. The term “putative” will be removed if future experimental evidence shows that this variant exists. Taguchi et al. [46] also identified two further putative H3 variants, which they called H3.6 and H3.8. However, there are insufficient expression data to support annotation of these genes as protein coding, so in the absence of further data the encoding genes are annotated and named as H3 pseudogenes: H3P16 (H3 histone pseudogene 16) and H3P44 (H3 histone pseudogene 44). These pseudogenes have been given the aliases H3.6 and H3.8.

H3.Y-encoding histone genes

The H3.Y variant is encoded by two genes in human, which were initially referred to as H3.X and H3.Y [47]. As H3.Y forms a clear primate-specific clade incorporating both genes, and the protein referred to as ‘H3.X’ is only predicted from mRNA sequence, the recommendations of the Strasbourg nomenclature have been followed and the two human genes named H3Y1 and H3Y2 for “H3.Y histone 1” and “H3.Y histone 2”. While human and chimpanzee have two paralogs, rhesus macaque appears to have only H3Y1 (Additional File 1). However, in chimpanzee a symbol has been approved only for the H3Y2 ortholog (Additional File 1), because the putative chimpanzee “H3Y1” is currently on an unplaced scaffold; updates to the chimpanzee genome assembly may allow this gene to be assigned a symbol.

Histone H4 genes

The genes encoding histone H4 proteins are mostly found within replication-dependent clusters and are, thus, named with the root symbol H4C# for “H4 clustered histone” for human (Table 9) and H4c# for mouse (Table
