Whole genome sequencing of mouse lines divergently selected for fatness (FLI) and leanness (FHI) revealed several genetic variants as candidates for novel obesity genes

In the present study, we analysed the genomes of the Fat and Lean mouse lines, divergently selected for body fat percentage, focusing on line-specific SNPs. First, we analysed their genomic distribution, followed by the identification of genes with the highest number and density of SNPs, the location of constrained elements (GERP score), and an analysis of variant biotypes and consequences. We then focused on regulatory variants and variants with predicted moderate or high impact and a GERP score above 6, resulting in 19 SNPs within 20 genes. For these genes, previous annotations related to obesity were obtained from a public database, and their positions within previously identified obesity QTLs in our mouse models and their expression in different tissues were examined. Additionally, the potential importance of the identified SNPs was validated by comparing the presence of alleles in the obesity-resistant A/J and obesity-prone NZO/HlLtJ mouse strains. The workflow and main results are shown in Fig. 1.

Fig. 1

Workflow and main results from whole-genome sequencing of the Fat and Lean mouse lines

Sequencing the genomes of male Lean (FHI line) and Fat (FLI line) mice revealed 4,651,068 and 4,320,310 SNPs, respectively, of which the majority (2,661,583) are shared between the two lines (Fig. 2). Out of 6,309,795 SNPs in total, we identified 1,303,138 SNPs that have not been previously reported (without rs ID). Of those variants, 488,784 are shared by both lines, while 439,192 and 375,162 SNPs are private to the Lean and Fat line, respectively (Fig. 2a, c). Moreover, 1,014,395 and 928,287 insertions and deletions (indels) were identified in the Lean and Fat line, respectively, and 469,647 indels occurred in both (Fig. 2b).
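The totals above follow from inclusion-exclusion on the two SNP sets. A minimal check in Python, using only the counts reported in this paragraph, is sketched below; the variable names are ours, and we assume that "line-specific" means present in one line but not in the other.

```python
# SNP counts reported in the text for each line and for their intersection.
lean_total = 4_651_068
fat_total = 4_320_310
shared = 2_661_583

# Inclusion-exclusion: SNPs found in either line = Lean + Fat - shared.
union_total = lean_total + fat_total - shared

# Assumption: line-specific SNPs are those not shared with the other line.
lean_specific = lean_total - shared
fat_specific = fat_total - shared

# Novel SNPs (no rs ID), split into the shared and the two private sets.
novel_shared, novel_lean, novel_fat = 488_784, 439_192, 375_162
novel_total = novel_shared + novel_lean + novel_fat

print(union_total, lean_specific, fat_specific, novel_total)
```

The union and the novel total reproduce the 6,309,795 and 1,303,138 SNPs stated in the text.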

Fig. 2

Sequence variants (SNPs and indels) identified in the Lean and Fat mouse selection lines. (a) Numbers of private and shared SNPs from the Fat and Lean mouse lines and the Ensembl database. The shape sizes are not proportional to the number of SNPs. (b) To-scale comparison between SNPs deposited in Ensembl and those identified in the Lean and Fat lines. (c) Number of indels identified in the Lean and Fat lines

Three selected SNPs were validated using Sanger sequencing and are summarized, along with SNPs from Mikec et al. (2022) and Šimon et al. (2024), in Supplementary Figs. S1-S3. Among these SNPs, it is worth mentioning rs37739792, which lies within an intron of Hif1a and also overlaps the protein-coding gene Gm15283 (Supplementary Fig. S2).

Out of 1,303,138 novel SNPs, 439,192 and 375,162 were private to the Lean and the Fat line, respectively, while 488,784 were present in both lines. In total, 180 novel variants with predicted high impact and 779 deleterious (including low-confidence) missense variants were identified (Supplementary Table S2).

Of the 1,303,138 novel SNPs, 552,539 (Fat: 164,055, Lean: 178,498, Both: 209,986) are located within 20,600 genes involved in biological processes such as localization, response to stimuli, and signalling/signal transduction (Supplementary Fig. S4a). Meanwhile, the predicted 180 high-impact and 779 deleterious missense variants (DMVs) are located within 676 genes primarily involved in immune response (Supplementary Fig. S4b).

Distribution of line-specific SNPs

In the Lean line, most line-specific SNPs were located on chromosome 2 (184,531), followed by chromosomes 7 (162,635) and 15 (154,786). In contrast, in the Fat line, chromosome 6 (154,366) had the most SNPs, followed by chromosomes 1 (140,760) and 13 (136,462) (Supplementary Fig. S5a, b). A total of 36 line-specific and novel SNPs (Fat: 15, Lean: 21) were also identified on chromosome Y, of which one and nine were in the introns of the genes Mid1-ps1 and Gm47283 in the Lean line, respectively (Supplementary Fig. S5c).

Genes with the most line-specific SNPs

Forty-six genes (Lean: 25, Fat: 23, with two shared between the lines) contain at least 3000 line-specific SNPs, and all of them are protein-coding. The two genes with more than 3000 line-specific SNPs in both lines are Macrod2 and Tenm2. In the Lean line, Csmd3, Erbb4, and Inpp4b had the most SNPs, whereas in the Fat line these were Skint5, Exoc4, and Galnt2l. Other genes in the Fat line include Adcy2, Cpq, Ctnna3, Dcc, Fgf14, Gm37013, Hdac9, Lhfpl3, Mast4, Mctp1, Ndst4, Skint6, Slc14a2, Slc9a9, Smyd3, Sntg1, Sox5, and Trpm3, and in the Lean line 5730522E02Rik, Atrnl1, Cdkal1, Diaph3, Edil3, Fstl5, Gabrb3, Gmds, Hcn1, Hs3st4, Immp2l, Oca2, Pak5, Pcsk5, Prkn, Prr16, Slc7a11, Tenm4, and Zfp536 (Supplementary Table S3).

Considering the relative number of SNPs per gene (SNPs/bp), 30 genes (Lean: 12, Fat: 18) had at least four SNPs per 100 bp. In contrast to the genes with the highest absolute number of SNPs, which were all protein-coding, the SNP density was highest in pseudogenes and various types of regulatory RNAs. In more detail, these were: a gene segment (Ighd2-6), miRNAs (Gm23063, Mir3086, Mir7237), a miscellaneous RNA (Gm25403), pseudogenes (AC152418.1, Ear-ps10, Gm15115, Gm34658, Gm37489, Gm43943, Gm47555, Gm4873, Gm49055, Gm9002, Olfr1139-ps1, S100a11-ps), pseudogenic gene segments (Gm43220, Igkv1-136), an RNase P RNA gene (Rprl1), rRNAs (Gm23668, n-R5s3), a snoRNA gene (Gm24127), and snRNA genes (Gm27385, Gm26449, Gm25785, Gm24725, Gm24582, Gm23511, Gm22828) (Supplementary Table S4).
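The density criterion can be illustrated with a short Python sketch. The function name and the three gene records are hypothetical and serve only to show why short RNA genes and pseudogenes can exceed the threshold of four SNPs per 100 bp while long protein-coding genes with thousands of SNPs do not.

```python
def snps_per_100bp(snp_count: int, gene_length_bp: int) -> float:
    """Return the number of SNPs per 100 bp of gene length."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return 100.0 * snp_count / gene_length_bp


# Hypothetical (name, SNP count, length in bp) records, for illustration only.
genes = [
    ("long_protein_coding_gene", 3200, 2_000_000),  # many SNPs, low density
    ("short_snRNA_gene", 6, 110),                   # few SNPs, high density
    ("short_pseudogene", 9, 300),                   # 3 SNPs per 100 bp
]

# Keep genes that reach the threshold used in the text (>= 4 SNPs per 100 bp).
dense = [name for name, n, length in genes if snps_per_100bp(n, length) >= 4.0]
print(dense)
```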

For two miRNAs, Mir7237 and Mir3086, we explored whether SNPs fall into their seed regions and could thereby alter target sequence recognition. Whereas no SNP was found within the Mir7237 seed region, the SNP rs248726381 (T/C) in the Lean line falls into the seed region of mmu-miR-3086-3p (CCAAUGA → CCAAUGG) (Supplementary Fig. S6a). Several biological processes in the Lean line could be affected by this polymorphism in the seed region, including histone and protein acetylation (Supplementary Fig. S6b). Further enrichment analysis of mmu-miR-3086-3p target genes revealed that, in addition to amino acid metabolism and the nervous system, the target genes also participate in fatty acid biosynthesis (Supplementary Fig. S6c).
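A seed-region check of this kind can be sketched as follows, assuming the common definition of the seed as nucleotides 2-8 of the mature miRNA. Only the 7-mer CCAAUGA and its altered form CCAAUGG come from the text; the flanking bases of the toy sequence, the variant position, and all function names are hypothetical.

```python
SEED_START, SEED_END = 2, 8  # 1-based, inclusive; assumed 7-mer seed definition


def seed(mature: str) -> str:
    """Return the seed (nucleotides 2-8) of a mature miRNA sequence."""
    return mature[SEED_START - 1:SEED_END]


def apply_substitution(mature: str, pos: int, alt: str) -> str:
    """Return the sequence with the base at 1-based position pos replaced by alt."""
    return mature[:pos - 1] + alt + mature[pos:]


def in_seed(pos: int) -> bool:
    """True if a 1-based position within the mature miRNA lies in the seed."""
    return SEED_START <= pos <= SEED_END


# Toy mature sequence: only the seed 7-mer is taken from the text;
# the first base, the 3' tail, and the variant position are hypothetical.
toy_mature = "U" + "CCAAUGA" + "GCUAGCAGAAGU"
variant_pos, variant_alt = 8, "G"

mutated = apply_substitution(toy_mature, variant_pos, variant_alt)
print(in_seed(variant_pos), seed(toy_mature), seed(mutated))
```

A variant outside positions 2-8 would leave the seed unchanged, which corresponds to the situation reported for Mir7237.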

GERP scores of line-specific SNPs

GERP scores of the SNPs were then retrieved to reveal potentially functional SNPs. Most SNPs have a GERP score between −1 and 1; however, 276,397 SNPs have a GERP score above 2, 1,369 SNPs above 4, and 125 SNPs above 6 (Supplementary Fig. S7). The latter group includes 13 SNPs in regulatory elements within or close to 14 genes, and 6 SNPs with predicted moderate or high impact on 6 genes.

Variant biotype of line-specific SNPs

The most represented line-specific variant biotype was protein coding, followed by lncRNA and intergenic variants. Interestingly, the largest relative difference in the number of SNPs between the lines was in the IG gene biotype (especially IG V gene), with 8908 SNPs identified in the Fat line and only 129 in the Lean line (Fig. 3).

Fig. 3

Biotypes and share of line-specific variants identified in the Lean and Fat mouse selection lines

Variant consequences of line-specific SNPs

In both lines, most of the SNPs were intronic variants, followed by intergenic, non-coding transcript, downstream, and upstream variants (Supplementary Fig. S8). In total, 887,371 (Lean) and 778,626 (Fat) line-specific SNPs were located within 13,120 and 13,094 genes, representing 45% and 47% of all line-specific SNPs in the respective line.

In the Lean line, synonymous and missense variants accounted for 0.32% and 0.15%, respectively, while in the Fat line they accounted for 0.36% and 0.19% of all SNPs. Among all the missense variants, predicted deleterious variants represented 17.4% and 17.7% in the Lean and Fat lines, respectively. In the Lean line, they were mainly located on chromosomes 2, 7, and 9, while in the Fat line they were found on chromosomes 7, 4, and 6 (Supplementary Fig. S9).

We next investigated whether predicted DMVs were located within genes related to obesity (abnormal adipose tissue amount, IMPC). In both lines, line-specific DMVs within several obesity-related genes were identified (Fat: 11, Lean: 12). In addition, 24 genes had shared DMVs (Supplementary Table S5). Among the DMVs, nine SNPs within nine genes were newly identified: 12_13433013_T/A in Nbas, 6_128327592_C/T in Tulp3, and 7_77124609_C/T in Agbl1 of the Fat line, 15_48791841_G/C in Csmd3, 18_59409565_G/T in Chsy3, and 6_95117339_C/T in Kbtbd8 of the Lean line, and 11_87874953_A/G in Epx, 16_35824901_G/A in Hspbap1, and 8_84872251_G/A in Syce2 of both lines. Other obesity genes in the Fat line include 2210408I21Rik, Cep250, Fam81b, Il6st, Mamld1, Nbas, Pth1r, Sema4d, and Slco1b2. Meanwhile, in the Lean line these are Alg8, Alpk2, Aspm, D430041D05Rik, D630045J12Rik, Dock9, Gpr15, Phldb1, and Zfp462. Worth mentioning is Csmd3, the gene with the highest number of SNPs in the Lean line (shown in Supplementary Table S3).

Enrichment analysis revealed that in both lines the genes with line-specific DMVs are involved in the following pathways: graft-versus-host disease, type I diabetes mellitus, and allograft rejection. Other pathways in the Fat line included antigen processing and presentation, serotonergic synapse, viral myocarditis, and asthma. In contrast, genes with Lean line-specific DMVs were involved in cell adhesion molecules, autoimmune thyroid disease, and inflammatory bowel disease. Interestingly, in both lines, the pathways with the lowest p-value were those potentially involved in food perception: olfactory transduction in the Lean line and taste transduction in the Fat line (Supplementary Fig. S10).

For the taste and olfactory transduction pathways, we then explored whether the genes with DMVs map to previously identified obesity/leanness QTLs. Plcb2, a gene involved in taste transduction, is located within the Fob1 QTL, as are 26 out of 75 olfactory transduction genes: Or4b1d, Or5d47, Or5aq7, Or9g4b, Or5m12, Or5m13b, Or8k1, Or8k20, Or8k24, Or8k28, Or8k32, Or8h9, Or8h10, Or5j1, Or5aq6, Or5w10, Or10ag59, Or8w1, Or5w1b, Or5w13, Or5w15, Or5w17, Or4a2, Or4a69, Or4a74, and Or4f57.
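Mapping genes to a QTL amounts to an interval-containment test on genomic coordinates. The sketch below illustrates the idea in Python; the coordinates, chromosome labels, and gene names are hypothetical placeholders and not the real positions of Fob1 or of any gene listed above, and requiring full containment (rather than any overlap) is our assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def within(gene: Interval, qtl: Interval) -> bool:
    """True if the gene lies entirely inside the QTL interval."""
    return (
        gene.chrom == qtl.chrom
        and qtl.start <= gene.start
        and gene.end <= qtl.end
    )


# Hypothetical coordinates for illustration only.
qtl = Interval("chrA", 50_000_000, 120_000_000)
genes = {
    "gene_a": Interval("chrA", 85_000_000, 85_001_000),    # inside the QTL
    "gene_b": Interval("chrA", 130_000_000, 130_002_000),  # same chromosome, outside
    "gene_c": Interval("chrB", 90_000_000, 90_001_500),    # different chromosome
}

hits = sorted(name for name, iv in genes.items() if within(iv, qtl))
print(hits)
```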

We then analysed the density of line-specific missense variants in protein-coding transcripts to identify proteins that differ markedly between the lines. We found 38 transcripts (Fat line: 28, Lean line: 10) in which missense variants occurred, on average, at intervals of fewer than 25 amino acids. These transcripts correspond to 32 genes (Fat line: 21, Lean line: 10, and 1 common to both). Notably, the genes associated with the highest density of missense variants in the Fat line include Cx3cl1, Nlrp1b, Tas2r136, Cd22, and Or5p58. Meanwhile, in the Lean line these are Hbb-bh2, Kcnmb2, Ang5, Or5w17, and Cbr1b, as detailed in Supplementary Table S6. Interestingly, Hamp2 appears in the top five for both lines, albeit represented by different transcripts (ENSMUST00000205641 in the Fat line and ENSMUST00000109753 in the Lean line). Additionally, the genes Skint5 and Macrod2 were identified as having a notably high number of variants, as shown in Supplementary Table S3.

Eighteen of these genes are involved in 35 different KEGG pathways (Supplementary Table S7). However, three genes account for the majority of these pathways: Cx3cl1 (immune and inflammatory reactions) and H2-Aa and H2-Ab1 (various diseases, immune system, and type I diabetes mellitus). Worth mentioning are eight genes potentially involved in food perception, among which Or5w17 and Or8k28 lie within the FOB QTL Fob1. Another gene within a FOB QTL is Tmsb15b1. In addition, Kcnmb2 and Nlrp1b are involved in insulin secretion and the NOD-like receptor signalling pathway, respectively.

Annotation of candidate genes

Genes carrying regulatory variants or variants with moderate or high impact and a GERP score above 6 were further analysed, and their annotations related to obesity were retrieved from various sources. In total, 14 genes have 13 regulatory SNPs, and 6 genes have 6 SNPs with predicted moderate or high impact. These 20 genes include three lncRNAs (4930441H08Rik, 4930595O18Rik, Gm36633), one polymorphic pseudogene (Or56b2j), 15 protein-coding genes (Aff3, Angpt1, Atpsckmt, Cpped1, Erc2, Gfra1, Fam237b, Mast4, Pced1a, Prr5l, Serpine2, Tecrl, Tmem132d, Trim24, Zfp536), and one pseudogene (Gm17131). Two genes are within a FOB QTL (Pced1a and Prr5l). Gfra1, Or56b2j, Serpine2, and Tecrl were the only genes previously annotated with obesity-related traits. In addition to the above-mentioned Gfra1, Prr5l, and Serpine2, differential expression was also measured for Angpt1, Tmem132d, and Trim24.
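The prioritization rule described above (GERP score above 6, combined with either a regulatory location or a predicted moderate or high impact) can be written as a simple filter. The following Python sketch is illustrative only: the `Variant` record, its field names, the impact labels, and the four example variants are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    gerp: float
    impact: str       # assumed labels: "HIGH", "MODERATE", "LOW", "MODIFIER"
    regulatory: bool  # True if the variant overlaps a regulatory element


def is_candidate(v: Variant, gerp_cutoff: float = 6.0) -> bool:
    """Keep conserved variants that are regulatory or of moderate/high impact."""
    return v.gerp > gerp_cutoff and (
        v.regulatory or v.impact in {"MODERATE", "HIGH"}
    )


# Hypothetical variants, for illustration only.
variants = [
    Variant("snp_1", 6.2, "MODIFIER", True),   # conserved and regulatory: kept
    Variant("snp_2", 6.4, "MODERATE", False),  # conserved, moderate impact: kept
    Variant("snp_3", 6.1, "LOW", False),       # conserved but neither: dropped
    Variant("snp_4", 4.8, "HIGH", False),      # high impact but GERP <= 6: dropped
]

candidates = [v.variant_id for v in variants if is_candidate(v)]
print(candidates)
```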

Table 1 Prioritization of candidate genes. Genes carrying SNPs with GERP score above 6 located in regulatory regions or having a predicted moderate or high impact on protein function

Out of the 20 genes, only four are annotated in the KEGG database: Angpt1 is involved in various signalling pathways, Prr5l in the mTOR signalling pathway, Serpine2 in the immune system, and Or56b2j in the sensory (olfactory) system (Fig. 4a). The signalling function of the ANGPT1 protein might be due to its interaction with TIE2 (TEK receptor tyrosine kinase). ATPSCKMT is a positive regulator of ATP synthase activity, and TECRL catalyses the conversion of trans-octadec-2-enoyl-CoA to stearoyl-CoA (Fig. 4b).

Fig. 4

Biological function of the genes carrying SNPs with GERP score above 6 located in regulatory regions or having predicted moderate or high impact on protein function identified in the Fat and Lean mouse selection lines. (a) KEGG ontology terms, (b) biological function of identified genes

Finally, the potential involvement in obesity of the 19 identified SNPs within 20 genes was validated by comparing alleles between our mouse lines and the obesity-prone NZO/HlLtJ and obesity-resistant A/J strains. The analysis revealed four missense SNPs and two regulatory variants within seven genes (4930441H08Rik, Aff3, Fam237b, Gm36633, Pced1a, Tecrl, Zfp536) (Table 2).
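One way to express such an allele comparison is to require that the two obesity-prone genomes share one allele and the two lean or obesity-resistant genomes share a different one. The Python sketch below implements this concordance rule as an assumption about how the comparison can be formalized; the function name and the genotypes are hypothetical and do not reproduce the data in Table 2.

```python
def concordant(fat: str, nzo: str, lean: str, aj: str) -> bool:
    """True if Fat and NZO/HlLtJ share one allele and Lean and A/J share another."""
    return fat == nzo and lean == aj and fat != lean


# Hypothetical genotypes, for illustration only.
genotypes = {
    "snp_x": {"Fat": "A", "NZO/HlLtJ": "A", "Lean": "G", "A/J": "G"},
    "snp_y": {"Fat": "C", "NZO/HlLtJ": "T", "Lean": "T", "A/J": "T"},
    "snp_z": {"Fat": "G", "NZO/HlLtJ": "G", "Lean": "G", "A/J": "G"},
}

# Keep SNPs whose alleles separate the fat genomes from the lean genomes.
consistent = [
    snp
    for snp, g in genotypes.items()
    if concordant(g["Fat"], g["NZO/HlLtJ"], g["Lean"], g["A/J"])
]
print(consistent)
```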

Table 2 Allele comparison between fat lines (our Fat line and NZO/HlLtJ) and lean lines (our Lean line and A/J)
