Differentiated adaptative genetic architecture and language-related demographical history in South China inferred from 619 genomes from 56 populations

Differentiated genetic affinities of GPH and their neighbors

We newly generated a genome-wide genotype dataset of 261 individuals, including 236 Sinitic-speaking individuals from 19 populations across Guangxi Province and 25 Sinitic speakers from two populations in Yunnan Province. We combined these data with previously reported Guangxi populations to form a base dataset of 619 Guangxi genomes (Additional file 1: Fig. S1). We then merged this base dataset with published population data of different single-nucleotide polymorphism (SNP) densities to form four datasets. First, merging it with previously reported Affymetrix genotyping data from ethnolinguistically diverse Chinese populations (Mongolian, Han, Tujia, Miao, Sui, Jing, Zhuang, and Li) [26, 39,40,41,42,43] gave the high-density SNP panel dataset (465,941 SNPs). Second, populations from the Human Genome Diversity Project (HGDP) and Oceania genomic resources were added to form the high-density merged HGDP-O dataset [12, 44]. Finally, the Human Origins (HO) dataset and the 1240 K dataset from the Allen Ancient DNA Resource (AADR) [45] were merged with our data to form the low-density merged HO and middle-density merged 1240 K datasets, which we used to explore relationships with ancient genomes, especially ancient individuals from Guangxi [46]. To characterize the genetic profile of GPH, we first performed principal component analysis (PCA) to explore the genetic affinities between GPH and other reference populations in modern and ancient East Asian contexts (Fig. 1a). The first component (PC1: 1.25%) differentiated HM and AN speakers in South China from TB and Tungusic/Mongolic people in North China. The second component (PC2: 0.48%) distinguished HM and TB speakers from AN and Tungusic/Mongolic people. PCA based on the merged HO dataset showed that GPH lay between TB at one end and AN at the other, overlapping with TK groups and parts of the HM-, Sinitic-, and TB-related clines.
We further projected ancient samples into the background of modern East Asians to explore the genetic affinity between target populations and ancient East Asians. Our studied populations showed a close genetic similarity with ancient Guangxi individuals and were located between ancient Northern millet farmers from the Yellow River Basin (YRB) and ancient Southern East Asian (ASEA)-related populations. The observed patterns suggested that GPH’s ancestries were possibly related to the descendants of these two groups and their different demographic interactions compared to other Han populations.
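The projection step described above can be illustrated with a minimal sketch: principal axes are learned from modern genotypes only, and ancient samples are then placed on those axes without influencing them. The array sizes, the random genotypes, and the helper names `fit_pca` and `project` are illustrative assumptions, not the pipeline used in the study; dedicated software also handles the missing genotypes typical of ancient samples, which this sketch does not.

```python
import numpy as np

def fit_pca(modern, n_components=2):
    """Learn principal axes from modern genotypes (individuals x SNPs, coded 0/1/2)."""
    mean = modern.mean(axis=0)
    centered = modern - mean
    # Rows of vt are the principal axes of the centered modern matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes learned from the modern panel only."""
    return (samples - mean) @ axes.T

# Hypothetical toy data: 50 modern and 5 ancient individuals, 200 SNPs.
rng = np.random.default_rng(0)
modern = rng.integers(0, 3, size=(50, 200)).astype(float)
ancient = rng.integers(0, 3, size=(5, 200)).astype(float)

mean, axes = fit_pca(modern)
modern_pcs = project(modern, mean, axes)
ancient_pcs = project(ancient, mean, axes)  # ancient samples do not shape the axes
```

Because the axes are fixed before the ancient samples are added, sparse or damaged ancient genotypes cannot distort the modern population structure.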

Fig. 1

General population structure inferred from principal component analysis and model-based ADMIXTURE. a Patterns of genetic relationship among 93 ethnolinguistically distinct modern East Asian and 58 ancient populations, in which ancient individuals were projected onto the top two components extracted from modern genetic variation. Populations from different language families or archeologically different groups are shown in different colors, and different shapes represent different populations within one language family or archeological group. b Individual or population ancestry components among 210 ancient and modern Eastern Eurasian groups inferred by ADMIXTURE with seven predefined ancestral sources. The model with K = 7 had the lowest cross-validation error

The well-fitted ADMIXTURE model based on the merged HO dataset revealed that the newly studied GPH harbored the highest proportion of AN-related ancestry (ancestral component colored orange) and smaller proportions of TB- (green), HM- (red), and AA-related (pink) ancestries (Fig. 1b). The ADMIXTURE model showed that GPH derived ancestry from both Northern and Southern ancestral sources, in line with the PCA results and supporting the view that GPH descended from ancient Northern East Asians (ANEAs) and ASEAs. This complex genetic relationship prompted us to explore the genetic contribution of Northern Han to GPH. We performed ADMIXTURE on the merged HGDP-O dataset, which showed that three ancestral populations most likely contributed to GPH (Fig. 2). GPH derived the largest share of its ancestry from the Northern Han-related component (colored green, 48.2%), followed by Zhuang- (43.1%) and Miao-related (8.7%) ancestry. The ancestral composition of GPH was similar to that of Guangxi Zhuang (GXZ). To further explore the distribution of the Northern Han-related component, we compared ancestry proportions among geographically different Han and Southern indigenous groups, including Northern Han (Han_HulunBuir), Southern Han (Han_Guizhou), a Southern ethnic minority (GXZ), and GPH. The Han-related component decreased consistently from North to South China, while the Zhuang-related component increased, consistent with geographic location. In addition, we compared geographically different GPH populations and observed no statistically significant population stratification or correlation with longitude or latitude (R² < 0.5).

Fig. 2

Unsupervised ADMIXTURE results with K = 12. Admixture analysis of GPH shown with red, green and light blue components maximized in Hmong-Mien-, Northern Han-, and Tai-Kadai (TK)-related groups, respectively. Four groups (including Northern Han, Southern Han, Southern TK, and GPH groups) were selected individually to present their specific ancestral composition and proportions. The 19 Guangxi groups were arranged from left to right as the Northern Han-related components increased

GPH falls within the genetic variability between Northern and Southern East Asians (NEA and SEA) in several analyses and shows a high genetic affinity to Southern ethnic minorities. We calculated pairwise Fst values to examine the genetic relationship between GPH and surrounding populations (Additional file 1: Fig. S2a). The overall genetic makeup of GPH was closest to Han_Haikou (Fst = 0.0010), followed by surrounding Han and ethnic minorities, including GXZ, Han_Hunan, and Han_Guizhou (Fst = 0.0013–0.0017). These pairwise genetic distances showed that GPH had a closer genetic relationship with geographically adjacent Han and ethnic minority populations from South China than with Northern Han. We further confirmed the genetic affinity between GPH and geographically close populations using outgroup-f3 statistics of the form f3 (modern reference populations, GPH; Mbuti), which measure the amount of genetic drift shared between modern worldwide groups and the newly studied populations. GPH generally had the largest outgroup-f3 values, and thus the most shared genetic drift, with Southern ethnic minorities such as She, Paiwan, Zhuang, and other Southern groups (Fig. 3a; Additional file 1: Fig. S2b). Together, the Fst and outgroup-f3 analyses suggested a robust genetic affinity between GPH and surrounding populations (Fig. 3a; Additional file 1: Fig. S2a-b). We further confirmed these genetic links by calculating shared IBD (identity by descent) among East Asian populations using Refined IBD [47], which again showed frequent genetic exchanges between GPH and surrounding populations, especially ethnic minorities from Guangxi and Guizhou Provinces, over a wide range of time, suggesting that GPH received genetic influence from geographically close populations (Additional file 1: Fig. S3a-f).
Next, we quantified the genetic affinity between GPH and 30 ancient Chinese populations by calculating outgroup-f3 statistics of the form f3 (ancient reference populations, GPH; Mbuti) and found that GPH shared the most alleles with Iron Age and historic ASEAs, including Taiwan_Hanben_IA, BaBanQinCen, and LaCen (Fig. 3b; Additional file 1: Fig. S4). These outgroup-f3 observations were also confirmed by f4 statistics of the form f4 (ancient reference population1, ancient reference population2; GPH, Mbuti) (Additional file 1: Fig. S5). Taken together, GPH showed genetic similarities to both modern and ancient Southern East Asians.

Fig. 3

Outgroup-f3 statistics and admixture-f3 statistics for GPHs. a, b Outgroup-f3 statistics based on the merged HGDP-O data and the merged 1240 K data to explore the geographic distribution pattern of genetic drift shared between modern groups, ancient populations, and GPH, respectively. Different colors represent different levels of f3 values. See Additional file 1: Figs. S2b and S4 for more details. c, d We conducted admixture-f3 statistics of the form f3 (modern reference population1/ancient reference population1, modern reference population2/ancient reference population2; GPH), and the top 30 were respectively shown in c and d after filtering. Red represents Z values that are significant, and green represents Z values that are not significant

Admixture scenarios and gene flows

To assess the admixture scenarios revealed by the ADMIXTURE analysis, we tested whether the Southern and Northern genetic ancestries could be recovered by modeling GPH as a mixture of all available pairs of sources. We computed admixture-f3 statistics using plausible modern ancestral sources from the merged HGDP-O dataset to explore the potential ancestral groups of GPH. Most tests with GXZ and Northern Han groups as source pairs were significantly negative (Z scores < −3), indicating that the gene pool of GPH can be modeled as the result of north-to-south admixture (Fig. 3c), consistent with the mixed pattern observed in the fitted ADMIXTURE models. We also observed significantly negative values for combinations of Southern Han and Southern ethnic minorities, suggesting that GPH possessed additional genetic material compared with other Southern Han. To test whether source pairs of ANEAs and ASEAs could explain the observed genetic diversity of GPH, we then conducted admixture-f3 analysis using potential ancestral source groups from the merged 1240 K dataset. We observed significantly negative Z scores in f3 (ancient reference population1, ancient reference population2; GPH), especially for the LaCen/Taiwan_Hanben_IA and China_Upper_YR_LN pairs, suggesting that GPH was an admixed population that could be modeled as a mixture of these groups (Fig. 3d). In addition, a few pairs of ancient populations from South China could also serve as possible ancestral sources of GPH.
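As a rough illustration of why a negative admixture-f3 value signals admixture, the sketch below computes f3(source1, source2; target) from allele frequencies and attaches a Z score from a delete-one block jackknife. The simulated frequencies, the function name `admixture_f3`, and the block count are assumptions made for illustration. The sketch also omits the finite-sample correction applied to the target population in standard software, so it is not a substitute for the tools used in the study.

```python
import numpy as np

def admixture_f3(target, source1, source2, n_blocks=10):
    """Return (f3, Z) for f3(source1, source2; target) from allele frequencies."""
    terms = (target - source1) * (target - source2)  # per-SNP contributions
    estimate = terms.mean()
    blocks = np.array_split(terms, n_blocks)          # contiguous SNP blocks
    total, n = terms.sum(), terms.size
    # Leave-one-block-out estimates for the delete-one jackknife.
    loo = np.array([(total - b.sum()) / (n - b.size) for b in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return float(estimate), float(estimate / se)

# Hypothetical frequencies: the target is an exact 50/50 mixture of two sources.
rng = np.random.default_rng(1)
north = rng.uniform(0.05, 0.95, size=2000)
south = rng.uniform(0.05, 0.95, size=2000)
mixed = 0.5 * north + 0.5 * south

f3_value, z_score = admixture_f3(mixed, north, south)
print(f3_value < 0, z_score < -3)
```

For a target with frequency c = αa + (1 − α)b, each term equals −α(1 − α)(a − b)², which is never positive. This is why a significantly negative statistic, such as the Z < −3 threshold used in the text, is taken as evidence of admixture between populations related to the two sources.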

Furthermore, we performed a series of f4 statistics in different forms. To assess the degree of genetic homogeneity within GPH groups, we computed symmetrical f4 (GPH1, GPH2; reference populations, Mbuti) based on the merged HO dataset. A significant positive f4 value indicates more allele sharing between the reference population and GPH1, while a significant negative f4 value indicates more allele sharing between the reference population and GPH2. Most tests yielded non-significant f4 values (|Z| ≤ 3), indicating relative genetic homogeneity among GPH from different Guangxi prefecture-level cities. However, a few tests yielded significant f4 values (|Z| > 3); for example, Northwest and Northeast GPH (Baise and Yulin) showed more SEA-related ancestry, including AN-, TK-, and AA-related ancestry, than Southwestern GPH (Fangchenggang) (Additional file 2: Table S1). Thus, several GPH groups showed some genetic heterogeneity, reflecting different admixture and evolutionary histories, but GPH as a whole was genetically homogeneous. For the remaining analyses, GPH were merged into a single cluster based on their minimal genetic heterogeneity.
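The sign convention of the symmetrical f4 test can be made concrete with a small sketch based on allele frequencies. The populations here are simulated and the function name `f4` is an assumption for illustration. In practice, significance is judged with a jackknife-based Z score (|Z| > 3 in the text) rather than from the raw value.

```python
import numpy as np

def f4(p1, p2, p3, p4):
    """f4(P1, P2; P3, P4): mean over SNPs of (p1 - p2) * (p3 - p4)."""
    return float(np.mean((p1 - p2) * (p3 - p4)))

# Hypothetical frequencies: gph1 carries extra ancestry from the reference group.
rng = np.random.default_rng(2)
n_snps = 5000
outgroup = rng.uniform(0.05, 0.95, n_snps)
reference = rng.uniform(0.05, 0.95, n_snps)
gph2 = rng.uniform(0.05, 0.95, n_snps)
gph1 = 0.5 * gph2 + 0.5 * reference

forward = f4(gph1, gph2, reference, outgroup)  # positive: reference shares more with gph1
reverse = f4(gph2, gph1, reference, outgroup)  # same magnitude, opposite sign
print(forward > 0, np.isclose(forward, -reverse))
```

Swapping the first two populations only flips the sign, which is why a positive value points to GPH1 and a negative value to GPH2.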

To quantify the genetic heterogeneity between GPH and ethnically and geographically different populations, we focused on the differences between Southern ethnic minorities and Han groups by calculating f4 statistics of the form f4 (GPH, Guangxi ethnic minorities/Northern Han/Southern Han; reference populations, Mbuti). GPH shared more SEA-related alleles than Northern Han (Han_HulunBuir) did (Additional file 1: Fig. S6a). We observed statistically significant f4 values, indicating differentiated demographic histories among ethnolinguistically different Guangxi people, and GPH shared more NEA-related alleles than Southern ethnic minorities, including Hlai, Jing, and Zhuang (Additional file 1: Fig. S6b-d). The positive values in f4 (GPH, Han_Haikou; reference populations, Mbuti) further suggested that GPH received more genetic influence from Southern ethnic minorities than Southern Han did, confirming the admixture signatures inferred from the admixture-f3 statistics (Additional file 1: Fig. S7a). The genetic differences between GPH and Guangxi Miao, Sui, and Yao groups were also confirmed by statistically significant f4 (GPH, Guangxi ethnic minorities; reference populations, Mbuti) (Additional file 1: Fig. S7b-d). We then examined genetic cladality between our studied populations and modern Chinese populations using an individual-based qpWave analysis and observed statistically significant differences between GPH and other Han Chinese from North and South China. A potential reason is that GPH harbored more indigenous SEA-related components than other Han groups, contributed mainly by GXZ (Additional file 2: Table S2). Genetic heterogeneity also existed between GPH and GXZ populations. In addition, non-significant values were observed between GPH and some geographically close groups (Dongs, Jings, Shes, Yaos, and Miaos).
As stated above, GPH was genetically divergent from both Han Chinese groups and Southern ethnic minorities, further supporting a north-to-south admixture model. Based on the admixture-f3 statistics, we identified two potential ancestral proxies of GPH: Northern Han or ANEA on one side, and GXZ or ASEA on the other. We then performed f4 (reference populations, GPH; Northern Han, Mbuti) using all publicly available East Asian populations as the reference populations (Additional file 2: Table S3). Statistically significant negative f4 values indicated that GPH had closer genetic connections with Northern Han than other East Asian groups did. Furthermore, to test whether direct descent of GPH from Northern Han was plausible, we computed f4 (Northern Han, GPH; reference populations, Mbuti), with 133 worldwide groups as reference populations, to determine whether the newly studied populations and Northern Han form a robust clade. However, we observed significant negative f4 values when SEAs related to Hlai, Zhuang, and Sui were used as the reference groups, indicating that GPH obtained additional gene flow from SEA compared with their Northern ancestral proxy (Additional file 2: Table S4). These observations were also supported by the admixture-f3 statistics. Overall, the affinity statistics showed that GPH and GXZ had significant genetic heterogeneity and that GPH shared more alleles with Northern Han than GXZ did, which supports the participation of both millet and rice farmers in the formation of GPH. To test this hypothesis, we then computed f4 (ancient reference populations, GPH; ASEA, Mbuti) to test the affinity with ASEA sources, where other ancient East Asians served as the ancient reference populations. We observed significant negative values, which suggested that ASEA shared more alleles with GPH than with the other reference groups, consistent with the hypothesis (Additional file 1: Fig. S8a).
To identify signals of additional ancestry in GPH, we selected Taiwan_Hanben_IA, LaCen, and BaBanQinCen as the Southern ancestral sources and performed f4 (ASEA, GPH; ancient reference populations, Mbuti). We identified several Northern ancestral sources, particularly China_Upper_YR_LN, which yielded the most significant negative value when compared with Taiwan_Hanben_IA (Additional file 1: Fig. S8b), suggesting additional gene flow from these sources into GPH. Together with the admixture signatures identified in the admixture-f3 statistics, these results suggest that two ancient ancestries from North and South China participated in the formation of GPH, especially a combination of late Neolithic people from the YRB and Iron Age groups from Taiwan Hanben, in line with the admixture-f3 models.

Finally, we estimated ROH (runs of homozygosity) to measure recent inbreeding in GPH and other East Asians in the merged HO dataset using PLINK v.1.90 [48]. Miao_Guangxi had many long ROH segments relative to other groups, whereas GPH had a number of ROH segments similar to that of other Han and surrounding groups (Additional file 1: Fig. S9a-e). We observed the same pattern of ROH segments at three different levels in Miao_Guangxi. One possible explanation is that Miao_Guangxi experienced more consanguineous marriages than surrounding populations, and that the extent of consanguineous marriage differed among groups in Guangxi Province.

Graph-based complex evolutionary models

To construct an admixture graph including gene flow events and population splits among the diverged human populations, we performed TreeMix-based phylogenetic tree reconstructions. The phylogenetic relationships among 29 modern populations without admixture events showed two lineages following the geographic distribution: one of Northern Han and the other of Southern ethnic minorities. Our target population was located on the Southern indigenous lineage, clustering closely with Han_Haikou, followed by surrounding ethnic minorities (Additional file 1: Fig. S10a). As the number of allowed gene flow events increased, we identified gene flow from Tujia_Guizhou into GPH, indicating frequent genetic exchanges among surrounding populations. Generally, GPH showed a strong affinity to geographically close populations, including ethnic minorities of Guangxi and Guizhou Provinces, and was placed in the Southern indigenous lineage (Additional file 1: Fig. S10b-d). Additionally, results based on shared IBD, Fst, outgroup-f3 statistics, and f4 statistics at the individual or population level also confirmed the genetic affinity with geographically adjacent ethnic minorities. Furthermore, we modeled the relationship between GPH and Southern ethnic minorities using the graph-based qpGraph method. We observed that the Li_Qiongzhong-related and Upper_YellowRiver_LN-related lineages contributed to modern GPH in different proportions (Z = −2.674). GPH was fitted with 29% ancestry related to Upper_YellowRiver_LN and the primary (71%) ancestry from Li_Qiongzhong, revealing a genetic pattern in GPH that may have resulted from the expansion of ancient YRB-related ancestry into South China (Fig. 4a). Admixture modeling thus confirmed the contribution of ANEA and SEA to GPH.
To quantify the fine-scale ancestral proportions of different ancestry sources in the studied populations, we used qpAdm modeling. We considered a qpAdm model rejected if the p value was < 0.05, the admixture proportions fell outside the 0–1 bound, or the standard error was below zero [49]. To better define the genetic link between Neolithic to Iron Age populations and GPH, we used Neolithic to Iron Age populations as distal sources in the qpAdm models and observed that GPH could be modeled using YRB populations (71.8–94.3%) with additional ancestry from Guangxi and Fujian sources (5.7–28.2%) (Fig. 4b). Additionally, models using more contemporaneous potential ancestral surrogates portrayed GPH as a mixture of two major ancestry sources descended from ancestry present in NEA/ANEA and Southern ethnic minorities (Fig. 4c). The genetic heterogeneity between GPH and the ancestry sources was examined using qpWave analysis (Additional file 2: Tables S5-S6). In addition, the three-way admixture model of Han-Atayal-GXZ (0.264-0.052-0.684, respectively) could also provide a good fit for GPH’s admixture history (Additional file 2: Table S7). Taken together, we found that two distinct and geographically structured ancestry sources contributed to the gene pool of GPH: we refer to one as ANEA/NEA and to the other as ASEA/SEA.
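The model-filtering rule can be sketched as a simple predicate. This is only an illustration, and it makes several assumptions: the rejection criteria are read as alternatives (any one failure rejects the model), only the p value and the 0–1 bound are checked, and the class name `QpAdmModel`, the function `is_accepted`, and the p value of 0.20 are hypothetical. The proportions are the Han-Atayal-GXZ values quoted in the text.

```python
from dataclasses import dataclass

@dataclass
class QpAdmModel:
    sources: tuple       # names of the ancestral source populations
    proportions: tuple   # estimated admixture proportions, one per source
    p_value: float       # p value of the model fit

def is_accepted(model, alpha=0.05):
    """Keep a model only if it is not rejected by the p value or by infeasible proportions."""
    feasible = all(0.0 <= w <= 1.0 for w in model.proportions)
    return model.p_value >= alpha and feasible

# Proportions from the three-way model in the text; the p value is a made-up placeholder.
three_way = QpAdmModel(("Han", "Atayal", "GXZ"), (0.264, 0.052, 0.684), p_value=0.20)
print(is_accepted(three_way))
```

A model with a negative proportion or a p value under 0.05 would be filtered out by the same check.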

Fig. 4

Genetic ancestry modeling for potential sources across newly reported genetic groups. a Evolutionary history with five admixture events constructed by qpGraph. The selected qpGraph-based topology fitted the data with a worst absolute Z score below 3 (Z = −2.674). Percentages on the dashed lines represent the admixture proportions of the two ancestral groups. Numbers on solid lines show genetic drift multiplied by 1000. The admixture graph fits GPH as a mixture of ancestries associated with Li_Qiongzhong and Upper_YR_LN. b Working distal qpAdm models with distinct ancestral sources. GPH show high levels of YRB ancestry. Error bars indicate the standard errors of the ancestry proportions estimated by qpAdm. c The admixture proportions of proximal qpAdm models for GPH. Each bar represents ancestry proportions of the listed subgroups for GPH. ANEA, ancient Northern East Asian; ASEA, ancient Southern East Asian; NEA, Northern East Asian; SEA, Southern East Asian

We further used the ALDER (Admixture-induced Linkage Disequilibrium for Evolutionary Relationships)-based method to infer the dates of admixture events between two ancestral populations, assuming a generation time of 29 years [50]. The results showed complex admixture processes between NEA and SEA in the studied populations. For GPH, the estimated admixture date was 77.32 generations ago (standard deviation 24.90; 985.38 BCE–485.82 CE) in the Han_Harbin-GXZ model and 70.84 generations ago (standard deviation 13.57; 468.89 BCE–318.17 CE) in the Han_Changchun-GXZ model (Additional file 2: Table S8). Linkage-based admixture time estimation suggested that our target groups can be modeled as the result of admixture between Northern Han and GXZ over a wide range of time. To further identify, date, and describe the fine-scale admixture events and obtain more detailed information on the demographic history of GPH, we conducted fastGLOBETROTTER analysis using 14 genetically different populations (11 from East Asia, two from Europe, and one from South Asia) as surrogates for the admixture sources, with Han_Harbin as the possible Northern donor and Cambodian as the Southern source (Additional file 1: Fig. S11a-b). The inferred admixture conclusion was “one date” at around 986 years ago (34 generations with a generation time of 29 years). Of the two sources contributing to GPH, one was inferred to contribute 46% of the total admixture and was genetically most similar to the Cambodian groups; the other was inferred to contribute 54% and was most similar to Han_Harbin. This also supported the mixed north–south model, as confirmed by several analyses.
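The conversion from generations to years used in these date estimates is a single multiplication, sketched below with the 29-year generation time stated in the text. The function name `generations_to_years` is an illustrative assumption. The sketch returns elapsed years only: mapping these onto BCE/CE dates would need a reference year, which the text does not state.

```python
GENERATION_TIME_YEARS = 29  # generation time stated in the text

def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert an admixture date in generations into elapsed years."""
    return generations * generation_time

# fastGLOBETROTTER estimate: 34 generations.
fastglobetrotter_years = generations_to_years(34)
print(fastglobetrotter_years)  # 986

# ALDER point estimates for the two Northern Han-GXZ models, in generations.
alder_years = [round(generations_to_years(g), 2) for g in (77.32, 70.84)]
print(alder_years)
```

The first result reproduces the "around 986 years ago" figure quoted for the fastGLOBETROTTER analysis.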

Detailed demographic history and fine-scale genetic structure of GPH

We used the merged HO dataset and IBDNe to infer GPH’s recent demographic history. Han from Guigang, Liuzhou, Guilin, and Qinzhou showed a distinct demographic history, with no population bottleneck in recent generations (Additional file 1: Fig. S12a-b). In contrast, most populations in Guangxi Province showed varying degrees of population bottleneck around ten to fifty generations ago. We also used Han and Zhuang populations in Guangxi from previously published studies to infer demographic history. Guilin and Qinzhou Hans from those studies did not experience a population bottleneck, consistent with our studied populations (Additional file 1: Fig. S12c). Similarly, the large majority of populations experienced a population bottleneck, in line with our newly generated populations, and GXZ showed a similar pattern (Additional file 1: Fig. S12d). In addition, we found that populations from different language families in the same region had different demographic histories at the group level. GPH, Jing, Miao, and Sui experienced a recent bottleneck around ten generations ago, while Yao and Zhuang did not (Fig. 5a–f). Furthermore, another bottleneck around 50 generations ago was observed only in Miao and Sui speakers, suggesting different demographic histories among linguistically diverse ethnic groups in the same region.

Fig. 5

Estimated recent effective population size and fine-scale population structure. a–f Effective population size (Ne) inferred by IBDNe, using 619 individuals from 6 ethnolinguistic groups in Guangxi Province. g The maximum a posteriori (MAP) tree produced by fineSTRUCTURE exhibits clustering patterns among 30 different East Asian groups. These two percentages (~84% and ~16%) represent the proportion of GPH clustered with TK and Central Han-related branches, respectively. Adjacent to the tree is an ADMIXTURE plot for the same data with four predefined ancestral sources. The figure uses these abbreviations: TK, Tai-Kadai; HM, Hmong-Mien; CH, Central Han; Mon, Mongolia; NH, Northern Han; GPH, Guangxi Pinghua Han people

Genetic admixture analysis and demographic modeling based on patterns of shared alleles between individuals can capture only the main features of population history. The combination of high-density SNP data and advanced computational capacity enables the exploration of population history through linked DNA segments [51]. We used phased haplotype fragments from 1716 East Asian individuals to explore fine-scale population structure. Patterns of shared ancestry inferred from haplotype data revealed a previously unknown population sub-cluster. The population dendrogram of GPH and 29 other East Asian groups based on the average chunk showed two main branches (a Southern branch represented by Southern indigenous groups and a Northern branch represented by Mongolian and Han Chinese populations) and finer-scale subbranches (Fig. 5g). GPH mainly formed a clade within the Southern branch (~84%), and a few GPH populations were located in subbranches of the Central Han branch (~16%), suggesting that GPH had a closer genetic affinity with the Southern indigenous groups than with Han populations and that complex, multifaceted admixture occurred between GPH and neighboring ethnolinguistic groups. In addition, we observed genetic homogeneity within GPH in the pairwise coincidence matrix output by fineSTRUCTURE, in line with the results of the f4 statistics (Additional file 1: Fig. S13).

Highly differentiated genetics stratification within Guangxi populations

To address the genetic history of GPH and other Guangxi indigenous populations, we analyzed 619 individuals from six ethnolinguistic groups to explore the fine-scale evolutionary process across Guangxi Province. PC1 in the Guangxi regional PCA separated the GPH, Jing, and Yao people from Miao and Sui speakers. Miao and Sui formed two sub-clusters, one of which was close to the other Guangxi groups, indicating a complex structure within ethnolinguistically diverse Guangxi groups (Fig. 6a). We then used the unsupervised clustering method implemented in ADMIXTURE to investigate population structure. At K = 2, an orange component associated with Miao and a blue component dominant in Zhuang were observed (Additional file 1: Fig. S14a). At K = 3, a pink component enriched in GPH appeared (Additional file 1: Fig. S14a), and at K = 4, a green component appeared in Jing speakers (Fig. 6b). This revealed significant genetic differences among groups in the same region. TreeMix analysis with one gene flow event from Miao to Sui showed genetic affinity between Guangxi Miao and Sui populations, consistent with the PCA results (Fig. 6c). In addition, the six Guangxi populations were separated into three splits, and GPH had a close relationship with the Yao groups. The genetic relationship and population structure were further evaluated using a pairwise Fst heatmap (Fig. 6d). The total average degree of IBD sharing between population groups suggested that Miao was genetically closest to Sui, and that GPH and Yao were most closely related to Zhuang (Fig. 6e), consistent with observations from PCA, TreeMix, and Fst. Demographic reconstructions further emphasized diverse profiles among Guangxi populations. The ROH analysis showed a characteristic profile for the Sui and Miao groups, with long ROHs in the 10–20 Mb bin and the previous bin (5–15 Mb), which might suggest that recent intermarriage was more common within these two groups than in the other four populations (Fig. 6f; Additional file 1: Fig. S14b). This was also supported by the distribution of the total number of ROH fragments (Additional file 1: Fig. S14c). These results indicated high inbreeding in Sui and Miao individuals and a distinct demographic history for the other four groups. Such genetic differences among ethnically different Guangxi communities were also confirmed by the fineSTRUCTURE topology (Fig. 6g), where the Miao branch split earlier than the other Guangxi clusters. The Zhuang cluster split next from the other Guangxi groups, before the sub-clustering of the remaining populations. We observed a subcluster within GPH, which might be consistent with the genetic diversity between Northern and Southern GPH [35]. Yao individuals were scattered across two subclusters within GPH, suggesting a close genetic affinity between these populations. Jing split from one of the GPH subclusters and formed an independent clade before the Miao and Sui separated. The two clusters in the Miao groups indicated substructure within Miao, in line with the PCA results. Overall, there were complex population structures and significant genetic differences among individuals representing six diverse cultural communities from the same geographical region, which might reflect that GPH was not a Southern indigenous group.

Fig. 6

Demographic history of five indigenous communities and GPH in Guangxi. a 619 modern individuals covering six ethnic groups were projected onto the first two principal components. b K = 4 clustering analysis of six distinct cultural communities from the Guangxi Province using the ADMIXTURE method. c Inferred phylogenetic tree with one migration event. The migration arrow is assigned with appropriate weights. d The heatmap of pairwise Fst between GPH and five aboriginal groups of Guangxi. This shows the pattern of genetic closeness among six ethnic populations from the same geographic regions. e Heat map of the average length of paired IBD shared between populations. f Comparison of the long ROH (10–20 Mb) between populations. g Fine-scale population structure reconstructed based on the shared haplotypes. ADMIXTURE plot (K = 3 and K = 4) for 619 modern individuals, spanning six populations. Abbreviations of population names: Jing, Guangxi Jing; Sui, Guangxi Sui; Yao, Guangxi Yao; Zhuang, Guangxi Zhuang; GPH1, Guangxi Pinghua Han1; GPH2, Guangxi Pinghua Han2; HM1, Guangxi Miao1; HM2, Guangxi Miao2

Uniparental genetic history suggested admixture models of GPH

To reconstruct the paternal and maternal structures of GPH, we comprehensively explored their non-recombining Y-chromosome (NRY) and mtDNA haplogroups. Four hundred seventy-four unrelated male individuals in Guangxi Province could be assigned to several terminal Y-chromosomal lineages. The constructed phylogenetic topology showed that the O2a and O1 lineages were the most prevalent in GPH and were regarded as the founder lineages of GPH (Additional file 1: Fig. S15). The O1b1a paternal lineage was dominant in GPH and was also prevalent in SEA indigenous groups (AA and HM). O2a2 was identified in GPH at high frequency, consistent with the overall paternal profile of other Han Chinese [52,53,54]. In addition, a few NEA lineages (N1a1, R1b1, R1a1, and Q1b1) were sporadically distributed in GPH, suggesting that our studied groups obtained gene flow from NEA. We also constructed the network relationship among 317 male individuals, which suggested the same haplogroup distribution pattern (Additional file 1: Fig. S16). As for the mtDNA haplogroup results, we constructed the network relationship among 619 female people. The phylogenetic tree revealed that the main mtDNA haplogroups of GPH were M* (M7b1a1 and M7c1), followed by D4 lineages, a pattern similar to that of Southern indigenous groups, especially TK-related populations. Other haplogroups (B4 and F1) were also identified in GPH at low frequency (Additional file 1: Fig. S17). The uniparental haplogroup distributions were consistent with previous observations [36].

Shared and divergent adaptation signatures among ethnolinguistically diverse populations

To identify potential genes responsible for the population-specific adaptation of GPH, we looked for adaptive signatures by applying the Population Branch Statistic (PBS), using Han_Changchun and Europeans from the HGDP as the ingroup and outgroup reference groups, respectively, which allowed us to identify genes potentially under selection in GPH but not in Northern Han. One hundred fifty-five candidates were detected after filtering for the top 0.001 percentile. Among them, one of the strongest PBS selection signals was located on chromosome 2 and comprised eleven genes [HADHA, LRPPRC, ectodysplasin A receptor (EDAR), ITGA6, etc.]. Hydroxyacyl-CoA dehydrogenase trifunctional multienzyme complex subunit alpha (HADHA), also known as trifunctional protein alpha, a monolysocardiolipin acyltransferase-like enzyme, is essential for fatty acid beta-oxidation and cardiolipin remodeling and plays a vital role in mitochondrial function in the human heart [55]. Leucine-rich pentatricopeptide repeat containing (LRPPRC) is a mitochondrial protein that maintains the stability of the mitochondrial transcriptome [56]. We also observed EDAR, in which mutations are associated with multiple phenotypes, including shovel-shaped upper incisors, mammary and eccrine glands, and hair straightness [57,58,59]. Previous studies also found that EDAR is related to facial characteristics in Uyghurs [60]. To further elucidate the population-specific biological adaptation of ethnolinguistic groups in Guangxi Province, we changed the selected groups of the trios mentioned above to (Zhuang/Miao/Jing/Sui/Yao_Guangxi)-Han_Changchun-European and the top 0.001 percentile distribution o
