Advances in genomic tools for plant breeding: harnessing DNA molecular markers, genomic selection, and genome editing

Since plant domestication around 10,000 years ago, plant breeding has successfully developed crops and varieties essential to modern society, consistently defying Malthusian predictions [44]. Traditional pre-genomics breeding methods produced modern cultivars and have significantly increased the yields of major crops since the mid-twentieth century. Today, genomics offers breeders advanced tools and techniques for whole-genome analysis, representing a significant shift by enabling direct examination of the genotype and its connection to the phenotype [166, 167]. This new era of crop development leverages genomics-based approaches, such as molecular markers, genomic selection, and genome editing tools, for precise and efficient improvements [166, 167]. Molecular markers are used to locate the genes or genomic regions that regulate important traits in plants. Markers are typically categorized into two main groups: classical markers and DNA (molecular) markers. Classical markers encompass morphological, cytological, and biochemical markers. The limitations of these phenotype-based markers prompted the development of direct DNA-based markers, often known as molecular markers, which are more versatile. DNA markers include a variety of types, such as Restriction Fragment Length Polymorphism (RFLP), Random Amplified Polymorphic DNA (RAPD), Amplified Fragment Length Polymorphism (AFLP), Simple Sequence Repeats (SSRs), and Single Nucleotide Polymorphism (SNP) markers [60]. In modern plant breeding, SNPs are extensively used as DNA markers to pinpoint genomic regions associated with key traits, thereby accelerating the breeding process. As the most prevalent variations within plant genomes, SNPs are invaluable for high-resolution genotyping and offer the highest map precision.

Moreover, SNPs are both more efficient and more cost-effective than other markers. Their popularity surged in the twenty-first century, largely due to advances in the genotyping by sequencing (GBS) technique. Other novel marker techniques, such as Intron Length Polymorphism (ILP), Diversity Array Technology (DArT), Penta-Primer Amplification Refractory Mutation System (PARMS), and Inter small RNA polymorphism (iSNAP), have also been employed in plant breeding, enabling precise selection of desirable traits, genetic diversity analysis, and accelerated breeding (Amiteye, 2021). These markers pinpoint the precise genetic differences connected to desirable qualities, making it possible to select individuals with the best genomic profiles for breeding accurately and efficiently. Recent advances in genomics are producing new plant breeding methodologies (e.g., association mapping, marker-assisted selection, genomic selection, and genome editing).

Genomic resources for plant breeding encompass genetic markers, reference genomes, genomic and protein databases, transcriptomes, and gene expression profiles. These resources facilitate the identification of genes associated with desirable traits, the understanding of genetic diversity, and the acceleration of breeding programs [112]. Key techniques include genome-wide association studies (GWAS), marker-assisted selection (MAS), and genomic selection (GS), which allow for precise trait selection and prediction of breeding outcomes. By integrating these genomic tools, plant breeders can improve crop yield, disease resistance, and stress tolerance, enhancing agricultural productivity and sustainability. Plant breeding draws on the existing variability among crop species, and additional variability can be generated through crossing or induced mutagenesis. In addition to the identification of genetic markers and the availability of published genomes, the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system is a novel genome editing technology for major crops that is promising for modern breeding [132]. CRISPR/Cas9-based directional breeding is highly efficient and saves more time than breeding techniques based on other genome editing tools. It further enables targeted genetic modifications, opening new avenues for crop improvement. Genomics approaches are particularly beneficial when dealing with complex traits, as these traits usually have a multi-genic nature and a significant environmental influence [102]. Genomic tools provide genomic information and facilitate the detection of quantitative trait loci (QTLs) and the identification of favorable alleles of small effect, which have frequently remained unnoticed and have not been included in the gene pool used for breeding [102].
Genomic tools have revolutionized plant breeding by enabling more precise, efficient, and targeted approaches to developing new plant varieties through genome-wide association study, marker-assisted selection, genomic selection, and gene editing. In this review, we present and discuss the most relevant advances in the development of genomic tools and provide examples of applying these tools to plant breeding.

Molecular markers: tools for genetic analysis

Molecular markers are based on nucleotide sequence polymorphisms, including insertions, deletions, point mutations, duplications, and translocations. An ideal marker is codominant, evenly distributed across the genome, highly reproducible, and able to detect a high level of polymorphism. The first molecular marker technique, RFLP, was introduced by Botstein et al. [15]. RFLP, RAPD, AFLP, and isozyme markers are first-generation molecular markers that have been developed and used in genetic analysis and plant breeding (Table 1). Advancements in molecular markers have significantly enhanced their efficiency, resolution, and application scope in genetic analysis and plant breeding (Table 2) [156]. These advancements can be broadly categorized into the development of new types of markers, improvements in marker technologies, and the integration of molecular markers with other genomic tools. Some key advancements are described in the following sections.

Table 1 Salient features of major molecular markers

Table 2 Principles of advanced molecular markers and their use in crop improvement

SSRs or microsatellites

Microsatellites, alternatively known as short tandem repeats (STRs) or simple sequence repeats (SSRs), are tandem repeats of short DNA motifs, typically one to six base pairs long, in contrast to minisatellites (VNTRs), which feature longer repeat units spanning 11 to 60 base pairs [115]. Microsatellites are found throughout the genome, including in chloroplasts and mitochondria [125, 127]. Because the number of repeats at these loci varies, SSRs exhibit high polymorphism that is simple to detect using the polymerase chain reaction (PCR). Mismatches, recombination, mobile element transfer (retrotransposons), and DNA strand slippage are some of the mechanisms that contribute to the occurrence of SSRs. Common SSR motifs encompass mononucleotide (A, T), dinucleotide (AT, GA), trinucleotide (AGG), and tetranucleotide (AAAC) repeats. Primers are usually designed from the conserved sequences flanking SSRs. Developing SSR markers involves creating an SSR library, identifying specific microsatellites, designing primers in favorable regions, and conducting PCR. Banding patterns are then interpreted and evaluated for polymorphism. SSR markers are highly favored due to their codominant inheritance, abundance, allelic diversity, and ease of assessment via PCR with flanking primers. McCouch et al. [107] conducted a pivotal study on SSR markers in rice, focusing on developing and mapping these markers to enhance genetic research. They identified numerous SSR loci across rice chromosomes and designed primers for their use. This comprehensive set of SSR markers has become instrumental in rice genetic studies, including linkage mapping, trait association, and breeding. SSR markers are used in plant genetics for various applications. For example, in rice, SSR markers have been used to map QTLs related to drought resistance and yield enhancement [65].
In wheat, SSR markers help identify genetic diversity and select for traits like disease resistance and stress tolerance [174, 175]. A 2023 study on Brassica napus utilized 304 SSR markers to evaluate genetic diversity, uncovering a 76% polymorphism rate and pinpointing loci associated with oil content and disease resistance [173]. The research emphasized SSR markers’ effectiveness in detecting allelic variation and mapping important agricultural traits. Additionally, SSR markers showed cross-species transferability, proving valuable for identifying traits beneficial for breeding programs and conserving genetic diversity. SSR markers are crucial for crop improvement and understanding genetic variation [83].
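The microsatellite identification step described above can be illustrated with a minimal Python sketch that scans a DNA sequence for perfect tandem repeats of 1 to 6 bp motifs. The function names and the minimum repeat counts are illustrative assumptions, not values taken from the studies cited here; dedicated SSR mining tools apply more elaborate rules (for example, for imperfect or compound repeats).

```python
import re

# Hypothetical minimum number of repeat units per motif length (illustrative only).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' x 2."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in min_repeats.items():
        # ([ACGT]{k}) captures one motif; \1{n,} demands at least n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Motifs such as 'ATAT' are skipped; the run is reported under 'AT' instead.
            if is_compound(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "AT" * 8 + "CCT" + "AGG" * 5 + "T"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

In practice, the coordinates returned by such a scan would be used to extract the flanking sequences in which the PCR primers are designed.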

Inter small RNA polymorphism (iSNAP)

Endogenous noncoding small RNAs, which are typically 20–24 nucleotides long and have important regulatory roles, are widely distributed in eukaryotic genomes [54]. These small RNAs offer a valuable resource for molecular marker development because their conserved flanking sequences enable primer design for PCR-based fingerprinting. The Inter small RNA polymorphism (iSNAP) technique, pioneered by Gui et al. [54], capitalizes on this characteristic. Primer pairs flanking small RNAs are used in PCR to detect length polymorphisms caused by insertions and deletions (InDels) within the small RNA pool. iSNAP is a noncoding, sequence-based marker system suitable for genotyping and genome mapping [3]. Unlike traditional markers that focus on coding sequences or microsatellites, iSNAP markers explore variation in noncoding regulatory regions. This opens up new possibilities for studying gene expression and complex traits influenced by small RNA pathways, including plant stress responses, development, and epigenetic regulation. Because they are closely associated with gene regulatory mechanisms, iSNAP markers have functional relevance. This makes them particularly useful for marker-assisted selection (MAS), enabling the identification of traits governed by post-transcriptional gene regulation, such as disease resistance and stress tolerance. A recent case study by Zhang et al. [196] illustrates the application of iSNAP markers in identifying disease resistance genes in tomato (Solanum lycopersicum). The researchers focused on polymorphisms in intergenic regions flanked by microRNAs (miRNAs) associated with defense responses. They developed a set of iSNAP markers and used them to screen tomato varieties for resistance to Phytophthora infestans, the pathogen responsible for late blight disease.
In maize, iSNAP markers have been used to identify genetic loci associated with disease resistance and yield traits, enabling more efficient breeding [97, 98]. In wheat, they assist in mapping quantitative trait loci (QTLs) for drought tolerance, enhancing the development of resilient varieties [26].

Intron length polymorphism (ILP)

Introns are prevalent in eukaryotic genomes and occur throughout various gene components. Because they are under lower selective pressure, introns are more variable than coding sequences, which makes them valuable as highly polymorphic genetic markers. Recently, researchers have focused on annotating and leveraging gene introns to create intron length polymorphism (ILP) markers on a genome-wide scale. An ILP marker is a genetic marker used in plant and animal breeding to identify variations in the length of introns, the non-coding regions of a gene that are transcribed but not translated into protein. These markers exploit natural differences in intron length among individuals or populations, aiding genetic mapping, diversity studies, and breeding programs by distinguishing between genotypes or assessing genetic diversity [7]. They have proven invaluable for large-scale genotyping in major food crops such as rice [7, 170], wheat [147], and maize [94]. ILP markers are conveniently detected by PCR. Introns are amplified with primers designed in the flanking exons, a method known as exon-primed intron-crossing PCR (EPIC-PCR). Because exon sequences tend to be more evolutionarily conserved, primers designed within exons are more versatile than primers designed in noncoding sequences. ILP markers are particularly advantageous when the amplified region covers multiple insertions and deletions (InDels) within a single intron, which significantly increases the likelihood of detecting polymorphism. ILP markers are highly transferable across related species, allowing for comparative genomics and evolutionary studies [27]. For example, in cereals such as rice, maize, and wheat, ILPs have shown consistent results, making them valuable for research across species without the need for species-specific markers.
Liu et al. [99] developed ILP markers for plants, identifying 1507 ILP markers in Oryza sativa (rice). These markers were highly transferable across species and showed a polymorphism rate of 85.3%. ILP markers proved useful for genetic diversity analysis and breeding applications in various crops. A recent study by Chen et al. [27] demonstrated the effectiveness of ILP markers in mapping drought tolerance in rice (Oryza sativa). The researchers used ILP markers derived from conserved intron regions of the Dehydration-Responsive Element-Binding (DREB) gene family. These markers were employed to screen a diverse set of rice germplasm, leading to the identification of several drought-tolerant varieties. The study showed how ILP markers can be used to identify quantitative trait loci (QTLs) associated with drought tolerance. The QTLs identified were then validated across multiple rice populations, demonstrating the reliability of ILPs in marker-assisted selection (MAS) for complex traits like drought resistance. These developments underscore the importance of introns and their applications in molecular genetics, enabling more effective crop breeding and resource management.
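As a rough illustration of how candidate ILP markers might be screened, the Python sketch below compares the lengths of corresponding introns in two genotypes and keeps those whose lengths differ by at least a chosen number of base pairs. The intron identifiers, the input dictionaries, and the `min_diff` threshold are hypothetical and serve only to make the idea concrete; they are not taken from the cited studies.

```python
def ilp_candidates(introns_a, introns_b, min_diff=5):
    """Compare intron lengths (bp), keyed by intron ID, between two genotypes.

    Returns (intron_id, length_a, length_b, difference) for every shared intron
    whose length differs by at least min_diff bp. min_diff is an assumed
    threshold standing in for the resolution of the fragment-sizing method.
    """
    candidates = []
    # Only introns annotated in both genotypes can be compared.
    for intron_id in sorted(introns_a.keys() & introns_b.keys()):
        diff = introns_b[intron_id] - introns_a[intron_id]
        if abs(diff) >= min_diff:
            candidates.append((intron_id, introns_a[intron_id], introns_b[intron_id], diff))
    return candidates


if __name__ == "__main__":
    # Hypothetical intron lengths for two genotypes.
    genotype_a = {"gene1_intron1": 420, "gene1_intron2": 310, "gene2_intron1": 150}
    genotype_b = {"gene1_intron1": 455, "gene1_intron2": 312, "gene3_intron1": 90}
    for row in ilp_candidates(genotype_a, genotype_b):
        print(row)
```

Introns that pass such a screen would still need exon-anchored primers and PCR validation before being used as markers.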

Single nucleotide polymorphism (SNP)

Single nucleotide polymorphisms (SNPs) are single-nucleotide differences in the genomic sequences of individuals within a population. They are the most prevalent molecular markers, and their distribution varies between species. Humans exhibit an average of one SNP per 1000 base pairs [137], whereas rice displays a higher frequency, with approximately one SNP every 130–140 base pairs [51]. SNPs are frequently found in noncoding regions. According to Sunyaev et al. [158], SNPs in coding regions can be synonymous or nonsynonymous; nonsynonymous SNPs change the amino acid composition and can alter phenotypic features. SNPs are the smallest units of genetic variation, providing a simple and abundant source of markers crucial for genetic mapping, marker-assisted breeding, and map-based cloning [188]. Significant techniques for SNP genotyping include primer extension, invasive cleavage, oligonucleotide ligation, and allele-specific hybridization [154]. SNP markers are discovered by two main routes: SNP discovery from PCR and SNP discovery from high-throughput next-generation sequencing (NGS), including RNA-Seq, RAD-Seq, genotyping by sequencing (Fig. 1), whole-genome sequencing (WGS), and whole-genome resequencing (WGR).

Fig. 1 SNP discovery in plants through the genotyping by sequencing (GBS) system and its application in crop improvement

Next-generation sequencing (NGS) is a high-throughput DNA sequencing technology that enables rapid, parallel sequencing of millions of DNA fragments for comprehensive genomic analysis. In recent years, NGS technologies have identified thousands to millions of SNPs in various crops, facilitating genetic diversity studies, trait mapping, and breeding improvements. Numerous tools are available for SNP discovery, including BioEdit, DNASTAR Lasergene Genomics Suite, SAMtools, SOAPsnp, Stacks, dDocent, PyRAD, and GATK. Biallelic SNPs are typically straightforward to assay. A SNP is detected when a nucleotide in a read from an accession differs from the reference genome at the corresponding position. Without a reference genome, the comparison is made among reads from different genotypes using de novo assembly methods. SNP calling is performed on the read alignment files generated by mapping programs, using various empirical and statistical criteria such as read depth, quality scores, and consensus base ratios. SNP discovery is more effective when multiple, diverse genotypes are analyzed simultaneously, as this captures the genetic variability within a species. There are three main types of SNP genotyping platforms: single SNP genotyping (using PCR with TaqMan from Life Technologies or KASP genotyping from LGC Genomics), multiple SNP genotyping (using SNP chips from Illumina and multiplexing from Sequenom), and SNP genotyping by next-generation sequencing methods such as GBS and restriction site-associated DNA sequencing (RADSeq). For large-scale genotyping, high-throughput methods such as GBS, RADSeq, and allele-specific PCR are used [37].
These technologies have been extensively used to discover and genotype SNPs in food crops, including cereals (barley [31, 48, 134, 162], rice [28, 185], and wheat [20]), oil crops (oilseed rape [29] and sunflower [100]), horticultural crops (cowpea [30], potato [59], and tomato [149]), and soybean [155], among others. SNP markers are crucial in genetics and agriculture, aiding genetic mapping, identifying disease associations, and improving crops through selective breeding. In their 2011 study, Kump et al. used SNPs to pinpoint disease resistance genes in maize, specifically targeting Southern Leaf Blight (SLB), caused by Cochliobolus heterostrophus. They identified 32 quantitative trait loci (QTLs) significantly associated with SLB resistance. The identified SNPs and associated QTLs can be used in marker-assisted selection (MAS) and genomic selection (GS) programs to develop SLB-resistant maize varieties. SNPs also play a pivotal role in integrating multi-omics data for crop improvement by acting as key genetic markers that link DNA variation to other molecular levels, such as gene expression (transcriptomics), protein abundance (proteomics), and metabolite profiles (metabolomics). These connections help to unravel the complex genetic architecture of important agronomic traits, including yield, stress tolerance, and disease resistance. By mapping SNPs to different omics layers, researchers can identify the critical genes, pathways, and molecular interactions responsible for these traits [90]. This comprehensive approach improves breeding accuracy, enabling the development of superior crop varieties through more informed and precise selection.
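The SNP-calling criteria mentioned earlier in this section (read depth, quality scores, and consensus base ratios) can be made concrete with a small Python sketch that evaluates a single reference position. The function name and all thresholds are illustrative assumptions; production callers such as SAMtools or GATK use statistical genotype models rather than fixed cut-offs like these.

```python
from collections import Counter


def call_snp(ref_base, read_bases, base_quals,
             min_depth=8, min_qual=20, min_alt_ratio=0.2):
    """Return the alternate allele at one site if simple SNP criteria are met, else None.

    read_bases : bases observed in the reads covering the site
    base_quals : Phred quality score of each observed base
    All three thresholds are hypothetical defaults chosen for illustration.
    """
    ref_base = ref_base.upper()
    # Quality filter: discard base calls below the Phred threshold.
    kept = [b.upper() for b, q in zip(read_bases, base_quals) if q >= min_qual]
    # Depth filter: require enough high-quality reads at the site.
    if len(kept) < min_depth:
        return None
    alt_counts = Counter(b for b in kept if b != ref_base)
    if not alt_counts:
        return None
    alt_base, alt_count = alt_counts.most_common(1)[0]
    # Base-ratio filter: the alternate allele must be supported by enough reads.
    if alt_count / len(kept) < min_alt_ratio:
        return None
    return alt_base


if __name__ == "__main__":
    bases = "AAAAGGGGGA"
    quals = [30] * len(bases)
    print(call_snp("A", bases, quals))
```

Running the same function over every covered position of an alignment, and across several genotypes, gives a crude version of the multi-sample SNP discovery described above.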

Diversity array technology (DArT)

The DArT technique is a highly reproducible microarray-based method for discovering polymorphic markers [177]. It is designed to enhance the detection of sequence variation across the genome, including single nucleotide polymorphisms (SNPs) and, in particular, insertions and deletions. It begins with creating a purposefully randomized fragment library, which serves as a genomic representation. DArT libraries are tailored to specific research purposes, using suitable individual or pooled samples. The genome is first reduced in complexity by restriction digestion; the resulting genomic representation is printed onto microarray chips, and the DNA fragments to be genotyped are then hybridized to the chips. The chips are scanned, and the data are analyzed. This method allows thousands of genomic loci to be genotyped simultaneously in a single assay, requiring as little as 50–100 ng of genomic DNA. Once markers are identified, no marker-specific assays are needed for genotyping; the polymorphic markers only have to be consolidated into a genotyping array. Such arrays are commonly employed in genotyping tasks [66]. DArT markers are primarily dominant, and the technique requires specialized software, laboratory facilities, a substantial investment, and skilled personnel (Sinha et al., 2023).

Penta-primer amplification refractory mutation system (PARMS)

The Penta-Primer Amplification Refractory Mutation System (PARMS) is a specialized genotyping technique for identifying specific single nucleotide polymorphisms (SNPs) or mutations in DNA sequences. It is particularly useful in plant and animal genetics, as well as in medical research, for detecting alleles associated with certain traits or diseases. It is an extension of the traditional Amplification Refractory Mutation System (ARMS), which relies on the specificity of DNA primers to distinguish between different alleles at a given genetic locus [159]. PARMS uses five primers to achieve high specificity and efficiency in identifying specific alleles: two universal primers, two allele-specific forward primers, and one shared reverse primer. Two universal primers bind to conserved regions of DNA surrounding the SNP or mutation of interest. Each allele-specific forward primer is designed to match one allele perfectly (the wild-type allele, the mutant allele, or another variant allele), with mismatches at critical positions near the SNP or mutation, while the reverse primer is shared. This allows selective amplification of only the alleles in question. PARMS employs competitive allele-specific PCR (AS-PCR) and a fluorescence-based reporting system to detect SNPs [159]. It can efficiently handle different numbers of SNPs and samples. The process requires only standard liquid handling, thermal cycling, and plate reading instruments.
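To show how a fluorescence-based readout of this kind can be turned into genotype calls, the Python sketch below classifies one sample from two endpoint signals. It assumes, purely for illustration, that allele 1 is reported by a FAM signal and allele 2 by a HEX signal, that both signals are baseline-corrected and non-negative, and that fixed angular thresholds separate the clusters; none of these details come from the PARMS description above, and plate-reader software typically clusters all wells of a plate jointly instead.

```python
import math


def call_genotype(fam, hex_, min_signal=0.2, homo_angle=30.0):
    """Classify one well from two fluorescence signals (assumed FAM and HEX).

    min_signal : hypothetical minimum signal magnitude for a confident call
    homo_angle : hypothetical angular width (degrees) of each homozygous cluster
    """
    # Wells with almost no signal (failed PCR, no template) are not called.
    if math.hypot(fam, hex_) < min_signal:
        return "no call"
    # 0 degrees = signal only from the allele 1 dye, 90 degrees = only allele 2.
    angle = math.degrees(math.atan2(hex_, fam))
    if angle <= homo_angle:
        return "allele1/allele1"
    if angle >= 90.0 - homo_angle:
        return "allele2/allele2"
    return "allele1/allele2"


if __name__ == "__main__":
    # Hypothetical signal pairs for four wells.
    for well in [(1.0, 0.05), (0.05, 1.0), (0.8, 0.7), (0.05, 0.05)]:
        print(well, call_genotype(*well))
```

The angular rule mirrors the usual scatter plot of such assays, where homozygotes cluster near the two axes and heterozygotes fall along the diagonal.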

Furthermore, PARMS is compatible with DNA samples from various sources and extraction methods, including alkaline lysis. This makes it ideal for a direct PCR-based SNP marker-assisted selection system (D-MAS), known for its simplicity, cost-effectiveness, and labor efficiency in SNP genotyping. In a practical application, Gao et al. [49] developed a PARMS marker for the TAC1 gene, illustrating its usefulness in breeding for rice plant architecture. Having outlined the significance of molecular markers, we now turn to the applications of genomic resources in crop improvement.

Genomic resources for plant breeding

The availability of whole genome sequences is invaluable for plant breeding. Arabidopsis (125 Mb) and rice (466 Mb) were early models for plant genetics due to their small genomes among dicots and monocots. Their genome sequences, announced in 2000 and 2005, have been pivotal in understanding key genes and biological functions. The advent of next-generation sequencing (NGS) technologies has revolutionized genomics. Among these, the 454 (Roche) and Illumina platforms are widely used for crop sequencing. NGS technologies have significantly increased sequencing capacity; for instance, the Illumina HiSeq 2000 can generate 55 Gb per day, far exceeding the human genome size. The development of third-generation sequencing platforms like PacBio RS (Pacific Biosciences, https://www.pacb.com/), Helicos (Helicos, https://seqll.com/), and Ion Torrent has further advanced the field. These platforms enable the production of long reads, resulting in more accurate and contiguous genome assemblies. Third-generation sequencing is particularly effective for assembling genomes de novo, especially in regions with highly repetitive sequences and clarifying structural variants.

The emergence of third-generation sequencing has enabled the generation of long reads and the production of more accurate and contiguous genome assemblies [25]. Long reads improve whole-genome de novo assemblies by spanning complex regions with highly repetitive sequences, and they help elucidate other complex repeat sequences and structural variants. Third-generation techniques such as isoform sequencing produce full-length transcripts, enabling detailed analysis of exons, splice sites, and alternatively spliced regions, which aids in refining genome annotations. Sequences generated through NGS are typically archived in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) for public access.

Two standard analyses performed on NGS reads are genome assembly and mapping. Assemblers like Roche's 454 Gsassembler, Celera Assembler, and Mira are frequently used for genome assembly. Once a reference genome is available, variation studies are typically conducted using mapping software such as Bowtie, BWA, and TopHat, which align reads to the reference genome. SNPs can then be detected with tools like SAMtools or GigaBayes. The algorithms for processing raw genomic data vary based on the data type and desired results. Bioinformaticians must present their findings to breeders via user-friendly interfaces, often through easily navigable websites. General-purpose web databases like GenBank (http://www.ncbi.nlm.nih.gov/genbank/), EMBL (http://www.ebi.ac.uk/embl/), DDBJ (http://www.ddbj.nig.ac.jp/), UniProt (http://www.uniprot.org), and Swiss-Prot (http://expasy.org/sprot/) provide researchers and breeders with essential biological information. Genomic sequence databases GenBank, EMBL, DDBJ, Ensembl, UCSC Genome Browser, and dbSNP offer extensive genomic data, analysis tools, and resources. Protein function databases, integrating sequence data, structural information, and functional annotations, include UniProt, Swiss-Prot, Gene Ontology, Protein Data Bank, InterPro, KEGG, Pfam, STRING, BioGRID, and PhosphoSitePlus (Table 3). Additionally, specialized databases for specific species useful to breeders, such as SGN, Phytozome, Gramene, and CropNet, provide targeted information for breeding programs. These resources collectively support the plant breeding process by enabling detailed genetic and protein analyses, aiding in developing improved crop varieties.

Table 3 Important Databases and Repositories of Genomic Information

New genomic tools are crucial for advancing and speeding up gene expression studies. Gene expression analysis provides breeders with valuable biological insights, helping them understand the molecular basis of complex plant processes and identify new targets for manipulation. While qRT-PCR is an affordable, quantitative technique, it can only analyze a limited number of genes per experiment. Other methods, such as differential display and cDNA-AFLPs, allow the study of thousands of genes but lack quantitative precision and struggle with low-abundance transcripts (Perez-de-Castro et al. 2012). More advanced techniques like serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS) address some of these limitations. However, the most popular methods today for transcript profiling are hybridization-based platforms or microarrays. Expression arrays offer several advantages, including the simultaneous measurement of tens of thousands of transcripts, semi-quantitative results, and sensitivity to low-abundance transcripts. Several web resources facilitate microarray data analysis, such as Babelomics (http://babelomics.bioinfo.cipf.es/), and software packages like Bioconductor (http://www.bioconductor.org/help/workflows/oligo-arrays/) and MeV (http://www.tm4.org/mev/) specialize in microarray analysis. Babelomics was used to analyze transcriptomic data in Arabidopsis thaliana to identify genes differentially expressed under drought stress. The tool facilitated functional annotation and identified key genes involved in stress responses [108]. In Solanum tuberosum (potato), Bioconductor was used to analyze gene expression profiles under biotic stress conditions, identifying genes linked to pathogen resistance [194]. Genevestigator (https://www.genevestigator.com/gv/doc/plant_biotech.jsp) is a database containing extensive microarray data from various species, with the most comprehensive data from Arabidopsis thaliana.
Data from crops like maize, wheat, rice, barley, and soybean are increasingly becoming available. Published expression data are publicly accessible in databases such as GEO (http://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (http://www.ebi.ac.uk/arrayexpress/), and species-specific repositories, providing valuable resources for analyzing gene expression in these and other crops. A summary of genomic resources related to genome sequence and functional analysis is presented in Table 3.

Genomic tools for plant breeding

Traditional plant breeding involves selection and crossing of plants with desirable traits over several generations. Techniques include selection of superior individuals, hybridization, and backcrossing. Breeders aim to enhance traits like yield, disease resistance, and quality. The process relies on natural genetic variation and careful observation to achieve desired improvements in crops over time. Genomic tools enhance traditional plant breeding by providing precise insights into genetic variations, speeding up trait identification, and enabling targeted modifications. They use DNA sequencing and markers to identify desirable traits more accurately and rapidly, reducing the time and cost of developing improved plant varieties compared to traditional methods that rely on broader, less precise selection processes [87]. These tools help identify desirable traits, speed up the breeding process, and improve the overall outcomes of breeding programs. Some key genomic tools in plant breeding include:

Quantitative trait loci (QTL) mapping

QTL mapping is a statistical technique that combines phenotypic data (traits) with genotypic data (molecular markers) in a specific population to identi
