High-resolution chromosome-level genome of Scylla paramamosain provides molecular insights into adaptive evolution in crabs

Sample preparation and genomic DNA isolation

A wild adult male S. paramamosain with stage III testis was collected off the coast of Shantou, China, and genomic DNA was extracted from its testis for sequencing. A Grandomics Genomic BAC-long DNA Kit was used to isolate ultralong DNA according to the manufacturer’s guidelines. The total DNA quantity and quality were evaluated using a NanoDrop One UV–Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, MA) and a Qubit 3.0 Fluorometer (Invitrogen Life Technologies, Carlsbad, CA). Large DNA fragments were obtained through gel cutting with the Blue Pippin system (Sage Science, Beverly, MA). Qualification of DNA involved visual inspection for foreign matter, assessment of degradation and size via 0.75% agarose gel electrophoresis, a check on purity (OD260/280 between 1.8 and 2.0; OD260/230 between 2.0 and 2.2) using Nanodrop 2000, and precise quantification with a Qubit 3.0 Fluorometer (Invitrogen, USA).

Library construction and sequencing

Approximately 8–10 µg of genomic DNA (> 50 kb) was selected using the SageHLS HMW library system (Sage Science, Beverley MA, USA) and processed with the Ligation Sequencing 1D Kit (Oxford Nanopore Technologies, Shanghai, China) following the manufacturer’s instructions. Library construction and sequencing were conducted on the PromethION (Oxford Nanopore Technologies) at the Genome Center of Grandomics (Wuhan, China). After quality inspection, large DNA fragments were recovered using the BluePippin automatic nucleic acid recovery instrument. Terminal repair, A-tailing, and ligation were performed using the LSK109 connection kit. Qubit was used to assess the constructed DNA library precisely. The library was then loaded into a flow cell, transferred to the PromethION sequencer, and subjected to real-time single-molecule sequencing. In the ONT sequencing platform, base calling and the conversion of nanopore-generated signals to base sequences [62] were executed using the Guppy toolkit (Oxford Nanopore Technologies). Pass reads with a mean qscore_template value ≥ 7 were obtained and used for subsequent assembly (https://github.com/nanoporetech/taiyaki).

Genome assembly, evaluation, and correction

A pure three-generation assembly was employed using filtered reads post quality control in NextDenovo software (reads_cutoff:1 k, seed_cutoff:28 k) (https://github.com/Nextomics/NextDenovo.git). The NextCorrect module was employed to correct the original data, yielding a consistency sequence after 13 Gb of error correction. De novo assembly using the NextGraph module produced a preliminary assembly of the genome. ONT three-generation data and Pb HiFi three-generation data (Additional file 1: Table S7) were then utilized with Nextpolish software (https://github.com/Nextomics/NextPolish.git) for genome correction. The corrected genome (Polish Genome) was obtained after three rounds of correction for both ONT and Pb HiFi data. Bwa mem default parameters were used to compare RNA sequencing data to the genome, and Pilon was iteratively calibrated three times to derive the final genome sequence. GC depth analysis and BUSCO prediction (https://busco.ezlab.org/) were used to assess genome quality and completeness.

Chromosome anchoring by Hi-C sequencing

To anchor hybrid scaffolds onto chromosomes, genomic DNA was extracted from crab testes for Hi-C library construction. The process involved cross-linking cells with formaldehyde, lysing cells, resuspending nuclei, and subsequent steps leading to proximity ligation. After overnight ligation, cross-linking was reversed, and chromatin DNA manipulations were performed. DNA purification and shearing to 400 bp lengths were followed by Hi-C library preparation using the NEBNext Ultra II DNA Library Prep Kit for Illumina, which was then sequenced on the Illumina NovaSeq/MGI-2000 platform. Cell samples were fixed with formaldehyde and subjected to lysis and extraction for sample quality assessment. After passing the quality test, the Hi-C fragment preparation involved chromatin digestion, biotin labeling, end ligation, DNA purification, and library construction. The library was sequenced on the MGI-2000 platform, and data were processed to extract high-quality reads. The analysis included filtering for adapters, removing low-quality reads, and eliminating reads with an N content exceeding five. Reads were aligned using Bowtie2 [63], and contig clustering was performed using LACHESIS software [64].

Gene annotation

Tandem repeats were annotated using GMATA (https://sourceforge.net/projects/gmata/?source=navbar) and Tandem Repeats Finder (TRF) (http://tandem.bu.edu/trf/trf.html), identifying simple repeat sequences (SSRs) and all tandem repeat elements. Transposable elements (TEs) were identified through an ab initio and homology-based approach, with RepeatMasker (https://github.com/rmhubley/RepeatMasker) used for searching known and novel TEs. Gene prediction employed three methods: GeMoMa (http://www.jstacs.de/index.php/GeMoMa) for homolog prediction, PASA (https://github.com/PASApipeline/PASApipeline) for RNAseq-based prediction, and Augustus (https://github.com/Gaius-Augustus/Augustus) for de novo prediction. EVidenceModeler (EVM) (http://evidencemodeler.github.io/) integrated gene sets, which underwent further filtering for transposons and erroneous genes. Untranslated regions and alternative splicing regions were determined using PASA based on RNA-seq assemblies which were from our lab and downloaded from NCBI database using the keyword “Scylla paramamosain” (Additional file 1: Table S8). Functional annotation involved comparisons with several public databases, including SwissProt, NR, KEGG, KOG, and GO. InterProScan identified putative domains and GO terms. BLASTp (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used against public protein databases to assess gene function. The prediction of noncoding RNA (ncRNA) entailed using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) and Infernal cmscan (http://eddylab.org/infernal/) for tRNAs and other noncoding RNAs. BUSCO was used to evaluate gene predictions, aligning annotated protein sequences to evolution-specific BUSCO databases.

Evolutionary analysis

To ascertain homologous relationships between S. paramamosain and other animal species, protein sequences were acquired and aligned using OrthMCL (https://orthomcl.org/orthomcl/). Initially, protein sets were gathered from 14 sequenced animal species, selecting the longest transcripts for each gene after excluding miscoded and prematurely terminated genes. Subsequently, pairwise alignment of these extracted protein sequences was conducted to identify conserved orthologs, employing Blastp with an E-value threshold of ≤ 1 × 10–5. Further identification of orthologous intergenome gene pairs, paralogous intragenome gene pairs, and single-copy gene pairs was achieved using OrthMCL. Species-specific genes, including S. paramamosain-specific genes and unclustered genes, were extracted. Functional annotation and enrichment tests of species-specific genes were performed utilizing information from homologs in the online GO (http://www.geneontology.org/) and KEGG (https://www.genome.jp/kegg/) databases.

Building upon the orthologous gene sets identified with OrthMCL, molecular phylogenetic analysis was executed using shared single-copy genes. Coding sequences were extracted from single-copy families, followed by multiple alignments of each ortholog group using MAFFT (https://mafft.cbrc.jp/alignment/software/). To eliminate poorly aligned sequences, Gblocks were applied, and the GTRGAMMA substitution model of RAxML (https://cme.h-its.org/exelixis/web/software/raxml/hands_on.html) was employed to construct phylogenetic trees with 1000 bootstrap replicates. The resulting tree file was visualized using Figtree (http://tree.bio.ed.ac.uk/software/figtree/), after which the RelTime tool (https://www.megasoftware.net/) of MEGA-CC was used to compute mean substitution rates along each branch and estimate species divergence times. Three fossil calibration times obtained from the TimeTree (http://www.timetree.org/) database served as temporal controls, including the divergence times of S. paramamosain.

The detection of significant expansions or contractions in specific gene families, often indicative of adaptive divergence in closely related species, was carried out based on the OrthoMCL results. To examine this, a birth and death process to model gene gain and loss over a phylogeny was used (https://github.com/hahnlab/CAFE). Furthermore, the ratio of the nonsynonymous (Ka) to the synonymous (Ks) substitution rates of protein-coding genes was calculated in accordance with the neutral theory of molecular evolution [65]. The average Ka/Ks values were determined to identify positively selected genes within the S. paramamosain lineage with a branch-site likelihood ratio test using Codeml (http://abacus.gene.ucl.ac.uk/software/) from the PAML package. Genes with a p value < 0.05 under the branch-site model were considered to be positively selected.

Whole-genome duplication events were investigated using fourfold synonymous third-codon transversion (4DTv) and synonymous substitution rate (Ks) estimation. Initial steps involved extracting protein sequences and conducting all-vs.-all paralog analysis through self-Blastp in these species. After filtering by identity and coverage, the Blastp results were analyzed with MCScanX [66], and the respective collinear blocks were identified. Subsequently, Ks and 4DTv were calculated for the syntenic block gene pairs using the KaKs-Calculator (https://sourceforge.net/projects/kakscalculator2/), and potential whole-genome duplication events were evaluated based on their Ks and 4DTv distribution.

Genome-wide association studies (GWAS) analysis

GWAS was conducted to identify genomic regions associated with sex traits in S. paramamosain. Genomic DNA was extracted from a population of 146 individuals collected from a crab culture farm in Niutianyang (Shantou, Guangdong, China) located at N23°22′23″, E116°42′16″. Based on whole-genome sequence data, a GWAS analysis of the sex trait was conducted using the PLINK software with logistic regression. Following completion of the GWAS analysis, significant SNP loci were identified by applying the Bonferroni correction method, considering loci with p-values less than 0.05 divided by the number of SNPs tested. The visualization of the results was performed using the qqman package in R to generate Manhattan and QQ plots. Finally, genomic structural annotation of important candidate SNP loci was carried out using SnpEff software, and gene function annotation was performed based on protein sequences from candidate genes using DIAMOND software with the Nr protein database.

Microinjection and RNA interference

To identify the function of the genes, RNA interference combined with microinjection were used. Based on the gene sequences, primers were designed for the amplification of the Abd-A, fru2, and elovl6 gene fragments and subsequent dsRNA synthesis. Specific primers were designed incorporating protective bases and T7 promoter sequences at the 5' end of both the positive and negative primers to facilitate dsRNA synthesis. The plasmid containing the Abd-A, fru2 and elovl6 gene sequence fragments served as the template for PCR amplification, respectively, and the resulting product underwent purification. The PCR product equipped with the T7 promoter was used as the template for the synthesis of dsRNA, a crucial step for gene interference through in vitro transcription. Next, larvae (zoeal stage IV) of S. paramamosain were positioned on a custom-made agarose gel plate. Then, dsRNA-Abd-A and injection indicator mixture was injected between the cuticle and abdominal space of the larvae using a microinjector. The development of the abdomen and limbs was observed through electron microscopy as the larvae progressed to the zoeal V stage.

Hepatopancreas treatment in vitro

To detect the function of Elovl6 in synthesizing fatty acid, the hepatopancreas of S. paramamosain were cultured in vitro. Crabs were anesthetized on ice for 10 min, followed by sterilization in 75% ethanol for 5 min. Hepatopancreas tissues were subsequently dissected and infiltrated with phosphate-buffered saline (PBS) containing 1% penicillin and streptomycin for 30 min, then washed eight times for 5 min each time using the PBS solution as above. Next, hepatopancreas tissues were cut into approximately 20 mg fragments using scissors. The fragments were then precultured at room temperature (25 ℃) in a 24-well plate with 150 µL of Leibovitz L15 medium (containing 1% penicillin and streptomycin). The 24-well culture plates were supplemented with dsRNA-Elovl6 and dsRNA-EGFP (control group) and placed in a 28 ℃ incubator for cultivation. After 18 h, 24 h, and 36 h of culture, tissue fragments were collected from each treatment for total RNA extraction and subsequent qRT-PCR analysis, with corresponding parallel samples of hepatopancreas tissue also collected for fatty acid analysis.

Sequence and phylogenetic analysis

To provide clues for predicting functions or evolution of the elovl6 genes, multiple sequence alignments were performed with the DNAMAN software (v6.0.3.99, Lynnon Corporation, USA). The phylogenetic tree was mapped using the neighbor-joining method with MEGA software (v11.0.13, Arizona State University, USA), based on deduced amino acid sequences of Elovls from S. paramamosain, as well as representative mammals, teleosts, and crustacean species downloaded from NCBI Genbank. Confidence in the resulting phylogenetic tree branch topology was measured by bootstrapping 1000 replicates.

Quantitative real-time PCR (qRT-PCR)

To detect the expressions of genes and miRNAs, qRT-PCR was used. Firstly, miRNAs were isolated from distinct developmental stages (embryo, zoea, megalopa, and crablet) and tissues (ovary I–V and testis I–III) of S. paramamosain using the miRcute miRNA Isolation Kit. Subsequently, the miRcute enhanced miRNA cDNA First Strand Synthesis Kit was deployed for the reverse transcription process. Finally, the miRcute enhanced miRNA Fluorescence Quantitative Detection Kit was applied during the qPCR analysis to elucidate the expression dynamics of both miRNAs and the genes. The total RNA was extracted using the RNAiso Plus kit (Takara Co. Ltd., Japan). Preceding the quantitative real-time polymerase chain reaction (qRT-PCR), the RNA samples were treated with RQ1 RNase-Free DNase (Takara Co. Ltd.) to eradicate genomic DNA contamination. Next, cDNA synthesis was conducted using 500 ng of DNase-treated RNA and the Talent qPCR Premix (SYBR Green) kit (TIANGEN Biotech Co., Ltd., Beijing), in accordance with the manufacturer's instructions. The qPCR primers were designed using Primer 6.0 software (Additional file 1: Table S9), and the reactions were performed with a Mini Option real-time detector (Roche LightCycle@480). Each reaction mixture contained 10 µL of Talent qPCR Premix (2 ×), 0.6 µL of PCR forward primer (10 µM), 0.6 µL of PCR reverse primer (10 µM), 2.0 µL of RT reaction solution containing cDNA at an amount of 20 ng, and 6.8 µL of RNase-free water. The amplification protocol involved an initial denaturation step at 95 ℃ for 3 min, followed by 40 cycles of denaturation at 95 ℃ for 5 s, annealing at 60 ℃ for 10 s, and extension at 72 ℃ for 15 s. Next, there was a melting curve analysis by sustaining the reaction at 72 ℃ for 6 s, followed by a denaturation step at 95 ℃ for 5 s. All products were initially resolved through agarose gel electrophoresis to verify amplicon sizes. Gene expression levels were normalized to the reference gene (18 s rRNA) and calculated using the optimized comparative Ct method (2−ΔΔCt).

Protein expression detection using western blotting

To detect the Abd-A protein expression in the cephalothorax and abdomen, western blotting was performed. Proteins were extracted from the cephalothorax and abdomen of S. paramamosain larvae using a cryogenic protein lysis solution following flash freezing with liquid nitrogen. Total protein was then extracted from each sample. After quantifying the protein concentration using the BCA protein assay kit (Sangon Biotech Co. Ltd., Shanghai, China), an equal number of proteins were loaded for SDS-PAGE electrophoresis. Following separation based on molecular weight, the proteins were transferred onto a PVDF membrane from the polyacrylamide gel and probed with specific antibodies. The next steps included incubation with HRP-conjugated secondary antibodies and detection using an ECL hypersensitive chemiluminescence kit (Sangon Biotech).

Fluorescence in situ hybridization (FISH)

To localize the expression of fru2 gene in the gonads of S. paramamosain, fluorescence in situ hybridization (FISH) was utilized. The FISH analysis was performed on S. paramamosain gonads that were fixed in 4% paraformaldehyde prepared with DEPC-treated water 2 h prior to sectioning at a thickness of 10–12 µm using a cryostat set at − 20 to − 25 ℃. Then, no more than three sections per slide were thaw-mounted onto charged Superfrost Plus slides. The fru2 probe, modified with FITC fluorescence at the 3′ end, was synthesized commercially (Sangon Biotech). The prehybridization and hybridization procedures followed the method described by Pinaud et al. [67]. Initially, a prehybridization solution diluted with high-grade formamide in a 1:1 ratio was added to each slide (50 µL per sample). Then, the slides were placed in a humid chamber for 30 min at room temperature before removing the coverslips in a 2 × SSC solution. After prehybridization, the probe was added to the hybridization buffer at a concentration of 1 ng/µL. A 25-mL hybridization mix was applied to each slide before placing them in a metal slide holder immersed in a mineral oil bath maintained at 65 ℃ for 14–16 h. Following hybridization, the slide holder was removed from the mineral oil bath and slides were rinsed twice (30 s each) in chloroform and immersed in a fresh batch of 2 × SSC solution and washed at room temperature for 1 h; they were then washed again in an SSC solution containing 50% formamide at 65 ℃ for 1 h; finally, they were washed twice (each time lasting 30 min) in tenfold diluted (0.1 ×) SSC solution while maintaining the temperature at 65 ℃. The signals resulting from hybridization on each slide were ultimately detected using fluorescence microscopy.

Yeast expression and fatty acid detection

To verify and compare the functions of elovl6 genes in different species of crabs (S. paramamosain, E. sinensis, and P. trituberculatus), the yeast expression system was used. PCR amplification of open reading frames (ORFs) corresponding to Eselovl6, Ptelovl6, and Spelovl6 was performed using a high-fidelity DNA polymerase, KOD-Plus-Neo (Toyobo, Japan), and cDNA templates. Specific primers that incorporated BamHI (GGATCC) and EcoRI (GAATTC) restriction sites were used according to the manufacturer’s instructions. The primers designed for ORF cloning, along with their respective restriction sites, are detailed in Additional file 1: Table S9. For Eselovl6 and Ptelovl6, the amplification process involved an initial denaturation step at 96 ℃ for 3 min, followed by 23 cycles of denaturation at 95 ℃ for 30 s, annealing at 58 ℃ for 15 s, and extension at 72 ℃ for 20 s, culminating in a final extension at 72 ℃ for 1 min. The PCR conditions for Spelovl6 included an initial denaturing step at 94 ℃ for 2 min, followed by 35 cycles of denaturation at 98 ℃ for 30 s, annealing at 94 ℃ for 30 s, extension at 68 ℃ for 30 s, and a final extension at 72 ℃ for 7 min.

The resulting DNA fragments were purified using the TIAN quick mini purification kit (Tiangen Biotech), after which the fragments were digested with the corresponding BamHI and EcoRI restriction endonucleases (Thermo Scientific, USA). These were ligated into pYES2 yeast expression vector (Invitrogen, USA) using the DNA Ligation Kit Mighty Mix (Takara, Japan). The resulting plasmid constructs, pYEselovl6, pYPtelovl6, and pYSpelovl6, were then introduced into E. coli DH5α competent cells (Takara, Japan) screened for the presence of recombinants using PCR.

The transformation and selection of yeast with recombinant plasmids and yeast culture followed modified protocols from Monroig et al. [68]. Purified plasmids inserted with either one of the elovl6 ORFs or no insert were used to transform yeast (Saccharomyces cerevisiae strain INVSc1) competent cells through the polyethylene glycol/lithium acetate (PEG/LiAc) method. The transformed yeast containing the corresponding recombinants were selected on S. cerevisiae minimal solid media plates without uracil (SC-Ura, containing 2% glucose and 2% yeast agar) and cultured at 30 ℃ for 3 days. Single colonies were then picked and transferred into liquid SC-Ura medium with 2% glucose and then allowed to grow again in a shaking incubator at 30 ℃ under 250 rpm for 2 days. Yeast genomic DNA was extracted from the bacterial solution using a yeast genomic DNA rapid extraction kit (Solarbio, Beijing, China), and the presence of the resulting plasmid constructs in S. cerevisiae was identified and confirmed by PCR screening.

For heterologous expression, successfully transformed yeast were cultured for 24 h in SC-Ura broth containing 2% raffinose at 30 ℃ and 250 rpm, diluted to an OD600 of 0.4 in SC-Ura broth and allowed to grow until the OD600 reached 1. The suspensions were centrifuged to obtain precipitated yeast, which was then resuspended in SC-Ura broth (containing 2% galactose) at an OD600 of 0.4. Next, a mixture of the suspensions, 10% tergitol-type (NP-40, Beyotime Biotechnology, Shanghai, China), and corresponding fatty acid substrates (including 16:0, 18:1n-9, 18:2n-6, 18:3n-3, 18:3n-6, 18:4n-3, 20:4n-6, and 20:5n-3) at final concentrations of 0.50 mM for C16, 0.50 mM for C18, and 0.75 mM for C20 were added to the centrifuge tubes. The tubes were placed into a shaker and incubated at 30 ℃ for 2 days at 250 rpm. Subsequently, the yeast was collected via centrifugation, washed with Hanks’s balanced salt solution (Solarbio), and processed for fatty acid analysis. This experiment was replicated with three separate recombinant colonies for each recombinant yeast strain.

Fatty acid analysis by gas chromatography‒mass spectrometry

After the yeast collection, the fatty acid contents were detected by gas chromatography‒mass spectrometry (GC–MS) to analyze the ability of elovl6 genes to synthesize fatty acid in different crabs. The yeast obtained above underwent a 48-h freeze-drying process and were ground into powder. Approximately 200 mg of this yeast powder was then thawed at 4 ℃ and placed in a 12-mL screw-top glass tube with a Teflon-sealed lid and 0.25 mg/mL of a methyl esterification solution, which was composed of 99 mL methanol, 1 mL sulfuric acid, and 0.025 g 2,6-Di-tert-butyl-4-methylphenol as an antioxidant. After vortexing for 2 min and ultrasonic disruption for 30 min, the samples were incubated in a water bath at 80 ℃ for 4 h and cooled. Next, 1 mL of n-hexane was added to the mixture and shaken vigorously for 1 min, after which 1 mL of ultrapure water was added to facilitate layer separation. The resulting supernatant was then filtered through a nylon syringe filter with a 0.22-µm ultrafiltration membrane (SCAA-104, ANPEL, China) and collected into a clean ampoule. The fatty acid methyl ester (FAME) solution was concentrated under a stream of nitrogen gas in a Termovap sample concentrator and resuspended in 600 µL of n-hexane and stored at − 20 ℃ until analysis. The FAME samples were separated and analyzed by GC–MS using an Agilent 7890B-5977A GC–MS (Agilent Technologies, Santa Clara, CA, USA) equipped with a fused-silica ultra inert capillary column (DB-WAX, 30 m × 250 µm internal diameter, film thickness 0.25 µm; Agilent J & W Scientific, CA, USA). The oven temperature was increased from 100 ℃ to 200 ℃ at a rate of 10 ℃/min, with a hold time of 5 min at 200 ℃. Next, the temperature was increased to 230 ℃ at 2 ℃/min with a hold time of 10 min at 230 ℃, followed by a final increase from 230 to 240 at 10 ℃/min. The injection, interface, and ion source temperatures were adjusted to 250, 240, and 230 ℃, respectively. High-purity helium (99.999%) served as the carrier gas with a constant flow rate of 1 mL/min. A 0.5 µL sample was injected at a 1:20 split ratio by an autosampler. The collision energy was set at 70 eV, and mass spectra data were acquired in full scan mode (scanning range 40–500 m/z). Fatty acids were identified through mass spectrometry using a commercially available standard library (National Institute of Standards and Technology Mass Spectral Library 2011) and the relative retention times of standards. The elongation conversion efficiencies of the fatty acid substrates were determined by calculating the proportion of exogenously added fatty acids (FAs) to the elongated FA products, given by the following equation.

$$\text(\%)=\frac}+\text}\times100$$

留言 (0)

沒有登入
gif