Comparative genomic characterization of Cellulosimicrobium funkei isolate RVMD1 from Ma’an desert rock varnish challenges Cellulosimicrobium systematics

1 Introduction

The extreme habitats, such as arid and semi-arid desert regions, have become a focal point for researchers due to their unique microbial diversity (Shu and Huang, 2022). A prominent feature in arid and semi-arid regions is the formation of rock varnish, also referred to as desert varnish. This varnish forms a thin, dark layer on rocks, typically not exceeding a thickness of 200 μm. Its composition is quite complex and consists mainly of elements like oxygen, silicon, and aluminum (Alnaimat et al., 2017).

The exploration of the microbial and chemical characteristics of rock varnish and its formation has been significantly advanced by employing a variety of methods. Techniques such as shotgun metagenomics, elemental analysis, lipid analysis, biomarker analysis, and rock surface analysis have been crucial in investigating these properties (Lang-Yona et al., 2018; Ren et al., 2019). Recently, the discovery of the role of Cyanobacteria, particularly Chroococcidiopsis, in forming rock varnish and enhancing bacterial resilience, has opened new perspectives on the biogeochemical processes of Earth and Mars (Culotta and Wildeman, 2021).

The genus Cellulosimicrobium was initially introduced by Schumann and colleagues in 2001, with its first characterized and newly reclassified species, C. cellulans gen. nov., comb. nov., designated as the type species (Schumann et al., 2001). As of now, this genus has expanded to include seven validly published species, namely C. composti, C. fucosivorans, C. funkei, C. marinum, C. protaetiae, and C. terreum. Recently, C. fucosivorans was identified as a later heterotypic synonym of C. composti (Jiang and Han, 2023). As of September 26, 2023, there are 56 genome sequences from various members of the Cellulosimicrobium genus deposited in the NCBI database, accessible via. The majority are either C. cellulans or unspecified species within the Cellulosimicrobium genus.

This genus generally characterized as Gram-stain-positive, although they tend to decolorize quickly and they generate a substrate mycelium that eventually fragments into irregular, curved, and club-shaped rods, often arranged in a ‘V’ formation. As the medium becomes depleted, these rods transform into shorter rods or even spherical cells. These bacteria are non-spore-forming, catalase-positive, and display varying degrees of nitrate reduction. There are differences in motility among species in this genus. For example, C. funkei cells have one to five polar and/or lateral flagella and are motile, while C. cellulans are nonmotile (Schumann et al., 2001; Schumann and Stackebrandt, 2015). Cellulosimicrobium species have been isolated from soil (Yoon et al., 2007), clinical samples (Brown et al., 2006) and from marine sponges (Li, 2009; Antony et al., 2009). Despite being considered a rare pathogen, Cellulosimicrobium spp. may cause life-threatening conditions, such as endocarditis, particularly in immunocompromised individuals and those carrying foreign objects (Monticelli et al., 2019; Rivero et al., 2019). A specific case demonstrated C. cellulans being isolated from a patient who developed acute renal failure after a prolonged ICU stay for sepsis encephalopathy. This evidence supports its potential as an opportunistic pathogen in critically ill patients (Delport et al., 2014).

The use of whole-genome sequencing (WGS) for the comprehensive characterization of bacterial genetic diversity has become a standard approach, enabling a deeper understanding of their genomic structures and functional attributes. As part of our ongoing research focusing on microbial diversity, particularly the diversity of actinobacteria in desert varnishes, the purpose of this study is to determine the taxonomic status and comprehensively describe the phylogenetic, genomic, and taxonomic characteristics of C. funkei isolate RVMD1, isolated from rock varnish in the Ma’an Desert, Jordan. This is achieved by employing a multiphase classification approach that integrates whole-genome shotgun sequencing with rRNA gene amplicon analyses. To the best of our knowledge, this study represents the first instance of whole-genome sequencing for bacteria isolated from rock varnish in Jordan.

2 Materials and methods 2.1 Site description and sample collection

Several samples of desert rock varnish were aseptically collected from a semi-arid region close to Ma’an city, Jordan, at coordinates 30.188836, 35.639121. Approximately 50 mm of rain falls annually in this region. Aseptically selected rocks with relatively flat surfaces were immediately sealed in sterile aluminum foil for transportation. Once the rock specimens were transported to the laboratory, they were placed in a laminar flow bench, where a flame-sterilized coarse bit was used to grind them into varnish powder. The powder was then stored at −4°C. Bacteria cultivation was performed within 24 h of sample collection.

2.2 Bacteria isolation and enumeration

A 0.1 g sample of powdered rock varnish was placed directly onto Luedemann medium (DSMZ medium 877), supplemented with 50 μg/mL of Cycloheximide to prevent fungal growth. Following a 72-h incubation period at 37°C, colonies exhibiting distinct shapes and colors were identified, isolated, and then streak purified for further analysis.

2.3 Identification of isolate through 16S rRNA gene amplification and sequencing

The isolated samples underwent DNA extraction using the G-spin Total DNA Extraction Mini Kit (iNtRON Biotechnology, Suwon, Korea). Resultant DNA then served as a template for 16S rDNA amplification via PCR. The extraction process was carried out in line with the manufacturer’s guidelines. The SSU rRNA gene was amplified using the bacterial forward primer 27F (5′-AGRGTTYGATYMTGGCTCAG-3′) paired with the 1492R primer (5′-RGYTACCTTGTTACGACTT-3′). PCR product was purified using the PCR quick-spin PCR Product Purification Kit as per the manufacturer’s instructions, and sequenced by MACROGEN (Korea) using the Sanger method.

The obtained 16S rRNA gene sequences were analyzed and cross-referenced with the EzTaxon database and EzBioCloud Pro 16S-ID app (Yoon et al., 2017). A partial sequence of the 16S ribosomal RNA (rRNA) gene, encompassing 1,251 base pairs (bp), was obtained from the isolate RVMD1. This sequence has been successfully deposited in the NCBI GenBank database and is accessible under the accession number OR570906. 16S rRNA gene-based phylogenetic tree was constructed using Protologger. Sequences were aligned using MUSCLE (v3.8.31) with default settings, and further processed using FastTree (v2.1.7) with the GTR model to generate the phylogenetic tree. Taxonomic assignment involved identifying the closest relatives within the SILVA Living Tree Project based on sequence identity. Only species with validly published names from the DSMZ nomenclature list were included to ensure accurate classification (Hitch et al., 2021). The phylogenetic tree was visualized using the Interactive Tree Of Life (ITOL) online tool (Letunic and Bork, 2021) and subsequently edited in Inkscape (version 1.0).

2.4 Whole genome sequencing, assembly, annotation, and analysis of genomic features

As outlined in the protocol for Gram-positive bacteria, genomic DNA was extracted using the G-spin Total DNA Extraction Mini Kit (iNtRON Biotechnology, Suwon, Korea). The purity and integrity of obtained DNA were assessed using the Nabi-UV/Vis Nano Spectrophotometer from MicroDigital, South Korea.

The genome of isolate RVMD1 was sequenced using Illumina NextSeq 2000 (PE 150 bp, 15 M reads/sample) at EzBiome Inc., (Gaithersburg, MD, USA). Quality control of raw reads was checked with MultiQC, v.1.11 (Ewels et al., 2016). The reads were assembled de novo using SPAdes v.3.13.0 (Prjibelski et al., 2020). CheckM v1.0.18 (Parks et al., 2015) and QUAST v4.4 (Gurevich et al., 2013) through (Arkin et al., 2018) were used to assess genome quality.

The genome sequence was fully annotated using NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (Tatusova et al., 2016) and Bacterial Bioinformatics Resource Center (BV-BRC) (Olson et al., 2023), which provided a comprehensive analysis of its genetic makeup. Additional insights into various genomic features were obtained through the use of other tools, including the Microbial Genomes Atlas (MiGA) webserver (Rodriguez-R et al., 2018), the GTDB-Tk in the KBase platform (Chaumeil et al., 2019), EzBiome Genome-ID and Galaxy Protologger (see text footnote 3) (Hitch et al., 2021). Circular genome map was generated through Proksee web-based tool (Grant et al., 2023). The analysis of gene clusters and loci associated with antibiotic resistance and iron acquisition systems were retrieved using the SEED viewer of the RAST platform (Aziz et al., 2008; Overbeek et al., 2014).

2.5 Whole genome-based taxonomic placement, relatedness indices, and comparative phylogenomic analysis

The initial taxonomic status of isolate RVMD1 was identified by the whole genome-based bacterial identification service provided by EzBiome Genome-ID (see text footnote 8) (Ha et al., 2019). This service uses average nucleotide identity (ANI) to compare similarity values with a reference database during its identification process.

After initially identified through 16S rRNA and whole-genome sequencing, the taxonomic status of C. funkei RVMD1 was further clarified by a comprehensive phylogenomic analysis of 43 genomes from its genus in RefSeq as of September 2023, excluding atypical. This analysis encompassed a multifaceted approach, including genome-scale GBDP tree analysis via TYGS (Meier-Kolthoff et al., 2022), the analysis also involved calculating pairwise average nucleotide identity (ANI) using BLAST (ANIb) through JSpeciesWS (Richter et al., 2016). Digital DNA–DNA hybridization (dDDH) percentages were determined in silico using the Genome-to-Genome Distance Calculator (dGGDC 3.0) formula d4 (a.k.a. GGDC formula 2) (Meier-Kolthoff et al., 2022) applying a 70% species delineation threshold (Chun et al., 2018), leveraging the BLAST method as per the formula available at. Additionally, ANI/AAI matrix values for our isolate, C. funkei RVMD1, along with 43 other genomes, were calculated using the enveomics collection (Rodriguez-R and Konstantinidis, 2016). Matrix data were represented as a heatmap, generated using TBtools-I (Chen et al., 2023). The AAI-profiler server (Medlar et al., 2018) was employed for visualizing scatter plots and taxonomic profiles, confirming taxonomic identities, facilitating proteome-wide sequence searches, and validating isolate classification based on AAI-profile analysis, by computing AAI between a query proteome and all target species in the UniProt database.

In-depth comparative genome analyses were undertaken for C. funkei RVMD1 in relation to six reference species of Cellulosimicrobium genus (C. protaetiae BI34, C. cellulans ORNL-0100, C. funkei JCM 14302, C. marinum NBRC 110994, C. composti SE3 and C. arenosum KCTC 49039), genome sequences were retrieved from the National Center for Biotechnology Information (NCBI) up to September 2023. The analysis involved constructing a phylogenetic tree using OrthoANI values computed with the Orthologous Average Nucleotide Identity Tool (OAT)(Lee et al., 2016) on the EzBioCloud platform. Heatmap techniques were applied for visualizing OrthoANI to identify nucleotide similarities between RVMD1 and reference species. Circos chord diagram (Krzywinski et al., 2009) was employed to display OGRI metrics (AAI, ANI, dDDH) for an in-depth comparative view. Detailed visual fast whole-genome similarity analysis was performed using the FastANI 1.3.3 tool (Jain et al., 2018) available on the Proksee website (Grant et al., 2023). Genes associated with siderophore biosynthesis, heavy metal resistance, virulence, and cold and heat shock responses were analyzed using the BV-BRC platform (Olson et al., 2023). The results are illustrated in a horizontal stacked bar chart created using Google Sheets. The distribution of MicroTrait bioelement family genes was analyzed using HMMs of MicroTrait Bioelement families (v1.9.1) available on KBase (Karaoz and Brodie, 2022), the heatmap was created using TBtools-I (Chen et al., 2023). Pangenomic comparisons were executed using Build Pangenome with OrthoMCL - v2.0 in the Kbase platform (Arkin et al., 2018) and with IPGA, web server available at, configured with its default settings (Liu et al., 2022). The contraction and expansion of gene families in C. funkei RVMD1 and reference genomes were analyzed using CAFE5, with the process facilitated through OrthoVenn3 (Sun et al., 2023), accessible at.

3 Results and discussion 3.1 Isolation and 16S rRNA gene sequencing based characterization

In our ongoing exploration of microbial diversity, particularly focusing on the diversity of actinobacteria present in extreme habitats, a bacterium designated RVMD1 was isolated from a desert rock varnish sample. A characterization based on 16S rRNA gene sequencing was initially started. The obtained partial 16S rRNA gene sequence (1,251 bp) of isolate RVMD1, submitted to GenBank with the accession number OR570906, has been analyzed and summarized in Table 1, using the EzBioCloud Pro 16S-ID app (Yoon et al., 2017). This table ranks 10 species based on their sequence similarity to RVMD1, highlighting both the species name and their respective genome group. At the top of the list is C. cellulans from the C. cellulans group, showing the highest similarity to RVMD1 with 99.84% identification. It is followed by C. funkei, also of the same group, with a 99.68% match. Both species are well-known for their genetic similarity (Brown et al., 2006). Other species like C. marinum, C. fucosivorans (from the C. composti group), and C. composti (same group) also display high similarities, ranging from 99.52 to 99.36%. Notably, C. fucosivorans, originally listed in the C. composti group with a 99.44% similarity, was recently identified as a later heterotypic synonym of C. composti, according to Jiang and Han (2023). The list also includes species without specified genome groups like C. protaetiae, C. terreum, Krasilnikoviella muralis, C. arenosum, and K. flava, with similarities spanning from 99.2 to 97.45%. The alignment of the top 16S rRNA gene sequencing matches for RVMD1 is neatly outlined in Supplementary Figure S1. An intricate analysis identified taxa closely related to RVMD1 isolate detailed in Supplementary Table S1, used genes extracted from the whole genome, including 16S rRNA, recA, rplC, and Mash identity. The 16S rRNA gene sequences of isolate RVMD1 were compared between Sanger sequencing (1,251 bp) and whole-genome sequencing (WGS, 1,467 bp). Two minor nucleotide differences were identified (T ↔ C and C ↔ T), as detailed in Supplementary Figure S2.

www.frontiersin.org

Table 1. A summary of top hits from 16S rRNA gene sequencing analysis of bacterial isolate RVMD1 in EzBioCloud Pro 16S-ID app.

The 16S rRNA gene-based mid-point rooted maximum-likelihood phylogenetic tree for isolate RVMD1, constructed with Protologger and referenced against the SILVA Living Tree Project, is illustrated (Figure 1). The tree confirms that RVMD1 belongs to the C. cellulans group, with C. aquatile shown as the closest relative, followed by C. funkei and C. cellulans. However, according to a recent nomenclatural revision, C. aquatile, as identified by Sultanpuram et al. (2015), is now considered a synonym of C. funkei (Nouioui et al., 2018). This means that what was previously thought to be separate species, C. aquatile and C. funkei, are now recognized as the same species, which is reflected in the close clustering of these names in the phylogenetic tree.

www.frontiersin.org

Figure 1. 16S rRNA gene-based mid-point rooted maximum-likelihood phylogenetic tree of C. funkei RVMD1, constructed using Protologger, highlights closest relatives from the SILVA Living Tree Project. The tree was modified using the Interactive Tree Of Life (iTOL) v5. Proportional circles on the branches represent bootstrap values. Species belonging to the Sanguibacteraceae family serve as the outgroup.

3.2 Genomic features and annotations

The integration of data from bioinformatics resources such as MiGA, BV-BRC, NCBI PGAP, GTDB-Tk on KBase, and Galaxy Protologger has yielded a comprehensive overview of the genomic features of the C. funkei RVMD1, as shown in Table 2 and Figure 2. The genome is comprised of 857 contigs with a total length of 4,264,015 base pairs, with an N50 value of 9,300 bp. The largest contig identified measured 36,920 bp nucleotides. Factors such as the starting material, the sequencing technique used, and the quality of the produced sequences impacted the number and lengths of the contigs generated (Schwengers et al., 2020). The G + C content of 74.59% is particularly high, however, the values of genome size and G + C contents fall within the range observed for all recognized strains within the genus Cellulosimicrobium, as documented in Supplementary Table S4. Quality metrics, including completeness (98.55%) and contamination (0.87%), along with consistency scores, coarse consistency of 98.4 and fine consistency of 94.3, suggest a high-quality genome assembly (Supplementary Figures S3, S4). The quality of the genome data exceeds the minimum standards for taxonomic applications, as outlined by Chun et al. (2018) and Riesco and Trujillo (2024). Gene prediction analysis indicates a substantial proteome, with 4,449 proteins predicted, each averaging 288.3805 amino acids in length. This count of proteins is the highest ever recorded among all submitted genomes within this genus (Supplementary Table S4). It is important to note that this elevated protein count may be influenced by assembly inaccuracies, possibly due to the high number of contigs. The coding density stands at 90.27%, suggesting an efficient genomic coding capability. Isolated from a harsh desert environment, the identification of 103 out of 106 essential genes in C. funkei RVMD1, with duplications in some, could underlines its evolutionary adaptations for resilience and survival (Rosconi et al., 2022). Functional analysis uncovered a considerable number of transporters (144), secretion genes (14), and unique enzymes (643).

www.frontiersin.org

Table 2. Comprehensive genomic features of C. funkei isolate RVMD1.

www.frontiersin.org

Figure 2. Circular genome map of C. funkei isolate RVMD1 generated through Proksee web-based tool. The tracks display coding sequences on forward and reverse strands (CDS), tRNAs, rRNAs, GC content, and GC skew (black/gray graph), with antibiotic resistance genes (CARD) and open reading frames (ORF) interspersed, illustrating the detailed genomic structure.

The functional annotation of the genome, illustrated in Sankey diagram (Figure 3A), outlines the allocation of 1,560 genes into 238 subsystems, with a pronounced emphasis on metabolic functions (643 genes across 80 subsystems), energy (267 genes in 28 subsystems), protein processing (228 genes in 40 subsystems), DNA processing (105 genes in 16 subsystems), and notably, stress response defense virulence (113 genes in 27 subsystems), supporting the RVMD1 isolate adaptation to its extreme habitat. The annotation revealing 1,717 hypothetical proteins and 2,794 with functional assignments underscores the vast unknown in microbial genetics and the potential for groundbreaking discoveries (Figure 3B). This insight highlights the challenge of functionally uncharacterized genes in microbial systems and points to significant avenues for future exploratory research (Wirbel et al., 2024). Among the proteins with functional assignments were 981 with Enzyme Commission (EC) numbers (Schomburg et al., 2004), 845 with Gene Ontology (GO) assignments (Ashburner et al., 2000), and 743 mapped to KEGG pathways (Kanehisa et al., 2016). Additionally, the PATRIC annotation revealed 4,089 proteins belonging to genus-specific protein families (PLFams) and 4,216 proteins associated with cross-genus protein families (PGFams) (Davis et al., 2016).

www.frontiersin.org

Figure 3. Functional characterization of the C. funkei RVMD1 genome. (A) Sankey diagram illustrating the distribution of genes into subsystems classified by biological functions. (B) Summary of protein annotations detailing cross-genus and genus-specific family assignments, pathway connections, GO categorizations, EC number designations, and other functional annotations, along with hypothetical proteins, all as annotated by BV-BRC.

Encouraged by reports that species such as C. cellulans and C. funkei within the genus Cellulosimicrobium have been documented as pathogenic to humans, with 15 cases of Cellulosimicrobium sp. bacteremia reported (Rivero et al., 2019; Le Ho et al., 2024), virulence factors were investigated in the RVMD1 genome. The results are shown in Figure 4A, which illustrates chloramphenicol resistance gene clusters demonstrating conservation across species including Corynebacterium jeikeium and Streptomyces griseus. Figure 4B presents rifampin resistance genes, with colored blocks indicating specific gene functions, involving a variety of bacterial species. Figure 4C highlights siderophore transport system gene clusters, featuring ATPase and permease components, across species such as Corynebacterium glutamicum and Kineococcus radiotolerans. Table 3 details the aforementioned antibiotic resistance determinants, including a chloramphenicol resistance gene (cml) with accession WP_005297378.1, located on contig 241 at coordinates 4,408–5,592 in the forward direction, displaying 63.78% identity and 99.74% reference coverage. Similarly, a rifamycin resistance gene (arr) with accession WP_063857695.1 was identified on contig 77, from 9,294 to 9,704, also in the forward direction, with 51.16% identity and 84% reference coverage. The presence of full-length genes associated with antimicrobial resistance (AMR) in a genome should not be regarded as conclusive evidence of a resistant phenotype (Elarabi et al., 2023). Supplementary Tables S2, S3 outline in detail the potential virulence and antimicrobial resistance genes in the studied genome. Supplementary Table S2 classifies virulence factor genes identified through TrueBac™ and VFDB, showing gene similarity to known sequences, while Supplementary Table S3 details AMR genes detected using the Genome Annotation Service in PATRIC (BV-BRC) with a k-mer-based method. While the findings were sourced from TrueBac™ and (BV-BRC), independent validation of these results was not possible by our team.

www.frontiersin.org

Figure 4. Gene clusters and loci associated with antibiotic resistance and iron acquisition systems in C. funkei RVMD1 in comparison to other bacteria, as characterized by the RAST SEED viewer. The graphic is centered on the focus gene, which is red and numbered (1). (A) Presents chloramphenicol resistance genes. (B) Displays rifampin resistance genes. (C) Depicts siderophore transport system genes with ATPase components (red) and permease components (green). Conserved regions across species are boxed.

www.frontiersin.org

Table 3. Antibiotic resistance determinants in bacterial isolate RVMD1 as identified by whole genome analysis using EzBiome Genome-ID.

3.3 Whole genome-based taxonomic placement

Building upon the initial findings that suggested the RVMD1 genome belonged to the genus Cellulosimicrobium based on the 16S rRNA gene sequence, a more comprehensive investigation was undertaken through whole-genome analysis. This was due to increasing evidence which suggests that the 16S rRNA gene sequence offers limited resolution in distinguishing between closely related species within certain genera (Nouioui et al., 2018; Wu et al., 2020; Rahi et al., 2024). This subsequent analysis employed EzBiome Genome-ID (Ha et al., 2019), a web-based commercial service, to leverage Average Nucleotide Identity (ANI) for in-depth comparison with a reference database. This approach aimed to definitively elucidate and confirm the accurate taxonomic position of isolate RVMD1. As shown in Table 4, this crucial step identified C. funkei and C. cellulans, both members of the C. cellulans group, as the closest relatives. The ANI results revealed C. funkei as the top match, exhibiting a nucleotide identity of 97.71%, along with query and reference coverage of 66.43 and 58.20%, respectively. C. cellulans closely followed, with an identity of 95.70%, and query and reference coverage values of 68.82 and 58.95%, respectively. These findings, surpassing the species boundary threshold of 94.0–96.0% (Chun et al., 2018), indicate a significant genetic closeness and suggest that the organisms belong to the same species (Aviles and Kyndt, 2021). While the remaining top hits (C. protaetiae, C. fucosivorans, C. composti, C. marinum, C. terreum, C. arenosum, Isoptericola jiangsuensis and I. dokdonensis) scored ANI values from 88.99 to 79.48, which are below the threshold boundary for species delineation. This confirms that our isolate is a species from C. cellulans group and is closely related to the funkei species; therefore, we designated it as C. funkei RVMD1. These results also support the latest reclassification that C. fucosivorans and C. composti are the same species (Jiang and Han, 2023).

www.frontiersin.org

Table 4. Top average nucleotide identity (ANI) hits for bacterial isolate RVMD1 whole genome: a comparative analysis of closely related species using EzBiome Genome-ID.

On the other hand, the taxonomic resolution analysis employing Average Amino-acid Identity (AAI) (Figure 5), conducted on C. funkei RVMD1 against the UniProt database using AAI-Profiler, also revealed a high proteomic similarity between RVMD1 and both C. funkei and C. aquatile, with respective similarities of 97.6 and 98.1%. These values exceed the species delineation threshold, further supporting the conclusion drawn from the aforementioned ANI analysis that our isolate is C. funkei. Notably, C. aquatile is no longer considered a separate species; it is now recognized as a synonym of C. funkei, as described by Nouioui et al. (2018). The AAI-Profiler web server was used to generate a similarity matrix heatmap for essential genes extracted by the Microbial Genomes Atlas (MiGA) from the C. funkei RVMD1 genome, comparing them against the UniProt database, which substantiated that the top related species is, as previously concluded, C. funkei (Supplementary Figure S5).

www.frontiersin.org

Figure 5. Proteomic analysis of C. funkei RVMD1 using AAI-Profiler. (A) AAI-Profiler scatterplot analysis of C. funkei RVMD1 proteome including distribution of sequence identity matches for proteins against the UniProt database. (B) AAI histograms of proteome similarity between C. funkei RVMD1 and its closest UniProt matches, C. funkei and C. aquatile.

3.4 Genomic insights challenge current systematics within the Cellulosimicrobium genus

Our investigation into the taxonomic position of our isolate revealed significant uncertainties surrounding the classification of existing genomes within the Cellulosimicrobium genus, particularly in light of recent whole-genome sequencing data (Nouioui et al., 2018). This observation aligns well with the many reclassifications documented in recent literature (Nouioui et al., 2018; Jiang and Han, 2023). It also supports the observation that this genus exhibited significant heterogeneity, with only a few well-defined species (Aviles and Kyndt, 2021), highlighting the limitations of the current framework. As shown in Supplementary Table S4, around 42% (18/43) of the analyzed genomes are assigned as C. cellulans species, while a significant portion (33%) remains unclassified.

In response, initiation of a thorough phylogenomic study was deemed necessary to gain clearer insights into the taxonomic place of the isolate within the genus hierarchy, specifically targeting the resolution of unclear species boundaries across the genus. The employed approach, incorporating the latest genomic data, has the potential to yield a more accurate and robust taxonomic distribution with refined positions for all genomes in Cellulosimicrobium genus. The analysis focused on 43 genomes retrieved from the Cellulosimicrobium genus within NCBI RefSeq as of September 2023 (excluding atypical genomes). To address the observed classification uncertainties, we employed a multifaceted approach encompassing GBDP (Genome BLAST Distance Phylogeny) tree analysis at the genome scale through the Type (Strain) Genome Server (TYGS) (Meier-Kolthoff et al., 2013; Meier-Kolthoff et al., 2022), alongside Genome-to-Genome Distance Calculator (GGDC) analysis, implementing a 70% species delineation threshold. Additionally, the study included the computation of pairwise average nucleotide identity (ANI) using BLAST (ANIb) via JSpeciesWS, and the determination of ANI/AAI (Average Amino Acid Identity) matrix values through the enveomics collection (Rodriguez-R and Konstantinidis, 2016).

As indicated in Supplementary Table S4 and Figures 6, 7, C. funkei RVMD1 has a typical genome size for the genus (4,264,015 base pairs). Surprisingly, its protein counts of 4,449 greatly exceeds the genus average. This suggests a high coding density, as confirmed by the 90.27% value found in Table 2, according to MiGA (Rodriguez-R et al., 2018). Within the phylogenomic analysis, the genomes are categorized into 16 species clusters. Cluster 1 is of particular interest as it is linked to the C. cellulans group and includes the C. funkei RVMD1 isolate. This cluster is the most extensive, encompassing a variety of strains: C. sp. XJ-DQ-B-000, C. cellulans strain NEB113, C. cellulans strain NBRC 103059, C. cellulans strain ATCC 21606, C. sp. 72–3, C. sp. TH-20, C. funkei strain P112, C. aquatile strain 3 bp, an uncultured C. sp. isolate SRR6216767 MAG genomic, C. sp. TH-20 strain DE0020, C. sp. SL-1, C. sp. KWT-B, C. sp. JZ28, C. sp. TH-20 strain DE0282, C. funkei strain JCM 14302, C. aquatile strain WB02 D5 03, C. cellulans ORNL-0100, and C. funkei NBRC 104118. This cluster encompasses the majority of the genomes, indicating a close genetic relationship among the strains within cluster 1, and is likely indicative of their belonging to the same species group, a conclusion that is supported by ANIb values for all members of the cluster when compared with C. funkei RVMD1, which range from 97.5 to 98.49% (Chun et al., 2018). It is clear from the phylogenetic analysis that the most genetically close strain to C. funkei RVMD1 is C. sp. JZ28, a root endophytic bacterium from the desert plant Panicum turgidum, with genes for stress resistance, volatile production, biocontrol, and biopolymer breakdown (Eida et al., 2020).

www.frontiersin.org

Figure 6. Comparison of genome size and protein counts in 43 Cellulosimicrobium genomes and C. funkei RVMD1 based on TYGS analysis.

www.frontiersin.org

Figure 7. The genome tree inferred with FastME 2.1.6.1 from GBDP distances calculated from genome sequences. The branch lengths are scaled in terms of GBDP distance formula d5. The numbers above branches are GBDP pseudo-bootstrap support values >60% from 100 replications, with an average branch support of 96.1%. Rooted at the midpoint, this analysis includes 43 genomes from the Cellulosimicrobium genus, along with our isolate, C. funkei RVMD1, as referenced in NCBI. The color strips indicate the ANIb values for each species compared to our isolate, C. funkei RVMD1.

Clusters 2 through 16 display a more streamlined composition compared to cluster 1, with clusters 4, 6, 7, 8, 9, 11, 12, 13, 14, 15, and 16 each uniquely containing a single strain. This distinct arrangement underscores the unique genetic differences that each of these strains has from others within the genus. In contrast, clusters 2, 3, 5, and 10 are characterized by multiple strains. Cluster 2 is a diverse group, including C. cellulans isolate CTOTU50488, C. cellulans F16, C. funkei strain U11, and C. sp. I38E. Cluster 3 is home to two strains of C. cellulans, namely NBRC 15516 and ZKA17, cluster 5 combines C. cellulans strain DSM 20106 with strain PSBB019. Cluster 10 is comprised of two strains of C. composti strains SE3 and BIT-GX5.

Moreover, a remarkable tight genomic consistency in G + C content within species of the same cluster is observed, with variations not exceeding 1%. This observation is in line with the findings of Meier-Kolthoff et al. (2014), which reported that within-species differences in G + C content are typically less than 1%. Such consistency in G + C content further supports the clustering method utilized by TYGS as an effective tool for identifying genetically coherent groups within the Cellulosimicrobium genus.

The ANI/AAI matrix heatmap (Figure 8), providing all-vs-all genomic distance estimates, strongly supports the clustering and topologies seen in the Cellulosimicrobium phylogenomic tree. Both analyses use ANI and AAI metrics for comprehensive similarity assessment. This agreement is visually clear the 43 species, including our isolate, form 16 cluster. Strains closely grouped on the tree share higher ANI/AAI values (blue on the heatmap), while those more distant shift toward red, further confirming their phylogenetic separation. ANIb and dDDH analyses (Figure 9) reveal significant genomic variation within the Cellulosimicrobium genus. Strains within cluster 1 (including the C. cellulans group) exhibit high ANIb (>95%) and dDDH (>70%) values (Chun et al., 2018), indicating strong similarity and justifying their shared species cluster. Other clusters show lower values, highlighting their distinction from cluster 1 and the differences between C. funkei RVMD1 and the C. cellulans group. This variation underscores the need to carefully analyze species boundaries and potentially revise classification criteria within the genus.

www.frontiersin.org

Figure 8. ANI/AAI matrix heatmap of Cellulosimicrobium strains with hierarchical clustering. This heatmap represents the all-vs-all comparison of Average Nucleotide Identity (ANI) and Average Amino Acid Identity (AAI) among various strains of Cellulosimicrobium, including C. funkei RVMD1. The color gradient from blue to red indicates the degree of genomic similarity, with blue representing higher similarity (closer species) and red indicating lower similarity. The dendrogram on both the x and y axes classify the strains based on genomic relatedness.

www.frontiersin.org

Figure 9. Delineating C. funkei RVMD1 among 43 genomes from the Cellulosimicrobium genus. (A) The digital DNA–DNA hybridization (dDDH) values were calculated using Genome-to-Genome Distance Calculator (GGDC) analysis, utilizing a 70% species delineation threshold. (B) The Average Nucleotide Identity by BLAST (ANIb) analysis, with a (>95%) species delineation threshold. (C) A scatter plot of pairwise digital DNA–DNA hybridization (dDDH) vs. ANIb values for C. funkei RVMD1 against the 43 genomes highlights matches that exceed the species thresholds, clarifying the taxonomic standing of C. funkei RVMD1.

The aforementioned analyses make it clear that studied C. cellulans strains (18/43) are distributed across various species clusters, suggesting potential inaccuracies in current taxonomy assignments and raising the possibility that some strains may represent divergent species. In response to this observation, we conducted all-vs-all OrthoANI and ANIb analyses among all C. cellulans species.

Figure 10, Supplementary Figure S6 and Supplementary Table S5, demonstrate a clear division of C. cellulans into nine different genomospecies, indicating a shift from a singular species classification to potentially up to nine distinct species. This is particularly evident when examining the pairwise ANIb indices against the reference strain ORNL-0100. Strains including NEB113, NBRC 103059, ATCC 21606, LMG 16121, 1C2-5, and 1B1-1 show ANIb values that exceed the 95–96% threshold for species designation (Chun et al., 2018; Wu et al., 2020), ranging from 96.72 to 97.42%. Their close proximity to the reference strain confirms that only these strains likely should be designated as “cellulans.” Our findings support the results of a similar comparative study, although that study examined fewer species (Aviles and Kyndt, 2021). The remaining strains, including NBRC 15516 and ZKA17, DSM 20106 and PSBB019, isolate CTOTU50488 and F16, 2020WEIHUA, MP1, PW, DE0111, and JZ5, form eight distinct genomospecies. These clusters exhibit ANIb values, much lower than the threshold, ranging from 75.24 to 88.71% when compared to both the reference strain and each other (Meier-Kolthoff et al., 2013). This strongly supports the hypothesis of species misidentification and suggests the need for taxonomic revision within the Cellulosimicrobium genus. This revision could potentially double the number of official Cellulosimicrobium species.

www.frontiersin.org

Figure 10. All-vs-all ANIb heatmap of all C. cellulans species from different species clusters.

3.5 Comparative phylogenomic analysis

As mentioned previously, the Cellulosimicrobium genus contains six valid, distinct species. Each species has a designated reference strain: C. protaetiae BI34, C. cellulans ORNL-0100, C. funkei JCM 14302, C. marinum NBRC 110994, C. composti SE3 and C. arenosum KCTC 49039. For in-depth genome analysis, these reference strains were chosen to facilitate a comparative study with our isolate, C. funkei RVMD1. As anticipated, OGRI metrics (OrthoANI, ANI, AAI, and dDDH), illustrated in Figures 11A,B, confirm the systematic position of RVMD1 as a species within the C. cellulans group. Of note, C. cellulans ORNL-0100 and C. funkei JCM 14302 demonstrated the highest OrthoANI, ANI, AAI, and dDDH values with C. funkei RVMD1, exceeding the species cutoff delineation (Chun et al., 2018). OrthoANI offers a reliable and faster way to determine average nucleotide identity (ANI) for classifying organisms. It strongly aligns with results from traditional ANI methods (using BLASTn) (Lee et al., 2016). In contrast, other reference species demonstrated significantly lower OrthoANI (81.88–88.53%), ANI (82.83–88.57%), AAI (77.43–89.03%), and dDDH (30.50–58.40%) values. To further assess genome similarity, a detailed whole-genome similarity analysis was conducted using the FastANI 1.3.3 tool, as shown in Figure 11C. This analysis provides a comprehensive visualization of genomic similarities between C. funkei RVMD1 and the reference strains. Each red line in the figure represents reciprocal mappings, indicating conserved sequences and evolutionary relationships, with varying line intensity corresponding to different ANI values. Orthologous matches (e.g., 962/1,042 with C. funkei JCM 14302 and 729/1,042 with C. arenosum KCTC 49039) reflect the degree of genomic similarity. Consistent with the OGRI results, C. cellulans ORNL-0100 and C. funkei JCM 14302 exhibited the highest degree of similarity, supporting their close relationship with our isolate, C. funkei RVMD1, while the other reference species showed more divergence. Here, the calculated OGRI values and whole-genome similarity analysis based on the WGS demonstrate that species boundaries are distinctly observed between all reference species and our isolate C. funkei RVMD1, except for C. cellulans ORNL-0100 and C. funkei JCM 14302, potentially necessitating their reclassificat

留言 (0)

沒有登入
gif