Long-term persistence of diverse clones shapes the transmission landscape of invasive Listeria monocytogenes

The clinical Lm population is genetically diverse

We retrieved high-quality short-read genome sequences of 936 Lm isolates collected from human clinical samples in 58 counties across the State of New York (Fig. 1A; Additional file 1: Table S1; Additional file 2: Fig. S1). The counties with the highest number of isolates were Nassau (n = 133), Suffolk (n = 124), Westchester (n = 85), Erie (n = 68), and Monroe (n = 48), which together made up 48.93% of the entire dataset. Four counties were not represented in the population because healthcare providers from those counties submitted few or no samples. The dataset included isolates sampled from 2000 to 2021 (mean = 43 isolates per year; range = 21–62) (Fig. 1B; Additional file 2: Fig. S4). An isolate from 1987 was also included in the dataset for historical comparison. Most of the isolates were derived from blood (n = 760; 81.19%) and cerebrospinal fluid (n = 109; 11.64%) (Additional file 2: Fig. S5).

Fig. 1

Distribution of the 936 clinical Lm isolates in New York. A A map of the USA showing the location of New York (colored in purple) and the counties in New York State. Counties where Lm was sampled in this study are colored according to the number of isolates. New York State has a total area of 54,556 sq. miles (141,300 km²), a maximum length of 330 miles (530 km), and a maximum width of 285 miles (455 km). B Number of Lm isolates per year, by lineage. C Midpoint-rooted maximum likelihood phylogenetic tree based on the sequence alignment of 2421 core genes. Tree scale represents the number of nucleotide substitutions per site. Colored branches represent the four lineages I–IV. Colored outer rings representing the clonal complexes (CCs) and sequence types (STs) show only those with ≥ 10 representative genomes. CCs and STs with < 10 representative genomes are denoted as others. D Number of isolates per major ST. E Number of isolates per major CC

The New York Lm population was derived from different genetic groups (Fig. 1C; Additional file 1: Table S3). Classification based on the 7-gene MLST scheme [7] revealed a total of 200 STs and 89 known CCs. The most frequent STs in this study were ST1 (122 genomes; 13.03%), ST6 (58 genomes; 6.19%), ST5 (52 genomes; 5.55%), and ST217 (43 genomes; 4.59%) (Fig. 1D). The most common CCs were CC1, CC6, and CC5 (Fig. 1E). All four known monophyletic lineages of Lm (I, II, III, and IV) [8, 47] were detected in our study, comprising 584, 306, 43, and 3 genomes, respectively. Lineage I consisted of 62 STs and 29 known CCs, whereas lineage II consisted of 97 STs and 50 CCs. Lineage III consisted of 38 STs and 10 known CCs. Lastly, the three genomes in lineage IV represented three STs (ST3131, ST3163, and ST3173). Lineages I and II were detected every year throughout the sampling period (Fig. 1B). These two major lineages were widely distributed across New York State and were detected in 54 and 53 counties, respectively.

Variation in the somatic (O) and flagellar (H) antigens is also used to distinguish Lm because serogroup designation tends to be associated with virulence potential [48]. In silico prediction of PCR-serogroups showed that the genomes from this study belonged to six previously known PCR-serogroups (IIa, IIb, IIc, IVb, IVb-v1, L). PCR-serogroups were differentially distributed across the Lm phylogeny. PCR-serogroups IIa, IIb, and IIc were detected in 290, 127, and 5 genomes, respectively, whereas PCR-serogroups IVb, IVb-v1, and L were detected in 403, 45, and 36 genomes, respectively. A total of 30 genomes had no match in the BIGSdb Listeria PCR-serogroup database [24] (Fig. 1C; Additional file 1: Table S3B). These genomes of unknown serogroup belonged to lineages I (n = 8 genomes), II (n = 11 genomes), and III (n = 11 genomes).

Lm has multiple AMR genes, some of which are carried by mobile genetic elements

The three lineages (I, II, III) differ in their accessory gene content (p < 0.0001, Wilcoxon rank sum test; Additional file 2: Fig. S6). We detected a total of 12 AMR genes representing eight antimicrobial classes in the entire Lm population (Additional file 1: Table S4). The most prevalent were the intrinsic AMR genes fosX, norB, sul, and lin (Fig. 2A). The first three genes were detected in all genomes, whereas the lincosamide resistance gene lin was present in 935 (99.89%) genomes. Acquired AMR genes were detected at much lower frequencies. These included genes conferring resistance to tetracycline (either tetM or tetS present in five genomes from ST5, ST59, ST199, ST1039, and ST2928 recovered from blood; lineages I and II), macrolide-lincosamide-streptogramin B (ermG in two ST2 genomes from cerebrospinal fluid; lineage I), macrolide (msrD and mefA in three ST2 genomes from cerebrospinal fluid, n = 2, and blood, n = 1; lineage I), biocides (emrC in one ST8 genome from blood; lineage II), and aminoglycosides (aacA4 in one ST155 genome from blood; lineage II). No significant difference in the number of AMR genes per genome was detected among lineages I–III (all p values > 0.05 for every pair of lineages, Wilcoxon rank sum test).

Fig. 2

Antimicrobial resistance (AMR) and mobile genetic elements in clinical Lm. A Midpoint-rooted maximum likelihood phylogenetic tree showing the four Lm lineages and the distribution of AMR genes, plasmid replicons, Listeria genomic islands (LGI), and Listeria pathogenicity islands (LIPI). Colored blocks represent the presence of these genetic elements. The tree is identical to that in Fig. 1C. Comparison of phage count (B) and phage coverage (C) among the four lineages. For panels B and C, the boxplot shows the 25th, 50th, and 75th percentiles and black dots show data points outside the interquartile range. D Comparison of the number of genomes per lineage that harbor AMR genes in phage and plasmids. The size of the circles is proportional to the number of genomes

Mobile genetic elements contribute considerable diversity and functionality to bacterial cells, including the mobility of AMR genes [49]. First, we examined the DNA flanking the acquired AMR genes detected in this study to better understand their genetic environment. The tetracycline resistance genes tetM and tetS were flanked by genes encoding conjugal transfer proteins and DNA-binding proteins (Additional file 2: Fig. S7). On the contigs (> 120 kbp) harboring them, ermG, msrD, and mefA were frequently flanked by phage-like structures, and these regions were further identified as phage genetic material (Additional file 1: Table S5).

We sought to determine the presence and diversity of putative plasmids and phages. A total of 12 plasmid replicon types were detected (Additional file 1: Table S4). At least one plasmid replicon was identified in 171 genomes (or 18.26% of the population), of which 98, 70, and three genomes came from lineages I, II, and III, respectively. The highest number of plasmid replicons in a single genome (n = 4) was detected in an ST5 isolate from a blood sample in Onondaga County (Accession no. SRR5451748). Another genome (ST371), isolated from a blood sample in Nassau County, harbored three plasmid replicons. The most frequently detected plasmid replicon types were rep25_2_M640p00130(J1776plasmid) (lineage I = 62 genomes, lineage II = 20), rep26_2_repA(pLGUG1) (lineage I = 28, lineage II = 22), and rep26_4_repA(pLM5578) (lineage I = 3, lineage II = 26). We found significant differences in the number of plasmids per genome between lineages I and II (p = 0.027) and between lineages II and III (p = 0.016), but not between lineages I and III (p = 0.088) (Wilcoxon rank sum test).
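The pairwise lineage comparisons reported here rely on the Wilcoxon rank sum test. As an illustration only, the following minimal Python sketch shows how such a pairwise comparison of per-genome counts could be set up. It is not the pipeline used in this study: the function `rank_sum_test`, the normal approximation with tie correction (without continuity correction), and the counts in `toy_counts` are all assumptions introduced for the example, and the toy values are not data from this study.

```python
import math
from itertools import combinations


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Assumes both samples are non-empty. Uses the normal approximation with
    a tie correction and no continuity correction.
    Returns (U statistic for sample x, two-sided p value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values (positions i..j-1) share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All values identical: no evidence of a difference.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical plasmid replicon counts per genome (NOT data from this study).
toy_counts = {
    "I": [0, 0, 1, 1, 0, 2, 1, 0],
    "II": [0, 0, 0, 1, 0, 0, 1, 0],
    "III": [0, 0, 0, 0, 0, 1],
}

for a, b in combinations(toy_counts, 2):
    u_stat, p_val = rank_sum_test(toy_counts[a], toy_counts[b])
    print(f"lineage {a} vs {b}: U = {u_stat:.1f}, p = {p_val:.3f}")
```

Count data such as replicons per genome contain many ties, which is why the sketch includes the tie correction. No multiple-testing adjustment is applied in this sketch.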

At least one phage DNA element was detected in 919 genomes (Additional file 1: Table S5). All four lineages contained phage DNA, with a higher number of phage DNA elements per genome detected in lineages I and II (Fig. 2B). We found significant differences in the number of phage DNA elements per genome between lineages I and II (p = 0.026), lineages II and III (p = 7.2e−07), and lineages I and III (p = 3.1e−05) (Wilcoxon rank sum test). Because phage DNA may occupy a substantial portion of the bacterial chromosome [50], we estimated the combined size of all phage DNA per genome. In each of the four lineages, phage DNA made up a median of less than 5% of the genome (Fig. 2C). Two ST5 isolates (accession no. SRR14404494 and SRR14214577; lineage I) with eight and five identified phage elements harbored the largest combined phage DNA per genome, 576 kbp and 484 kbp, respectively. These were recovered from blood samples in Erie and Suffolk counties, respectively. An ST9 isolate (accession no. SRR3277646; lineage II) obtained from a blood sample in Suffolk also carried 466 kbp of total phage DNA.

We next screened the putative plasmid and phage DNA for the presence of genes conferring resistance to antimicrobials, biocides, and heavy metals. We identified four AMR genes in phage DNA (Fig. 2D; Additional file 1: Table S5). Phage-associated fosX was the most frequently detected and was found in isolates belonging to lineage I (n = 33 genomes from eight STs) and lineage II (n = 3 genomes from three STs). The acquired AMR genes ermG (two genomes), msrD (three genomes), and mefA (three genomes) were also detected in phage DNA. Also present but less commonly detected were the heavy metal resistance genes arsABDD2R (arsenic) and cadAC (cadmium), the mercury resistance regulatory genes merR1 and merR2, and the biocide resistance genes bcrBC. We were able to reconstruct putative plasmids from 152 genome assemblies, which were subsequently used as templates to predict the presence of the AMR genes they carry (Additional file 1: Table S6). The putative plasmids carried the AMR genes lin (n = 45 genomes) and emrC (one genome). The lin gene (lincosamide resistance) is predominantly chromosome-borne in Lm [51, 52]; however, we identified an Lm plasmid (NCBI accession number: NZ_LR134399.1) isolated from human blood that harbors the lin gene. The mechanisms underlying its mobility are unclear and require further investigation. In this study, some plasmids were also associated with genes conferring resistance to arsenic (arsBCR in two genomes) and cadmium (cadC in 83 genomes), as well as the disinfectant tolerance genes bcrBC (51 genomes).

Lm lineages have multiple genomic and pathogenicity islands

Genomic islands are large syntenic blocks of genes that are integrated into the bacterial chromosome; they often carry genes conferring a selective advantage on the host bacterium and can be mobilized via horizontal gene transfer [36, 53]. We identified Listeria genomic islands LGI-1 (present in two lineage II genomes) and LGI-2 (present in 93 genomes belonging to lineage I [n = 82] and lineage II [n = 11]) (Fig. 2A, Additional file 1: Table S7).

Pathogenicity islands are a subset of genomic islands that carry virulence determinants and promote an infection cycle to enable the invasion of host cells, evasion of the host’s defenses through phagocytosis, and dissemination to nearby cells to re-initiate the infection cycle [36, 53]. We detected the Listeria pathogenicity islands LIPI-1, LIPI-3, and LIPI-4 in our dataset (Additional file 1: Table S8). LIPI-3 and LIPI-4, which are associated with hypervirulence, were detected in 435 and 195 genomes, respectively (Fig. 2A). LIPI-3 encodes listeriolysin S, which functions both as a bacteriocin and as a hemolytic cytotoxic factor [54]. Previous studies report that LIPI-3 is commonly associated with epidemic outbreaks and is present primarily in lineage I and only in certain serogroups (I/IIb and IVb) [9, 10, 55]. The presence of LIPI-3 and LIPI-4 in lineage II genomes is rarely reported. In our dataset, we detected LIPI-3 in lineages I (n = 414 genomes), II (n = 7 genomes), and III (n = 6 genomes), spanning multiple serogroups and STs. In lineage II, LIPI-3 was present in genomes belonging to ST380 (CC380; two genomes); ST938, ST1867, ST1921, and ST3175 (one genome each, all members of CC938); and ST768 (CC768). LIPI-3 gene clusters were present on large chromosomal contigs (> 1.19 Mbp) in these genomes, except in ST768 (~ 22 kbp).

LIPI-4 is a cluster of six genes implicated in neurological and placental infections [10]. We detected LIPI-4 in all four lineages in our dataset (n = 170, 2, 20, and 3 genomes in lineages I–IV, respectively). LIPI-4 gene clusters were detected on 113 kbp and 548 kbp contigs belonging to ST1072 (SL1072) and ST1864 (CC1864, SL1864) in genomes isolated from Albany and New York counties, respectively. Other virulence genes of various functions were distributed across the breadth of the Lm phylogeny and among the four lineages (Additional file 1: Table S8; Additional file 2: Fig. S8).

Geographical dissemination of epidemiologically linked Lm isolates in New York

Previous molecular studies of Lm established a threshold of ≤ 20 SNPs in a core genome alignment to define epidemiological linkages [41–43]. First, we used this threshold to determine the impact of geographical location on the genetic relationships of Lm isolates. In lineage I, pairwise core genome distances were significantly higher between isolates from different counties than between isolates from the same county. Similar results were observed in lineage II (Additional file 2: Fig. S9; p < 0.0001 for both lineages I and II, Wilcoxon rank sum test).

Based on previously described criteria for cluster identification using SNP thresholds in the core genome alignment (see methods), we identified 23 and 14 core genome SNP clusters in lineages I and II, respectively (Figs. 3 and 4), and a single cluster in lineage III (Additional file 2: Fig. S13). In lineage I, five core genome SNP clusters (labeled 2, 3, 9, 16, and 18 in Fig. 3) corresponded to previously reported multi-state outbreaks from the CDC PulseNet program, a national laboratory surveillance network of foodborne diseases [18]. Cryptic outbreaks, undetected transmission events, and shared contamination sources likely explain the remaining 18 core SNP clusters in lineage I. We identified 20 persistent clusters (from STs 1, 2, 4, 5, 6, 217, 382, and 55) and three outbreak clusters (from STs 1 and 5). Here, we highlight a few notable sequence clusters in lineage I (Fig. 3).

Cluster 3 consisted of isolates from blood sampled in 2001 (n = 13) and 2004 (n = 1) that spanned seven counties in both the eastern and western parts of New York State (approximately 191 miles or 468 km). They belonged to ST6 (cgMLST CT12957 and serogroup IVb). Pairwise SNP differences between genomes ranged from 0 to 3. Within this cluster, 10 genomes were reported by PulseNet to be associated with a multi-state outbreak. All genomes from this cluster harbored the pathogenicity island LIPI-3. The presence of an isolate collected in 2004 with identical genetic characteristics suggests the multi-year persistence of this cluster.

The largest core SNP cluster in lineage I was cluster 16 (n = 35 genomes), with pairwise core SNP differences ranging from 2 to 20. Isolates were derived from multiple body sources (blood = 27, cerebrospinal fluid = 5, placenta = 1, others = 2) between 2000 and 2021 from 24 counties. This cluster belonged to CC1, ST217, and serogroup IVb. All genomes harbored the pathogenicity islands LIPI-3 and LIPI-4. A total of 13 genomes in this cluster were reported by PulseNet to be associated with a multi-state outbreak.
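To make the idea of a core genome SNP cluster concrete, the following minimal Python sketch groups isolates whose aligned core sequences differ by no more than a chosen number of SNPs. It is an illustration under stated assumptions, not the clustering procedure of this study, whose criteria are given in the methods. In particular, the single-linkage rule (an isolate joins a cluster if it is within the threshold of any member), the function names `snp_distance` and `threshold_clusters`, and the toy sequences are hypothetical. The toy example uses a threshold of 1 SNP because the sequences are only 10 bp long; the threshold discussed in the text is ≤ 20 SNPs.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are ignored.
    """
    skip = {"-", "N"}
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in skip and y not in skip
    )


def threshold_clusters(seqs, max_snps=20, min_size=2):
    """Single-linkage clusters of isolates linked by <= max_snps SNPs.

    seqs maps isolate name -> aligned core genome sequence.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical 10-bp "core alignments" (NOT data from this study).
toy_alignment = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",  # 1 SNP from iso_A
    "iso_C": "ACGTACGAAT",  # 1 SNP from iso_B, 2 SNPs from iso_A
    "iso_D": "TTTTTCGTAC",  # 4 or more SNPs from every other isolate
}

toy_clusters = threshold_clusters(toy_alignment, max_snps=1)
print(toy_clusters)
```

With single linkage, two members of the same cluster can be further apart than the threshold when they are connected through an intermediate isolate, as iso_A and iso_C are here. A stricter rule that requires every pair in a cluster to be within the threshold would give different groupings.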

Fig. 3

Phylogenetic relationship and core genome SNP clusters in Lm lineage I. A Maximum likelihood phylogenetic tree of lineage I based on sequence alignment of 2610 core genes. The columns of colored blocks next to the tree show the clonal complexes (CC), sequence types (ST), and year of isolation. Outbreaks reported by PulseNet (PN_Outbreak) are represented by pink arrows, while clusters defined using a threshold of ≤ 20 core single nucleotide polymorphisms (SNP) are represented by a blue bar and numbered 1–23 (CG_Clusters). CCs and STs with ≥ 10 representative genomes are colored, whereas those with < 10 representative genomes are denoted as others. B Minimal spanning grape trees representing select core genome SNP clusters colored by county of isolation. The scale represents the number of SNPs and the length of the scale is proportional to the number of SNP differences. The number in brackets next to the county name indicates the number of genomes

Fig. 4

Phylogenetic relationship and core genome SNP clusters in Lm lineage II. A Maximum likelihood phylogenetic tree of lineage II based on sequence alignment of 2585 core genes. The columns of colored blocks next to the tree show the clonal complexes (CC), sequence types (ST), and year of isolation. Outbreaks reported by PulseNet (PN_Outbreak) are represented by pink arrows, while clusters defined using a threshold of ≤ 20 core single nucleotide polymorphisms (SNP) are represented by a blue bar and numbered 1–14 (CG_Clusters). CCs and STs with ≥ 10 representative genomes are colored, whereas those with < 10 representative genomes are denoted as others. B Minimal spanning grape trees representing select core genome SNP clusters colored by county of isolation. The scale represents the number of SNPs and the length of the scale is proportional to the number of SNP differences. The number in brackets next to the county name indicates the number of genomes

In lineage II, three core genome SNP clusters (labeled 3, 10, 13) corresponded to outbreaks reported by PulseNet and were also part of multi-state outbreaks (Fig. 4; Additional file 1: Table S10). All 14 core SNP clusters in lineage II persisted for ≥ 6 months and included members of STs 7, 11, 21, 29, 155, 204, 321, 360, 378, 573, and 635. LIPI-3 and LIPI-4 were not detected in the genomes from these clusters. Similar to lineage I clusters, there were clusters in lineage II that also spanned multiple geographically distant counties from across the entire length of the State and were detected for many years.

In lineage III, we identified one core genome SNP cluster consisting of three isolates derived from blood (n = 2) and cerebrospinal fluid (n = 1) in Suffolk County between January and March 2010 (Additional file 2: Fig. S13). Genomes in this cluster were identical (i.e., zero SNPs apart), belonged to ST3171 (cgMLST type CT13941, serogroup L), and harbored LIPI-3 and LIPI-4.

We also sought to determine whether the distribution of Lm isolates in New York is associated with the distance between counties of isolation. We carried out a Mantel test of pairwise Lm genetic distances (based on pairwise core genome SNPs) and geographical distances between counties. Across the entire Lm dataset, the test revealed a significant but very weak correlation between genetic and geographic distances (R = 0.03612, p = 0.006) (Additional file 2: Fig. S14). We also carried out a Mantel test for only the outbreak genomes identified in lineages I and II (i.e., genomes labeled as CG_Clusters in Figs. 3 and 4). We detected a significant but weak correlation between genetic and geographical distances in lineage II (R = 0.1238, p = 0.004), but not in lineage I (R = −0.07276, p = 0.984) (Additional file 2: Fig. S15).

Overall, these results show that invasive Lm associated with disease transmission and outbreaks were derived from multiple genetic backgrounds in lineages I–III. Many of the Lm clones that have epidemiological linkages can persist over many years and traverse geographically distant sites.
