A large-scale genomic snapshot of Klebsiella spp. isolates in Northern Italy reveals limited transmission between clinical and non-clinical settings

Sequencing, species assignments and phylogenetic analysis

After quality control, 3,483 high-quality read sets and assemblies were retained: 2,796 from diverse sources recovered using Simmons Citrate Agar with Inositol (SCAI) media and 687 from ongoing clinical surveillance projects (summaries in Extended Data Fig. 1 and Supplementary Tables 1 and 2). Full isolate metadata, including species and lineage assignments, source, genotypic and phenotypic resistance data and phylogenetic trees are available for download from the Microreact project (https://microreact.org/project/KLEBPAVIA). A summary of the main metadata fields used in the Microreact project is provided in Supplementary Table 3. Figure 1 provides a summary of the sampling. A summary of the species assignments and sources of the 3,482 sequenced Klebsiella isolates and phylogenetic trees are given in Fig. 2. We used three-letter species abbreviations throughout this article; these are provided in the main text and the legend to Fig. 2.

Fig. 1: Summary of the sampling effort.

a, Geographical summary of the whole sampling area. b, More detail of the region around the city of Pavia as highlighted by the red box in a. The size of each point indicates the number of samples and the colours represent the source. c, Timeline of the sampling effort broken down by source. Further details are hidden to preserve anonymity.

Fig. 2: Phylogenetic tree with metadata and sample and source distributions.

a, Maximum-likelihood phylogenetic tree constructed from core genes, coloured by species, with the SPECs shown. Only one isolate from each species is shown as this tree is intended to show the distances between species. b, Neighbour-joining phylogenetic tree constructed from pairwise Mash distances between all isolates, coloured by species, with the SPECs shown. The metadata rings show sources (inner rings) and resistance and virulence scores (outer rings). c, Bar plot showing the number of sequenced samples from each species. The dark bars show samples from SCAI media and the transparent ones show diagnostic samples. d, Bar plot showing the number of sequenced samples from each high-level source. The dark bars show samples from SCAI media and the transparent ones show diagnostic samples. With the following exceptions, the three-letter species abbreviations used are explained in the main text: Klebsiella quasipneumoniae subsp. similipneumoniae (K. qps); Klebsiella planticola (K. pla).

We inferred a neighbour-joining tree of all 3,483 isolates using Mash24 distances (Fig. 2b) and generated a more statistically robust RAxML25 tree (GTR+Γ) based on a representative subset of 703 isolates (Fig. 2a). These trees were consistent with each other and with Kleborate26 species assignments, except for those cases where clusters were not present in the Kleborate database. We identified 15 recognized Klebsiella species, including Klebsiella pasteurii (K. pas) and Klebsiella spallanzanii (K. spa), which were first isolated during the course of this study27, and 8 isolates of Klebsiella huaxiensis (K. hua), which was previously only recovered from a urine sample in China28. Our data also resolved a new cluster of six isolates, to which we have assigned the label Klebsiella quasiterrigena (K. qte) (Extended Data Fig. 2), and two isolates from hospital carriage that are positioned approximately equidistantly from Klebsiella grimontii (K. gri) and K. pas (labelled NA; Extended Data Fig. 3). A single isolate recovered from the surface of an automated teller machine (ATM) was assigned as a new species belonging to the genus Superficiebacter, designated Superficiebacter maynardsmithii29, and was retained as a convenient outgroup. Previous WGS studies did not support the assignment of the Raoultella species as a separate genus7,23,30, which is consistent with our data. Hence, we refer to these species as Klebsiella. Phylogenetic analyses revealed four higher-order clusters, which we have referred to as species complexes (SPECs), extending those used previously26. These were named according to the canonical species in each group: K. pneumoniae (K. pne SPEC), Klebsiella oxytoca (K. oxy SPEC), Klebsiella ornithicolytica (K. orn SPEC (Raoultella)) and Klebsiella aerogenes (K. aer SPEC) (Extended Data Figs. 2–4).

K. pneumoniae (K. pne) was by far the most commonly sampled species, accounting for approximately half of the isolates (n = 1,705). This proportion was inflated by the inclusion of the 687 diagnostic isolates, 676 of which were sampled from healthcare settings. Of these isolates, 571 were K. pne (84%), confirming its dominance as a cause of human infection, with the opportunistic pathogen K. oxy being the second most common (n = 40; 6%) (Supplementary Table 4).

Species clonality and population structures

We compared the population structures of the different species by delineating SCs using PopPunk31 (Extended Data Fig. 5 and Fig. 3). This revealed high levels of diversity in all species, as previously described for K. pne32,33. In total, we identified 1,168 SCs across all species, of which only 41 (3.5%) were represented by more than 10 isolates and 50% of all isolates corresponded to SCs that were observed no more than 6 times. The most common SC within each species represented between 3 and 10% of the population (Fig. 3a) and the SC accumulation curves were not close to saturation (Fig. 3b). K. orn showed particularly high diversity; 147 SCs were identified from 258 isolates and the most common SC accounted for 3% of the isolates. Pairwise divergences tended to be distributed around a modal average of approximately 1% (Fig. 3c) and each lineage was roughly equidistant to every other lineage (Extended Data Fig. 6). In some cases (for example, K. pne, K. gri), a much smaller peak was also evident at a much lower divergence, reflecting expansion of individual SCs. Klebsiella michiganensis (K. mic), K. hua, K. spa, Klebsiella terrigena (K. ter) and K. aer also showed more diverged modal peaks, with core genome distances up to 3%; this reflects the presence of deep subdivisions within these species, which is consistent with nascent speciation; this was also evident in the individual species trees (Extended Data Fig. 6 and Supplementary Table 5).

Fig. 3: Clonality and population structure.

a, Composition of the eight most common species as determined by SC frequencies. For each species, the isolates were grouped by SC and the SCs were ranked by their frequencies as a proportion of the dataset (top 30 SCs shown). b, The number of unique SCs as isolates were sampled. Accumulation curves were produced by randomizing the order of the isolates and counting the SCs, and then repeating this 100 times (mean values plotted). The dashed grey line indicates the x = y line. c, Distribution of pairwise core genome distances for each species. The distances were estimated using PopPunk and the points were arranged in the x direction by density to show their distributions.
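The accumulation-curve procedure described in b can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `accumulation_curve`, the fixed random seed and the toy SC labels are assumptions introduced here, and only the Python standard library is used.

```python
import random

def accumulation_curve(sc_labels, n_reps=100, seed=0):
    """Mean number of unique SCs seen after sampling 1..N isolates.

    sc_labels: one SC label per isolate (hypothetical input format).
    n_reps:    number of random orderings to average over (100 in the caption).
    """
    rng = random.Random(seed)
    n = len(sc_labels)
    totals = [0] * n
    for _ in range(n_reps):
        order = list(sc_labels)
        rng.shuffle(order)              # randomize the order of the isolates
        seen = set()
        for i, sc in enumerate(order):
            seen.add(sc)
            totals[i] += len(seen)      # unique SCs after i + 1 isolates
    return [t / n_reps for t in totals]

# Toy data: 10 isolates belonging to 4 SCs (illustrative, not from the study).
labels = ["SC1"] * 5 + ["SC2"] * 3 + ["SC3", "SC4"]
curve = accumulation_curve(labels)
```

A curve that stays close to the x = y line indicates that almost every newly sampled isolate belongs to a previously unseen SC, which is what "not close to saturation" means here.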

Species are distributed non-randomly across different sources

We explored the prevalence and distribution of the 15 recognized Klebsiella species and K. qte across different epidemiological and ecological sources (Figs. 2 and 4). Most of the disease isolates from hospital patients were recovered from diagnostic plates; thus, it was not valid to compare source prevalence between the diagnostic and SCAI samples. Therefore, the analysis presented in Fig. 4 was restricted to the 2,795 Klebsiella isolates recovered using the SCAI sampling strategy from any of the major source categories (n = 23 isolates excluded). Considering all sources, prevalence (calculated as the percentage of samples that were positive for at least 1 species) was highest for water samples (river, 100%; environmental, 85.2%; farm, 86.1%) and turtles (82.6%), most of which are riverine. The sources with the next highest prevalence were humans (hospital carriage, 58.5%; community carriage, 62.9%) and livestock (cows, 59.6%; pigs, 49.4%). The prevalence from soil was 44.6% and from plants 26.8%. While a high prevalence was observed from farm surfaces (53.1%), the prevalence from environmental and hospital surfaces was much lower (15.9%). Most species could be isolated from most sources; 20 sources harboured at least 7 species and 11 sources harboured at least 10 species.

Fig. 4: The distribution of species according to source.

Only Klebsiella samples from the SCAI dataset (n = 2,795) are shown; 23 of these samples were removed either because they were from very poorly sampled sources (21) or because they could not be confidently assigned to a species (2). The rows represent species delimited according to SPECs and the columns represent sources delimited according to source categories. The grey shaded rows at the bottom of the table give the total number of positive samples for the corresponding source and, below, the total number of samples for that source. The grey shading reflects the percentage prevalence from each source. The number of positive samples is shown for each species from each source and a blank cell indicates zero positive samples. The red shading shows the relative enrichment of each species from each source, given the overall prevalence from that source and assuming a null hypothesis whereby all species would be equally likely to be observed from any given source. The dark red and blue borders show those categories where the number of samples is significantly higher or lower than expected, respectively, as determined by a permutation test. The bar plot to the right shows the number of samples from each species and the total sampling effort.

We used a permutation test to gauge whether different species were non-randomly distributed between sources (Fig. 4). K. pne was significantly overrepresented in hospital carriage and in livestock (cows and pigs), as expected16,34, but was underrepresented in sheep, water (and turtles), invertebrates and soil/plants. Species within the K. orn SPEC were significantly overrepresented in soil and plants and underrepresented in hospital carriage, which is also consistent with previous work34. Other distributions were more surprising; for example, we did not find any evidence that Klebsiella variicola (K. var) is associated with plants, contrary to its original description35 and species from the K. oxy SPEC tended to be overrepresented in invertebrates. While this is consistent with a previous report of a symbiotic relationship between houseflies and K. oxy36, to our knowledge this specialism has not been described before in other species of this SPEC. These data also point to a significant overrepresentation of K. mic in hospital carriage and we note that a small but consequential proportion (17 out of 600; 2.8%) of the diagnostic isolates from hospital disease correspond to this species.
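The general shape of such a permutation test can be sketched as follows. The exact null model and number of permutations used in the study are not given in this section, so this sketch assumes a simple label-shuffling null in which source labels are permuted across isolates; the function name `permutation_test`, the fixed seed and the toy data are all illustrative assumptions.

```python
import random
from collections import Counter

def permutation_test(species, sources, n_perm=1000, seed=0):
    """Return {(species, source): (observed, p_high, p_low)}.

    Assumed null: source labels are exchangeable across isolates.
    p_high is the one-sided p value for overrepresentation,
    p_low the one-sided p value for underrepresentation.
    """
    rng = random.Random(seed)
    observed = Counter(zip(species, sources))
    cells = [(sp, so) for sp in set(species) for so in set(sources)]
    ge = Counter()   # permutations with count >= observed
    le = Counter()   # permutations with count <= observed
    shuffled = list(sources)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm = Counter(zip(species, shuffled))
        for cell in cells:
            if perm[cell] >= observed[cell]:
                ge[cell] += 1
            if perm[cell] <= observed[cell]:
                le[cell] += 1
    # Add-one correction so that p values are never exactly zero.
    return {
        cell: (observed[cell],
               (ge[cell] + 1) / (n_perm + 1),
               (le[cell] + 1) / (n_perm + 1))
        for cell in cells
    }

# Toy data with a deliberately extreme association (not from the study).
species = ["K. pne"] * 20 + ["K. orn"] * 20
sources = ["hospital"] * 20 + ["soil"] * 20
result = permutation_test(species, sources)
```

Because the shuffle treats every isolate as independent, a clonal expansion within one source can produce a significant cell without any ecological adaptation, which is the caveat raised in the following paragraph.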

An important caveat with this analysis is that statistical association can result from clonality rather than ecological adaptation. For example, the apparent overrepresentation of K. oxy in turtles is due to the clonal expansion of a single lineage (K. oxy SC1) within a population of turtles in a pond at a botanical garden. However, we did not find evidence for clonal expansion of K. mic within hospital settings nor for certain K. mic lineages being more strongly associated with humans than others.

Distribution of resistance genes

Kleborate26 assigns isolates to 1 of 4 resistance scores: 0 = low level resistance; 1 = extended-spectrum beta-lactamase (ESBL)-positive; 2 = carbapenemase-positive; and 3 = carbapenemase plus colistin-positive. The distribution of species according to these categories and to each source is shown in Figs. 2 and 5a; a full breakdown of resistance classes is shown in Extended Data Fig. 7; 82.4% (2,870 out of 3,482) of the isolates were category 0, with those scoring 1–3 being either K. pne from multiple sources or isolates of other species from hospital patients (exceptions are discussed below). None of the isolates, including K. pne, recovered from outside a hospital setting harboured a carbapenemase gene or showed phenotypic non-susceptibility to carbapenems.
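The four-level scheme described above can be written as a small decision function. This is a sketch of the categories as stated in the text, not Kleborate's own code: the function name and boolean inputs are hypothetical, and the handling of colistin resistance in the absence of a carbapenemase (it does not raise the score) is an assumption based on the definition of category 3.

```python
from collections import Counter

def resistance_score(has_esbl, has_carbapenemase, has_colistin_resistance):
    """Map three genotype flags to the 0-3 resistance score described in the text."""
    if has_carbapenemase and has_colistin_resistance:
        return 3   # carbapenemase plus colistin resistance
    if has_carbapenemase:
        return 2   # carbapenemase-positive
    if has_esbl:
        return 1   # ESBL-positive
    return 0       # low-level resistance

# Toy genotype flags for five isolates (illustrative, not from the study).
isolates = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (False, True, True),
    (False, False, True),   # colistin resistance alone: assumed to score 0
]
tally = Counter(resistance_score(*flags) for flags in isolates)
```

Checking the highest category first keeps the levels mutually exclusive, so each isolate receives exactly one score.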

Fig. 5: Distribution of resistance and virulence genes according to species and source.

a, Resistance genes were identified and grouped into levels 0–3 by Kleborate. b, Virulence genes were identified and grouped into levels 0–5 by Kleborate. The area of the circles is proportional to the number of isolates and the text shows the number of isolates. The shading shows the proportion of isolates from a given species and source, which correspond to a given resistance or virulence level.

Only three isolates of species other than K. pne from outside the hospital setting harboured an ESBL gene; in each case, the gene in question was blaSHV-12. These were a K. orn isolate recovered from a fly caught within a hospital (SPARK_2923_C1), a K. orn isolate from environmental water (SPARK_1613_C1) and a Klebsiella quasivariicola (K. qva) isolate from a pig (SPARK_1906_C1). Excluding K. pne, there were 9 isolates from other species recovered from hospital patients that harboured ESBLs (blaCTX-M-15, n = 4; blaSHV-12, n = 5). Of note are a pair of clonally related isolates (SPARK_1773_C1, SPARK_2031_C1) belonging to clone K.qpq_SC_11_ST571, which harboured blaCTX-M-15 plus the virulence factors ybt, iro and rmpA. These isolates were recovered from urine samples from two inpatients at the same hospital in April 2018. This is consistent with hospital transmission of a new Klebsiella quasipneumoniae subsp. quasipneumoniae (K. qpq) clone exhibiting both resistance and virulence genes.

Excluding K. pne, only three isolates from other species harboured a carbapenemase gene; these were all isolated from the hospital environment and carried blaVIM-1. Two of these (K. mic SPARK_1816_C1 and K. gri SPARK_1652_C1) presented nearly identical genotypic and phenotypic resistance profiles to each other and to five isolates of K. pne. This resistance profile is characterized by the presence of the blaSHV-12, blaVIM-1, mph(A) and qnrS genes, harboured by a class 1 integron (GenBank accession no. MN783743) associated with the conjugative IncA plasmid pR210-2-VIM37. This plasmid is known to circulate in multiple Enterobacteriaceae species in Italy38 and the re-emergence of VIM-1 in this region is thought to reflect the increased use of ceftazidime-avibactam against K. pneumoniae carbapenemase (KPC)-producing bacteria. Closer analysis revealed the presence of this plasmid in distinct K. pne clones within a single patient and in other Klebsiella species (Extended Data Fig. 8).

Regarding the 1,705 isolates of K. pne, 1,105 (64.8%) exhibited a low level of resistance (category 0), 411 (24.1%) carried an ESBL (category 1), 175 (10.3%) carried a carbapenemase gene (category 2) and only 14 (0.8%) carried a carbapenemase gene and colistin resistance (category 3). Two ESBL genes were dominant; blaCTX-M-15 and variants of blaSHV-27, which together accounted for 83.5% of all ESBL genes. These were distributed non-randomly between sources; 238 out of 256 (93%) of the K. pne isolates bearing blaCTX-M-15 were from humans, the exceptions being from hospital surfaces and companion animals. In contrast, only 51 out of 170 (30%) of the K. pne isolates bearing blaSHV-27 variants were from human sources compared to 87 out of 170 (51%) from cows. Of the 175 K. pne isolates harbouring carbapenemase genes, all were isolated from the hospital environment and the majority (n = 161; 92%) carried blaKPC and corresponded to the healthcare-associated clones ST258/512 or ST307.

K. pne_ST307_SC1 was the most abundant clone in the dataset and was isolated from hospital surfaces and companion animals as well as hospital patients, although none of the ST307 isolates from non-human sources harboured blaKPC. Eleven K. pne isolates harboured blaVIM-1, including those discussed above, and 3 K. pne isolates harboured blaOXA-48. Of the 192 isolates with a carbapenemase gene for which phenotypic resistance data were also available, 91% showed phenotypic resistance to ertapenem, 71.7% to imipenem and 77.7% to meropenem. In contrast, the values were 0.8% (27 isolates), 0.18% (6 isolates) and 0.28% (9 isolates), respectively, for isolates (from all species) without a carbapenemase gene; these exceptions are likely due to changes in membrane permeability39. Consistent with the genotypic data, there was no evidence for any phenotypic resistance to carbapenems outside of the hospital environment.

There were 14 isolates in the highest resistance category; these were all K. pne isolates from hospitals and all harboured the carbapenemase gene blaKPC plus a mutated mgrB gene known to confer colistin resistance. All except one of these isolates belonged to the common healthcare-associated clone ST258/512; the exception was a single ST307 isolate. Phylogenetic analysis of the 95 ST258/512 isolates suggested at least 5 acquisitions of the mgrB chromosomal mutation into this clone (Supplementary Fig. 1). The available phenotypic data confirmed resistance to colistin in 13 out of 14 of these isolates, 1 of which (SPARK_1222_C1) was originally assigned as sensitive using the BD Phoenix 100 automated system but was subsequently found to be resistant using the Sensititre platform. The colistin phenotype of the other isolate carrying the mgrB mutation (SPARK_372_C2) could not be confirmed because this isolate lost viability. In total, phenotypic resistance to colistin was observed in 46 K. pne isolates, 41 of which were from humans. Besides the 12 phenotypically resistant isolates harbouring an mgrB mutation, Kleborate did not detect a mechanism for colistin resistance in the other cases, including three K. pne isolates from pigs and a single K. aer isolate from a goat. This was not unexpected since many mcr variants are not included in the Kleborate database and colistin resistance can also be conferred through mutations in genes responsible for membrane synthesis40. The final non-human colistin-resistant isolate was a single K. pne isolate from a cow that harboured mcr-1.

Distribution of virulence genes

As with the genotypic resistance profiles, all isolates were assigned to 1 of 6 categories based on the presence of genes encoding the known virulence factors yersiniabactin (ybt), aerobactin (iuc), salmochelin (iro) and colibactin (clb), as identified by Kleborate (Figs. 2 and 5b). In total, 2,749 out of 3,482 (78.9%) of all isolates and 1,233 out of 1,705 (72.3%) of K. pne isolates were in the lowest virulence category, and 669 out of 3,483 (19.2%) of all isolates corresponded to virulence category 1, reflecting the presence of ybt. However, the frequency of this locus varied markedly between species: ybt was present in 410 out of 1,706 (24%) of the K. pne isolates, 249 out of 258 (96.5%) of the K. orn isolates, 6 out of 279 (2.1%) of the K. var isolates and 2 out of 171 (1.1%) of the K. aer isolates. The ybt locus in K. orn was assigned as an ‘unknown’ type by Kleborate, was chromosomally located close to a transfer RNA-asparagine site (with no evidence for an associated integrative conjugative element) and was phylogenetically distinct from the ybt locus in K. pne41,42. Despite being a core locus in K. orn, this distinct ybt variant was not found in any other species, including related species from the K. orn SPEC.

While only 7 isolates corresponded to virulence category 2 (ybt + clb), 45 K. pne isolates and 1 K. oxy isolate were assigned as virulence category 3. These isolates harboured the iuc locus, which encodes the siderophore aerobactin, and 38 out of 46 were recovered from pigs. In total, 42 out of 87 (48%) of the pig isolates harboured iuc and 40 of these 42 (95%) harboured iuc3. Three K. pne isolates and one K. oxy isolate from the farm environment (water and surfaces) also harboured iuc3; a similar association between iuc3 and pig isolates has recently been described in Germany43. The high frequency of iuc3 in porcine isolates contrasts with clinical isolates, in which iuc1 and iuc2 are more common44. The porcine iuc3 was observed in multiple sequence types (STs) and on different farms; hence, it is not a simple consequence of clonal spread. Preliminary analysis also suggests that iuc3 is carried by diverse plasmids (Extended Data Fig. 9).

Twelve isolates were predicted to show a high level of virulence (categories 4 and 5). The two category 5 isolates corresponded to the hypervirulent lineage K. pne ST23 and contained all five virulence loci. These isolates were from patients in different hospitals and were sufficiently diverged to rule out epidemiological linkage. One of these ST23 isolates, SPARK_1158_C1, isolated from the urine of a hospital inpatient, had also acquired the resistance genes qnrS1 and blaTEM and exhibited phenotypic resistance to ciprofloxacin and levofloxacin. Of the ten K. pne isolates corresponding to category 4, four were from cases of hospital disease, four from pigs (all containing iuc3) and two from dogs. The two isolates from dogs, representing STs 5 and 25, harboured ybt, iuc, iro and rmpA.

The distribution of sublineages below the species level

We examined the distribution across sources of subspecies SCs as defined using PopPunk, using the same permutation test used to examine species distributions (Supplementary Figs. 2–17). This analysis revealed that different K. pne lineages were associated with either cows or humans (Supplementary Fig. 2) and this was also borne out by phylogenetic analysis (Supplementary Fig. 18). The lineages SC1_ST307, SC2_ST17, SC3_ST512, SC4_ST45 and SC11_ST392 were mostly associated with humans, although these varied in the degree to which they were associated with hospital carriage versus hospital disease. For example, 66% of the SC1_ST307 isolates were associated with hospital disease and 28% with hospital carriage. These figures contrasted with K.pne_SC2_ST17, the second most common lineage in our dataset (n = 128), for which the equivalent figures were 20% and 57%, respectively. Other K. pne SCs were associated with cows rather than humans (for example, SC5_ST661, SC9_ST3068, SC10_ST2703, SC17_ST3345). Some intermingling occurred, particularly in SC5_ST661, which contained clonal expansions of both bovine and human isolates. This lineage was previously observed from both human and bovine sources16 and may represent a more generalist clone that is adapted to and able to transmit between both cows and humans.

Fewer statistically significant SC enrichments were apparent for other species, likely due to smaller sample sizes. However, a number of observations were notable. As discussed, K. mic was enriched within hospital carriage (Supplementary Table 4) but this was not due to the expansion of a single SC. Twenty-five of the 30 most common SCs of this species were present in hospital carriage samples but no single SC was significantly more commonly associated with hospital carriage relative to the others (Supplementary Fig. 3). In contrast, the association of K. gri with invertebrates was largely driven by K. gri SC1 (Supplementary Fig. 7). This is unlikely to reflect clonal expansion or sampling bias since K. gri SC1 was associated with different invertebrate hosts (a cockroach, fly, wasp and an unspecified bug) sampled in different locations. This clone, which has no notable resistance or virulence attributes, was also recovered from a cockroach caught in a hospital environment; an isolate very closely related to this clone was recovered from an outpatient of the same hospital (Supplementary Fig. 19).

Quantifying transmission

To quantify and compare transmission events between different settings, we used a single-nucleotide polymorphism (SNP) threshold-based approach (thresholds: 0, 1, 2, 5, 10, 20). It was clear from the resulting transmission matrices and networks (Fig. 6 and Supplementary Table 6) that most of the transmission occurred within a single source and, most importantly, that acquisition by humans almost always originated from other humans rather than from animals or the environment. In particular, our analysis further reinforces the view that transmission of K. pne, and other species, between cows and humans, which are the two most deeply sampled sources, is limited. Despite this, we note that sporadic transmission events occur relatively commonly between humans and companion animals and very occasionally between humans and other sources, including river water and invertebrates.
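A threshold-based count of this kind can be sketched as follows. How the study collapses closely related pairs into transmission events is not specified in this section, so the sketch simply counts every isolate pair whose SNP distance falls at or below the threshold; the function name, input format and toy distances are hypothetical.

```python
from collections import Counter

def count_transmission_events(snp_distances, source_of, threshold):
    """Count isolate pairs within `threshold` SNPs, keyed by unordered source pair.

    snp_distances: {(isolate_a, isolate_b): SNP distance}, one entry per pair.
    source_of:     {isolate: source label}.
    """
    events = Counter()
    for (a, b), dist in snp_distances.items():
        if dist <= threshold:
            pair = tuple(sorted((source_of[a], source_of[b])))
            events[pair] += 1
    return events

# Toy data (illustrative, not from the study).
source_of = {"iso1": "human", "iso2": "human", "iso3": "cow", "iso4": "dog"}
snp_distances = {
    ("iso1", "iso2"): 0,
    ("iso1", "iso3"): 35,
    ("iso1", "iso4"): 8,
    ("iso2", "iso3"): 40,
    ("iso2", "iso4"): 9,
    ("iso3", "iso4"): 120,
}

# The thresholds listed in the text.
matrices = {t: count_transmission_events(snp_distances, source_of, t)
            for t in (0, 1, 2, 5, 10, 20)}
```

In this toy example the strict thresholds recover only the human-to-human pair, whereas a threshold of 10 also links both human isolates to the dog isolate, illustrating how the choice of threshold changes the inferred between-source events.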

Fig. 6: Transmission heatmaps and networks.

a,b, Heatmaps showing the number of transmission events between each pair of sources, as determined by SNP thresholds of 1 (a) and 10 (b). The shading is proportional to the number of events and does not account for the number of samples from each source. c,d, Transmission networks showing the number of transmission events between each pair of sources, using the same data as in the heatmaps in a (c) and b (d), except that within-source events are not shown. The nodes represent the sources and the area of the node is proportional to the number of samples from that source. The edges show the number of transmission events and the thickness of the edge is proportional to the number of events between the two sources.
