Rapid identification of mosquito species and age by mass spectrometric analysis

REIMS spectra are a rich and informative data source

To generate a REIMS signal, the insect sample is combusted by the application of a high-frequency electric current. The resulting aerosol is collected by an airflow leading to the ion source of the instrument, where molecules are de-clustered/ionised and subsequently detected by the mass spectrometer [43]. In a single burn event, a small insect, such as an adult An. gambiae (adult dry weight, 250 µg, [44]) can be completely consumed in about 5–10 s (Additional file 1: video file V1). The mass spectra acquired at 1 Hz throughout the burn event are summed, processed and binned into 0.1 mass to charge (m/z) bins. The negative ion mass spectra largely reflect the lipid profile of the insect and are richly informative in the range of 50–1200 m/z. Moreover, different individuals from the same cohort yielded consistent mass spectra, but which also differed from the mass spectral patterns from other, different cohorts. The rapidity of data collection means several hundred individuals can be analysed per day. After data acquisition, the resultant data matrix, containing sample identification and binned m/z values (typically over 100 samples and over 11,000 m/z bins) is then subjected to multivariate analysis using machine learning and classification approaches (Fig. 1).

Fig. 1figure 1

Overview of REIMS analysis of mosquito specimens. (a) General information about the mosquito samples used in this study. Details are listed in the methods section or mentioned with the corresponding results. Collection and treatment procedures were varied to explore the stability of the REIMS analysis. (b) Samples were combusted by the diathermy electric current and the resulting aerosol was evacuated through a tube and introduced to the mass spectrometer, where the ionised molecules were detected. One ‘burn event’ was generated for each mosquito and the corresponding mass spectrum was integrated over the duration of the event (typically, 5 s). (c) Mass spectral data were pre-treated and collapsed into 0.1 m/z wide bins from 50 m/z to 1200 m/z. The resulting data matrix is then used to identify patterns through principal component and linear discriminant analysis (PCA-LDA) as well as random forest analysis, allowing sample classification and identification

Application of REIMS for species identificationLaboratory-reared mosquitoes

First, we examined whether REIMS was able to discriminate three species of the An. gambiae species complex. We acquired REIMS mass spectra from 202 specimens, all 4-day-old females, comprising 54 specimens of An. coluzzii (strain Ngousso), 59 specimens of An. gambiae s.s (strain Kisumu) and 89 An. arabiensis mosquitoes, (strain Moz). All mosquitoes were killed by freezing, stored at − 20 °C and analysed by REIMS in a randomised order. In all instances, the burn yielded a satisfactory mass spectrum. Prior to data analysis, mass spectral data were pre-processed in Offline Model Builder (Waters) in which the background signal was subtracted, spectra were mass corrected using a lock-mass standard (leucine enkephalin, 554.26 m/z) that was co-infused with each sample and finally, spectra were discretised by binning signals into 0.1 m/z wide intervals.

Principal component analysis followed by linear discriminant analysis (PC-LDA) using the Offline Model Builder software resolved the three species with only a single An. coluzzii individual being co-localised with the individuals from An. gambiae s.s (Fig. 2a). The first discriminant function was responsible for resolution of An. arabiensis from the other two An. gambiae s.l species whereas the second function yielded a good resolution of An. gambiae s.s and An. coluzzii, a result which correlates with their genetic relatedness (Fig. 2d). This finding is mirrored in the kernel density (Fig. 2b) and scatter plots (Fig. 2c) based on PC-LDA. The mass spectra for the three species were very similar and exhibited slight variation in the region of 600–1200 m/z (the averaged spectra based on all individuals available for each species are in Additional file 2: Sup. Figure 1). The subtlety of REIMS discrimination and the ability to resolve different species, even when they are closely related and morphologically identical (Fig. 2d, e), resonates well with our prior observations on Drosophila species [43]. As a further test, the entire set of REIMS spectra from all individuals were randomly assigned to three equally sized categories — these could not be resolved (Additional file 2: Sup. Figure 2). The PCA-LDA model was also re-built using less variance (fewer principal components) and all models were cross-validated in Offline Model Builder (Additional file 2: Sup. Figures 3 and 4).

Fig. 2figure 2

Analysis of three Anopheles mosquito species by REIMS. A total of 202 specimens from An. arabiensis (n = 89), An. coluzzii (n = 54) and An. gambiae s.s (n = 59) were killed through freezing and stored at − 20 °C until REIMS analysis. All specimens were female and 4 days old. Principal component–linear discriminant (PC-LD) analysis of the REIMS data within the model building software Offline Model Builder led to a clear separation of the three species (a). After exporting the data matrix (incl. classifications and signal intensities after pre-processing) PC-LDA was repeated in R; results are displayed in form of kernel density (b) and scatter plots (c), shown for both linear discriminants 1 and 2. The group formation correlates with the genetic relatedness of the three groups (d, from [45]); the biggest variance (LD 1) supporting the separation of An. arabiensis, followed by the separation of An. gambiae s.s and An. coluzzii via LD 2. Images of females from all three species are displayed in (e)

Wild mosquito populations

Whilst laboratory specimens are reared under controlled conditions and feeding regimens, a critical test of the methodology arises when applied to specimens recovered from the natural environment. We therefore applied REIMS to the analysis of mosquitoes derived from the saltwater marshes and freshwater pools around the town of Neston on the Wirral peninsula (located in the Northwest of the U.K). This area and terrain support the proliferation of several different mosquito species that have been monitored over many years [46]. Larvae were collected and raised to adults of both sexes, which were identified to species level by morphological examination [47, 48]. In total, seven different species were collected in sufficiently large numbers for REIMS analysis: Culex pipiens, Culiseta annulata, Aedes caspius, Aedes punctor, Aedes rusticus, Aedes cantans and Aedes detritus. Eighty individuals of each species were analysed in a formally randomised sequence.

The data, acquired from 0- to 4-day-old female and male mosquitoes belonging to seven different species (80 individuals per species) were used to develop a species model (Fig. 3). PC-LDA in Offline Model Builder (OMB) readily resolves Cx. pipiens, Cs. annulata and Ae. caspius from a tight cluster containing the other four species (Fig. 3a). In fact, this four-species cluster was also resolved, as seen with a clipped and differentially projected location (Fig. 3c). Indeed, tighter clustering of data points for Ae. cantans, Ae. punctor, Ae.rusticus and Ae. detritus is consistent with their phylogenetic proximity [49] (Fig. 3d). The samples were also analysed through random forests with the resulting species determination model reaching an average accuracy of 91% compared to 98% for OMB (Fig. 3b, Additional file 2: Sup. Figure 5). When species classifications were randomly assigned to samples, separation of classes failed (Additional file 2: Sup. Figure 6), confirming that there are discriminant patterns within spectra from mosquitoes displaying natural variability, and establishing the potential of REIMS to the study of field-caught mosquitoes.

Fig. 3figure 3

Resolution of local mosquito species using wild-derived specimens. Mosquito larvae collected from the wild (Neston region, north west UK) were raised to adults (males and females, ages from 0 to 4 days) and identified by morphological examination, before being killed and stored at − 20 °C for varying lengths of time. Collection of larvae as well as REIMS analysis of stored adults occurred over several months. The data acquired for a total of seven species (80 individuals per species) was analysed in Offline Model Builder via PC-LDA using 100 PCs (a, b). Visualisation was aided by removing clearly separated species groups (Cs. annulata, Cx. pipiens, Ae. caspius) from the model (c) and axis rotation. The PC-LDA separation was resonant with the phylogenetic relationships of these species (d). The OMB model was cross-validated (using the option ‘Leave 20% out’ and a standard deviation of 5) and reached a classification accuracy of 98%. Additionally, the data was analysed using a random forest algorithm, which produced a separation accuracy of 91% (section b). Details of the cross-validation and random forest results can be found in Additional file 2: Sup. Figure 5

Blinded identification study

To test the utility of REIMS as a predictive tool, we used the seven-species model in Fig. 3. A further 187 larvae were grown to adults and assigned to species by independent morphological examination. Species was blinded to the analyst, and the REIMS data were submitted to the model for species prediction. Overall, 94% accuracy was achieved for blinded samples (Table 1).

Table 1 Species identification of unknown samples

The seven-species PC-LDA model built in Offline Model Builder with 100 PCs was exported to the recognition software and used to identify blind samples of unknown species (Unknowns). Samples were categorised depending on their time of analysis: samples that had been analysed at the same time as the samples used for model building (I), samples analysed within weeks of model samples, (II) and samples that were analysed over two months later than the last samples included in the model building (III). The number of tested samples is stated in brackets. The percentage of correctly identified samples and the likelihood that the identification is correct are highlighted in yellow for easier comparison. Not all species were represented in every test group and identification success varied in some categories with the species; a more detailed list can be found in Additional file 2: Sup. Figure 7.

Species identification of larval stage mosquitoes

We explored the ability of REIMS to resolve the immature (larval) stages of mosquitoes into species. Whilst it is very difficult to establish the species of the larvae by morphology, many of the pools in the Neston area from which the mosquitoes were collected are essentially mosquito monocultures. Many larvae were harvested, a subset of which was immediately analysed with REIMS, whilst the remainder were allowed to develop into adults. The identity of the emerged adults was then used to predict the species heterogeneity of the larval population in each marsh pool, confirming that the pools contained single species (100% of the adults identified were the same species, either Ae. punctor or Ae. detritus; detailed data in Additional file 2: Sup. Figure 8).

Separation of Ae. detritus and Ae. punctor larvae was very distinct with only 20 principal components (Fig. 4a, b). In random forest analysis, the test samples were virtually always identified correctly, with correct identification rates of 100% (Ae. detritus) and 99% (Ae. punctor) (Fig. 4d). Cross-validation of the OMB model resulted in one misclassification and one outlier from 250 samples (Fig. 4c). Seeing this distinct separation through PC-LDA, principal component analysis alone was performed in OMB, which resulted in samples clustering into species groups along PC3 (Additional file 2: Sup. Figure 8b). The larval-based species model was also re-built using randomly assigned classifications; the resulting model was devoid of any separation or sample clustering (Additional file 2: Sup. Figure 8c).

Fig. 4figure 4

Species identification of immature mosquito larvae. Immature mosquitoes were obtained by filtering water collected from different pools, followed by 2–3 rinsing steps before killing and storing them at − 20 °C. Samples, mostly 3rd instar larvae were analysed with the same REIMS settings used for adult specimens. Their species was confirmed by sampling the same pool of larvae and raising them to adults before, during and after taking samples to be used for this model. The differences detected by PC-LDA are visualised in the form of an OMB model (a) and kernel density and scatterplots produced in R (b). The separation was put to test via cross-validation in OMB (c) and random forest analysis with training/test dataset split of 70%/30% (d). The separation of Ae. detritus (n = 125) and Ae. punctor (n = 125) larvae required little information/variance, reaching very distinct separation with only 20 principal components for PC-LDA. When testing the random forest model, samples were identified correctly most of the time leading to correct identification rates of 100% (Ae. detritus) and 99% (Ae. punctor)

Application of REIMS for age profiling

A characteristic of particular significance in vector control is that of female mosquito age. Establishing the age of individual mosquito specimens could, in combination with regular sampling and high-throughput analysis, allow determination of population age profiles and hence the risk of disease transmission. Age distribution would inform and support vector control strategies, particularly when in control validation stages, with a reduction in the median age of the population being a more reliable proxy of the public health value of a new tool than metrics based on population density. Indeed, vector control trials frequently include measurements of the proportion of the population that are parous (having laid at least one egg batch) as a crude measure of the proportion of ‘older’ adults (over 3 days [50]). The ability to rapidly and reliably separate mosquitoes into different age groups would be transformative for mosquito surveillance. We therefore explored whether REIMS possessed the ability to resolve mosquitoes according to age.

Laboratory-reared specimens

Reports of the lifespan of female An. gambiae s.l mosquitoes range from 2 to 43 days, with longevity under laboratory conditions typically greatly exceeding that in the field where predation, environment and disease all reduce survival [51]. For the initial tests, we included female An. gambiae s.s (Kisumu strain) raised for 0–1 day, 2 days, 3 days, 4 days and 5 days under standard insectary conditions. Mosquitoes of all five age groups were killed by freezing and stored at − 20 °C the same day. After REIMS analysis, PC-LDA in OMB led to a clear clustering according to age, with the location of each age group reflecting the developmental age of specimens (Fig. 5a). Further PC-LDA in R and visualisation through 3D plots with different triads of the top four linear discriminants reveals a similar picture of age progression along LD 1. The values of LD 1 contained enough information to provide some distinction for all groups but with overlap; the second to fourth LDs added further resolution of specific groups, improving the overall separation.

Fig. 5figure 5

Discrimination of Anopheles gambiae mosquitoes by age. Two groups (a and b) of Anopheles gambiae s.s (Kisumu strain) specimens of different ages were killed by freezing before REIMS analysis. Mosquitoes were raised on sugar solution, regardless of age. Differences between age groups were explored by PC-LDA and visualised using OMB (i) as well as R, in form of 3D models (using different linear discriminant combinations) (ii) and kernel density plots for each LD. The difference between classes in model a (built with 100 PCs in OMB and R) is only 24 h; nonetheless, a distinct group formation can be observed with chronological positioning within the 3D space, leading to a transition from younger to older samples. Model b (built with 88 PCs in OMB and 85 PCs in R), comprising young and old mosquitoes (2 groups each), revealed greater variance between the young classes than between the old. Sample numbers per class, model a: 0–1 day (n = 47), 2 days (n = 84), 3 days (n = 39), 4 days (n = 27), 5 days (n = 30); model: 0 days (n = 17), 1–2 days (n = 29), 12 days (n = 46), 13 days (n = 83)

To test the discrimination with greater age separation, a second set of An. gambiae s.s was analysed, focusing on more broadly spaced age groups: very young (0 days or 1–2 days) and old (12 days or 13 days) (Fig. 5b). We adopted different age classes to protect against an artefactual separation across a single age class. Again, samples were grouped according to their age class, however, with unequal separation between the three classes. The difference between mosquitoes that had just emerged (day 0) and those that are 1–2 days old was greater than the difference between adults at 12 days and 13 days, which could reflect metabolic and developmental changes in the first 24 h after emergence from pupae. This difference in variance can also be clearly seen in either scatter or kernel density plots; both LD 1 and 2 are adding to the separation of the young mosquitoes, whereas LD 3 was able to provide limited variance to distinguish between 12- and 13-day mosquitoes.

Both age models were also built with fewer principal components (Additional file 2: Sup. Figure 9), to test robustness. Furthermore, as a control, mosquitoes were randomly assigned to the different age classes (Additional file 2: Sup. Figure 10) to confirm that separation is driven by age-related differences between classes.

The ease with which age could be resolved and the clear separation of the 0–1-day-old mosquitoes might reflect a major change in lipid deposits post-emergence. However, averaged mass spectra of 0- to 5-day-old mosquitoes are visually similar (Additional file 2: Sup. Figure 11). Nevertheless, sample clusters based on age-related variance were formed, making age a REIMS-accessible parameter. In contrast to species or sex, age is a continuous parameter and could benefit from conversion to discontinuous categorisation to help define class boundaries and increase separation and identification accuracy. A reduction in class numbers (from five to three and from four to three), achieved by combining neighbouring classes, reduced the overlap between age groups and improved the definition of class boundaries (Additional file 2: Sup. Figure 12) as well as increased correct classifications rates obtained through cross-validation of PC-LDA models in OMB (Additional file 2: Sup. Figure 13).

Following the successful separation of laboratory-reared mosquitoes according to their species or age, potentially confounding factors were introduced to the next sample set to further test the concept of mosquito characterisation through REIMS, particularly in the context of field-caught samples, in which circumstance, freezing might not be an option. Female specimens from An. arabiensis (strain Moz), An. gambiae s.s (strain Kisumu) and An. coluzzii (strain Ngousso) were each raised and sampled into three different age groups: 1 day, 5–6 days and 14–15 days. The specimens were then killed through dehydration and stored at room temperature with desiccant material for 1–1.5 weeks before REIMS analysis. These data were used to build two different models: one to resolve species, the other to explore resolution according to age. Thus, mosquitoes of different ages are part of the species model (Fig. 6a) and age separation was tested on aggregated data from all three species (Fig. 6b).

Fig. 6figure 6

Separation of species and age by REIMS. Mosquitoes (total n = 540) were raised to three different age groups (1 day, 5–6 days, 14–15 days) for each of the three species (An. coluzzii, An. gambiae s.s, An. arabiensis, see text). The specimens were killed by dehydration and stored at room temperature with desiccant for 1–1.5 weeks prior to analysis. The samples were used to build two models: one separating the three species (a) and one separating the three age groups (b). Data was processed using PC-LDA in Offline Model Builder (using 100 PCs) (i) and R (using 235 PCs), latter visualised for each linear discriminant (LD 1 and LD 2) separately in form of kernel density (ii) and scatter plots (iii). The data matrix, exported from the Offline Model Builder software, was additionally analysed using the random forest algorithm in R; 70% of the samples were used for model building, 30% as test samples. The classification results are depicted as a bar graph, showing percentages of correctly and wrongly identified samples (iv). Depicted are the average values of 10 random forest repeat runs ± the standard error of the mean, with the range of accuracy values that were achieved in brackets

The average performance statistics and number of correct and incorrect classifications confirm a high level of discrimination (Fig. 6). Despite increasing variability in the data set by inclusion of specimens of different ages, separation of species was still successful, even though samples within groups are slightly more scattered and overall group resolution was slightly reduced. The average accuracy achieved through random forest analysis was 87% correct identification for An. arabiensis, 81% for An. coluzzii and 83% for An. gambiae s.s. The greatest degree of misclassification was clearly between An. gambiae s.s and An. coluzzii (12–13%). Cross-validation of the PC-LDA model built in OMB resulted in an even higher correct classification rate of 98% (Additional file 2: Sup. Figure 14).

The second model, separating mosquitoes according to their age, displays a tight clustering of samples within age groups and a very distinct separation between age groups. As previously observed, the very young mosquitoes (1 day old) resolved readily from older specimens and show the biggest separation. Both analytical approaches, PC-LDA and random forest, reinforce the earlier observation that the 1-day-old specimens are easily resolved (Fig. 6b). Overall, the age model achieved an average accuracy of 91%, making this age separation species independent. Cross-validation of the PC-LDA-based model produced an average accuracy of 99% (Additional file 2: Sup. Figure 14). Thus, although samples had been treated very differently, compared to the previous species and age models (Figs. 2 and 5), REIMS analysis nevertheless resulted in information-rich mass spectral data allowing for clear separation of species and highly accurate age discrimination. Both models (species and age) were also re-built in R with fewer principal components to confirm that classes can still be separated using less variance and potentially a smaller number of separating factors (Additional file 2: Sup. Figure 15).

For species resolution, the m/z bins contributing a high level of discrimination were different from those driving age discrimination (Additional file 2: Sup. Figure 16), suggesting that it should be possible to derive a two-dimensional model. Both characteristics, species and age, would be of interest when identifying trapped mosquitoes, leaving two options: acquire data and test with two separate models or build a two-factor model capable of providing both types of information at once. To test whether a two-factor model could be a viable option, the same data set used to build the species and age models (Fig. 6) was split into nine classes, each representing two factors, species and age (Fig. 7). Regardless of the added difficulty of splitting variance to enable separation of nine classes based on two factors, samples were successfully grouped and separated. The average random forest performance of the species model was 84%, lower than the performance of the age model (91%). The two-factor model assigned more variance to the age separation than the separation of species, which is apparent in the clustering according to age rather than species, followed by the separation of the three clusters (1 day, 5–6 days, 14–15 days) along linear discriminant 1 (Fig. 7a). From comparison of the 5–6-day-old and the 14–15-day-old classes, age is separated based on LD 1, followed by species resolution of An. arabiensis from the other two Anopheles species through LD 2 and further separation of An. gambiae s.s from An. coluzzii via LD 3 (Fig. 7b, c). This analysis augurs well for profiling analyses with wild-caught specimens. Cross-validation following PC-LDA in OMB produced an average correct classification rate of 97% (Additional file 2: Sup. Figure 17) whereas random forest analysis identified 79% of samples correctly (Additional file 2: Sup. Figure 18, PC-LDA in Sup. Figure 19).

Fig. 7figure 7

Two-factor model combining species and age information. The same samples used to build the species and age models in Fig. 6 were used to construct a two-factor model, comprising nine classes, each containing species and age information. Separation of all nine classes (60 specimens each) was attempted using PC-LDA in Offline Model Builder (based on 100 PCs) (a). Due to the wide dispersion of the 1-day-old groups, the spatial resolution of the 5–6- and 14–15-day-old groups in the 3D space is hindered. To help visualise the separation, the 1-day-old sample groups were removed (b). The largest variance in the data set (LD 1) is correlated with age, followed by species separation of An. arabiensis enabled by LD 2 and lastly separation of An. gambiae s.s from An. coluzzii based on LD 3 (c)

Following random forest analysis, the variables driving the separation of the nine classes were aggregated and compared to m/z bins previously identified as important in the species and age models (Additional file 2: Sup. Figure 16). Three out of five variables had been identified previously, which confirms their importance for species and age separation respectively (Additional file 2: Sup. Figure 20). Lastly, the separate age and species models as well as the nine-class model were all rebuilt using randomly assigned classifications to ensure the separating patterns are not based on unrelated noise; separation failed for all three models (Additional file 2: Sup. Figure 21).

Wild-derived specimens

Larvae of Ae. detritus collected in the Neston area were allowed to develop to adults and were harvested at different ages. We used these adults to establish whether age-specific patterns could be retained in the presence of confounding factors, such as impact of the natural environment during larval development. PC-LDA of 0–4-day-old Ae. detritus mosquitoes (males and females) resolved into four age classes (1 day, 2 days, 3 days and 4 days) with clear resolution of the four groups (Fig. 8a). As expected for continous data, the classes are very close or slightly overlap, consistent with the previous analysis of 0–5 day old An. gambiae (Fig. 5). The three age groups ranging from 2 to 4 days were resolved by LD 2 and 3, whereas mosquitoes that had just emerged are separated by the largest variance component (LD 1, Fig. 8a). This finding is particularly encouraging, as it matches previous age models based on laboratory-raised specimens. To test more separated classes, in a subsequent analysis, the 2-day class was removed and additionally, the two older age groups were combined to increase separation and sample numbers. Separation of the two new age classes was clearly improved when analysed through PC-LDA in Offline Model Builder, repeated PC-LDA in R replicated the result, showing only a few samples are being misclassified in the process (Fig. 8b). The data matrix exported from OMB was also subjected to random forest analysis, based on 70% training data/30% evaluation data. The 1-day-old mosquitoes reached a correct identification rate of 93%, of the 3–4-day mosquitoes 89% were correctly identified (Fig. 8b). Aggregated age classes, with at least a 24-h period between them, are clearly beneficial for class separation and the gain in performance more than offsets the loss of resolution.

Fig. 8figure 8

Discrimination of Aedes detritus mosquitoes by age. Ae. detritus mosquitoes, emerged from larvae collected from natural pools, were killed by freezing at different ages ranging from 0 to 4 days (males and females). As a first step samples were sorted into four age groups for PC-LDA (based on 75 PCs), which resulted in a definitive grouping according to age (a), with linear discriminant 1 separating just emerged mosquitoes (1 day) (i) and linear discriminants 2 and 3 separating specimens which are between 2 and 4 days (ii). To improve separation a 1-day gap was introduced and two groups merged (b). PC-LDA shows a clear reduction in class overlap in OMB (using 50 PCs)(i), which can also be observed in the kernel density (ii) and scatter plots (iii) produced in R (using 55 PCs). Random forest analysis, using a 70%/30% ratio for training and testing, resulted in identification accuracies of 93% for 1-day-old specimens and 89% for 3–4 day old mosquitoes (iv). Sample numbers per class for the model in a: 1 day (71), 2 days (28), 3 days (21), 4 days (31). Sample numbers per class for the model in b:1 day (55), 3–4 days (52) — sample numbers from the 1-day class were reduced for random forest analysis

The laboratory mosquito-based age model proved to be stable even in the presence of multiple species. To test whether this also holds true for wild-derived specimens, mosquitoes from four different species (Aedes detritus, Culiseta annulata, Aedes rusticus, Aedes punctor) were used to build age models with the same classifications as seen in Fig. 8 (Additional file 2: Sup. Figure 22). The average percentage of samples correctly identified through random forest dropped only slightly from 91% (Fig. 9b) to 88% upon the introduction of different species. As before, this analysis was performed with combined classes and a 1-day gap between them. Reduction of classes alone can increase separation performance significantly, which was tested using 1-day and 3–4-day classes. A comparison of PC-LDA models, built using different sets of classes, shows the step-wise improvement of separation for the single-species as well as the multi-species model (Additional files 2: Sup. Figures 23 and 25). All models were cross-validated in Offline Model Builder (Additional file 2: Sup. Figures 24 and 26).

Fig. 9figure 9

Species independent separation of age groups using wild-derived mosquitoes. In addition to the Ae. detritus samples used in Fig. 8, samples from four other species and another age class (7–10 days) were introduced, including adult mosquitoes that were kept with or without water or fed sucrose solution. A portion of specimens was killed by freezing, some died due to other reasons shortly before being collected; all were stored at − 20 °C before analysis. PC-LDA achieved separation of the three age groups (1 day, 3–4 days, 7–10 days), as can be seen in the OMB model (based on 100 PCs) (a) as well as the scatterplots (b) and kernel density distributions (c) created in R (based on 95 PCs). An overall 90% of test samples were correctly identified during random forest analysis with group-specific accuracies of 94% for newly emerged specimens (1 day old), 84% for 3–4-day-old mosquitoes and 92% for the 7–10-day-old group (d). Sample numbers per class: 1 day (75), 3–4 days (75), 7–10 days (69)

After successful separation of age groups using wild-derived mosquitoes, which were unfed and killed by freezing, more samples were added to the existing 0–1 day/3–4 days model to test the stability of pattern-based age grading (Fig. 9). A broader range of samples was included in this extended model, purposely adding variance and potentially confounding factors. A third age group was introduced to represent older mosquitoes, ranging from 7 to 10 days. Whilst no intervention took place between emergence and killing when raising the previously used wild-derived mosquitoes (Figs. 3 and 8), these adults were raised in three different conditions: kept dry, kept with water or fed with sucrose solution. Furthermore, samples from different species were added; four species (Ae. detritus, Ae. rusticus, Ae. punctor and Cs. annulata) are represented in the first two age classes (1 day and 3–4 days) and 2 species (Ae. detritus and Ae. caspius) are part of the older age class (168–240 h). Whereas all 1-day-old specimens had been killed by freezing, some mosquitoes of the older age classes had died naturally before collection. They were nonetheless added to the sample pool to add further variance to the model. Linear discriminant analysis in OMB, based on 100 principal components, produced a distinct separation of the three age groups (Fig. 9a). Similar results can be seen when conducting LD analysis (based on 95 PCs) in R, the scatterplots show that only a few samples are confused between the 1-day and 3–4-day classes and that all 7–10 day old mosquitoes are correctly located (Fig. 9b). The latter can be explained when viewing the variance distribution in the kernel density plots; the oldest group is separated clearly via LD 1, whereas separation of 1 day and 2–3 day is mostly based on LD 2, meaning there is a slightly greater variance benefitting the distinction of 7–10-day-old specimens (Fig. 9c). In random forest analysis, the 1-day-old test samples scored the highest identification accuracy with 94%, followed by the 7–10-day-old class with 92% of samples currently identified and the 3–4-day-old group with 84% correct identification (Fig. 9d).

All three age models for wild-derived specimens were re-built in OMB after randomly assigning them age classifications to ensure that previous separations had been achieved through age-related variance (Additional file 2: Sup. Figure 27). A comparison of cross-validation results of models with correct or randomly assigned classifications underpins the presence of age-related differences in the collected data; classification rates for models with random sample assignment are drastically lower (Additional file

留言 (0)

沒有登入
gif