Getting the conclusive lead with investigative genetic genealogy – A successful case study of a 16-year-old double murder in Sweden

1. Introduction

Investigative genetic genealogy (IGG), or forensic genetic genealogy (FGG), has emerged as a powerful forensic tool for generating crucial leads to identify unknown perpetrators and unknown human remains [Erlich Y. Shor T. Pe’er I. Carmi S. Identity inference of genomic data using long-range familial searches., Greytak E.M. Moore C. Armentrout S.L. Genetic genealogy for cold case and active investigations., Pedigrees and perpetrators: uses of DNA and genealogy in forensic investigations., Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes., Tillmar A. Sjölund P. Lundqvist B. Klippmark T. Älgenäs C. Green H. Whole-genome sequencing of human remains to enable genealogy DNA database searches - a case report., Perego U.A. Bodner M. Raveane A. Woodward S.R. Montinaro F. Parson W. Achilli A. Resolving a 150-year-old paternity case in Mormon history using DTC autosomal DNA testing of distant relatives., Kling D. Phillips C. Kennett D. Tillmar A. Investigative genetic genealogy: current methods, knowledge and practice.]. IGG combines large genotype datasets, typically comprising hundreds of thousands of single nucleotide polymorphisms (SNPs), with large public genealogy DNA databases in order to trace biological relatives of an unknown donor by matching segments of shared DNA [Erlich Y. Shor T. Pe’er I. Carmi S. Identity inference of genomic data using long-range familial searches., Henn B.M. Hon L. Macpherson J.M. Eriksson N. Saxonov S. Pe'er I. Mountain J.L. Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples., Forensic genealogy-a comparison of methods to infer distant relationships based on dense SNP data., Morimoto C. Manabe S. Kawaguchi T. Kawai C. Fujimoto S. Hamano Y. Yamada R. Matsuda F. Tamaki K. Pairwise kinship analysis by the index of chromosome sharing using high-density single nucleotide polymorphisms.].
One key factor in the success of IGG is that only a fraction of the population of interest needs to be present in the database for it to be possible, in theory, to identify every individual in the population by applying genetic genealogy methods. Erlich and colleagues [Erlich Y. Shor T. Pe’er I. Carmi S. Identity inference of genomic data using long-range familial searches.] estimated that if 1% of the individuals in the population of interest are present in the genealogy DNA database, there is a more than 90% chance of finding at least one 3rd cousin for every individual in the population.

DNA typing using microarrays is an easy, cheap and fast way to obtain the required genotypes. However, microarrays normally require larger amounts of DNA (on the order of hundreds of nanograms) [Application of DNA microarray to clinical diagnostics.]. Such amounts are not always present in forensic samples, which may instead contain only nanogram or sub-nanogram quantities. Forensic samples also frequently show various levels of DNA degradation and contain inhibitors of enzymatic reactions, both of which typically have a negative effect on downstream analyses [Alaeddini R. Walsh S.J. Abbas A. Forensic implications of genetic analyses from degraded DNA--a review., Instability and decay of the primary structure of DNA., Sidstedt M. Radstrom P. Hedman J. PCR inhibition in qPCR, dPCR and MPS-mechanisms and solutions.]. Although successful use of microarrays for IGG purposes has been demonstrated [Greytak E.M. Moore C. Armentrout S.L. Genetic genealogy for cold case and active investigations.], progress in DNA sequencing technologies has substantially improved the ability to process biological samples containing low quantities of degraded DNA [Advancements in next-generation sequencing., Trends in next-generation sequencing and a new era for whole genome sequencing., Petersen B.S. Fredrich B. Hoeppner M.P. Ellinghaus D. Franke A. Opportunities and challenges of whole-genome and -exome sequencing.]. In this case study we used whole-genome sequencing (WGS), for which standard protocols are available for as little as 50 pg of input DNA [].

If the established SNP dataset lacks observed genotypes for a large proportion of SNPs, the missing genotypes can be inferred using methods referred to as genotype imputation [Browning B.L. Zhou Y. Browning S.R. A one-penny imputed genome from next-generation reference panels., Genotype imputation for genome-wide association studies.]. The aim of genotype imputation is to predict genotypes for SNPs not typed in the sample. The basic idea is that any two individuals, including apparently unrelated ones, can share short segments of DNA inherited from a distant common ancestor. Such segments are shared identical by descent (IBD). High levels of linkage disequilibrium (LD) and low recombination rates within short chromosomal segments conserve haplotypes through generations. Shared segments can be found by comparing the observed genetic variants in the studied sample with variants from a panel of reference individuals (e.g. the 1000 Genomes Project [Auton A. Brooks L.D. Durbin R.M. Garrison E.P. Kang H.M. Korbel J.O. Marchini J.L. McCarthy S. McVean G.A. Abecasis G.R. A global reference for human genetic variation.]). Based on these shared segments, the missing genotypes in the sample can be predicted from the genetic variants observed in the reference individuals. A wide range of software is available, and a large number of studies have assessed the performance and accuracy of genotype imputation [Shi S. Yuan N. Yang M. Du Z. Wang J. Sheng X. Wu J. Xiao J. Comprehensive assessment of genotype imputation performance., Das S. Abecasis G.R. Browning B.L. Genotype imputation from large reference panels.].

Although the application of IGG has been shown to be successful, critical concerns have been raised regarding its use for law enforcement purposes. These concerns include ethical and legal issues [The Golden State Killer investigation and the nascent field of forensic genealogy., The impact of investigative genetic genealogy: perceptions of UK professional and public stakeholders., Forensic genealogy: some serious concerns., Forensic genealogy, bioethics and the Golden State Killer case.], and the future use of IGG in the forensic field will therefore involve more than technical challenges. The US Department of Justice (DOJ) published an interim policy in September 2019 that described, at a general level, when and how IGG should be used, its limitations, how data should be administered, etc. []. In early 2020, recommendations on the use of IGG were also published by the Scientific Working Group on DNA Analysis Methods (SWGDAM) []. In addition to the reports in the US, similar reports on the potential use of IGG have been published in Australia [Scudder N. Daniel R. Raymond J. Sears A. Operationalising forensic genetic genealogy in an Australian context.] and the UK [].
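The imputation principle outlined earlier in this section, copying alleles from a reference haplotype that matches the observed alleles around a missing site, can be illustrated with a deliberately simplified sketch. It assumes already phased haplotypes coded as 0/1 and a small hypothetical reference panel; the `window` parameter and the nearest-haplotype rule are illustrative assumptions. Real imputation software such as Beagle instead uses hidden Markov models over large reference panels, so this is not the method used in the case.

```python
from typing import List, Optional

# A phased haplotype: 0/1 allele codes, None where the genotype is missing.
Haplotype = List[Optional[int]]


def impute_haplotype(target: Haplotype, reference: List[List[int]], window: int = 3) -> List[int]:
    """Fill missing alleles by copying from the best-matching reference haplotype.

    For each missing site, reference haplotypes are scored by how many observed
    target alleles they agree with within +/- `window` sites; the allele of the
    highest-scoring haplotype (the first one on ties) is copied. Assumes a
    non-empty reference panel aligned to the same SNPs as the target.
    """
    imputed: List[int] = []
    for i, allele in enumerate(target):
        if allele is not None:
            imputed.append(allele)  # observed genotype: keep it unchanged
            continue
        lo, hi = max(0, i - window), min(len(target), i + window + 1)
        best_ref, best_score = reference[0], -1
        for ref in reference:
            # Agreement with the observed (non-missing) flanking alleles only.
            score = sum(1 for j in range(lo, hi) if target[j] is not None and target[j] == ref[j])
            if score > best_score:
                best_ref, best_score = ref, score
        imputed.append(best_ref[i])
    return imputed


# Hypothetical toy reference panel: three haplotypes over six SNPs.
reference_panel = [
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]
print(impute_haplotype([1, 1, None, 0, 1, None], reference_panel))  # [1, 1, 0, 0, 1, 0]
```

Here the observed alleles agree best with the second reference haplotype, so both missing alleles are copied from it. With a sparse target the score is based on fewer observed sites, which is one intuitive reason why imputation accuracy depends on how many genotypes were actually observed.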

The aim of this paper is to summarize and report how IGG was successfully used in a pilot case study to solve a double murder cold case in Sweden. We share and discuss details from all parts of the case study, including legal and ethical considerations, the extended DNA analysis, the genealogy database searches, the subsequent genealogy work and, finally, the conclusive lead that ultimately resulted in the closure of the second largest criminal investigation in Swedish history. We believe that a high degree of transparency, including the disclosure of details as in this paper, is important for an informed and fact-based discussion within the forensic community as well as among the public.

2. Legal considerations and case description

After Swedish media reported the successful use of IGG to catch the “Golden State Killer”, the question was raised within the Swedish Police Authority whether this tool could also be used by Swedish law enforcement. A legal inquiry concerning the use of IGG in Sweden was initiated in May 2018 by the National Forensic Centre (NFC) and carried out in cooperation with the Legal Affairs Department of the Police Authority.

The legal inquiry was finalized in January 2019 and covered a suggested method with defined case inclusion criteria, as well as legal considerations and a methodological framework [DNA-spår och släktforskning (DNA traces and genealogy), Legal inquiry, the Swedish Police Authority, A637.388/2018, (2019)]. As it turned out, the criteria and framework set up for the Swedish pilot case study were largely in line with the interim policy later published by the US Department of Justice [] as well as with the SWGDAM recommendations [].

A data protection impact assessment was performed as part of the legal inquiry in accordance with Swedish and European Union laws and regulations. In this assessment, the possible infringement of privacy rights was estimated to concern the person who left the DNA at the crime scene, the users of the genealogy databases, and their relatives. In the proportionality assessment, society’s interest in solving a major violent crime was judged to outweigh the risks of infringing the privacy rights of the above-mentioned persons or categories of persons.

The legal inquiry further dealt with the division of responsibilities between the different actors (NFC, the police crime investigators, etc.) for the different methodological steps. Genetic data refers to personal data relating to a person's inherited or acquired genetic characteristics. In Swedish law, genetic data is considered to fall within the scope of sensitive personal data. According to the legal inquiry, a DNA analysis of a trace and the documentation surrounding the analysis are covered by the definition of genetic data. Subsequent processing of the documentation, on the other hand, constitutes processing of personal data rather than of genetic data. Furthermore, it was stated that NFC has legal support to process genetic data for forensic purposes, whereas other units within the Police Authority do not. According to the legal inquiry, NFC also has legal support to contract other laboratories or other Swedish expert bodies for expert assistance.

The legal inquiry also dealt with the requirement of absolute necessity for using the method in a specific case, as well as the issue of transferring personal data to a third country (i.e. a country outside the European Union). According to the legislation described in the legal inquiry, the Swedish Authority for Privacy Protection (IMY) would need to be informed in writing after each data transfer to a third country, since sensitive personal data had been transferred. Other important prerequisites were that the database company would not be allowed to use the information received from the Swedish police for any purpose other than the requested database searches, and that, after completion of the work, it should be possible to have all data transferred to and processed by the database company erased upon request.

Regarding the different steps in the method, ethical concerns were assessed to arise primarily in connection with the use of the commercial genealogy databases. Prior to the pilot case study, the Police Authority’s ethical council was therefore consulted on the basis of the legal inquiry. The discussion in the council covered the methodology specified in the legal inquiry as well as possible infringements of privacy rights. The council supported continuing the method development work in the form of a so-called pilot case study.

The Swedish Police Authority Ethical Council consists of external experts appointed by the government for periods of four years. The Ethical Council has an advisory role and is headed by the National Police Commissioner.

Our aim was for the pilot case to cover a number of key steps, including: 1) establishing DNA datasets that could be used for searches in genetic genealogy databases, 2) transferring DNA data to, and searching in, genealogy databases available for law enforcement (in accordance with user terms and conditions and the formal consent of other database users), 3) performing genealogy work based on information gathered from the database searches, and 4) generating investigative leads for the police investigation. In addition, the pilot case study as such aimed to 5) illustrate different aspects of the handling of sensitive personal data (genetic information), and 6) evaluate the workflow proposed in the legal inquiry from both a technical and a legal point of view.

Furthermore, a legal checklist was compiled. The checklist complemented the legal inquiry and was written as a simpler user aid for use during the pilot case. It was used primarily by NFC and, to some extent, by the crime investigators.

For the pilot case study, a double murder cold case was selected. Early in the morning of October 19, 2004, an 8-year-old boy was on his way to school when he was fiercely attacked by an unknown perpetrator who stabbed him to death. A 56-year-old woman who had just come out of her home nearby witnessed the assault. The perpetrator then attacked her as well, inflicting several stab wounds. Both the boy and the woman died from their injuries. The murder weapon, a butterfly knife, was left at the crime scene and seized for forensic examination.

During the forensic examination of the butterfly knife, police technicians and experts from the National Laboratory of Forensic Science (SKL), the predecessor of today’s NFC, found DNA traces from three persons: a mixture of DNA matching the two victims and DNA from an unknown person. The unknown person's DNA was also found on several other exhibits seized in the case, which together confirmed the relevance of that trace.

The DNA profile of the unknown person had been searched over the years, and was searched continuously during the pilot case, in the national DNA database as well as internationally, for example against other European countries through the Prüm Treaty. An extended analysis was carried out using multidimensional scaling (MDS) based on 24 ancestry-informative autosomal SNPs and four global reference populations, which indicated a Western Eurasian biogeographical origin of the perpetrator. Y-chromosomal and mtDNA SNP analyses were also performed, as were hair and eye color predictions using the HIrisPlex system. Based on these results, it was concluded that the person was likely of European ancestry. Blond hair was recovered from a knitted cap (probably worn by the perpetrator) left near the crime scene, and witnesses testified that the perpetrator looked Swedish. Altogether, the investigation assumed the perpetrator to be of northern European origin. This assumption was not used for exclusion but for prioritizing which persons to question. A familial search was also carried out in the national DNA database in early 2019, without success. Over the years, extensive DNA sampling of more than 6000 individuals had been carried out in the case (testing of selected persons was also ongoing during the project), and more than 9000 persons had been questioned.
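The biogeographical assessment described above can be sketched with classical (Torgerson) MDS. This is a minimal illustration under stated assumptions: the case description does not specify the MDS variant or distance measure, so the sketch assumes Euclidean distances between allele-count vectors (0, 1 or 2 copies per SNP) and assigns the unknown sample to the nearest population centroid in MDS space. The four populations (`POP_A` to `POP_D`) and their 24-SNP genotype patterns are hypothetical synthetic data, not the reference populations used in the case.

```python
import numpy as np


def classical_mds(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Classical (Torgerson) MDS of the rows of x under Euclidean distance."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # squared distances
    n = d2.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ d2 @ centring  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def nearest_reference(unknown, references, k=3):
    """Embed reference individuals plus the unknown; return the closest population centroid."""
    labels = [pop for pop, rows in references.items() for _ in rows]
    x = np.vstack([row for rows in references.values() for row in rows] + [unknown]).astype(float)
    coords = classical_mds(x, k)
    target = coords[-1]  # the unknown sample is the last row
    distances = {}
    for pop in references:
        idx = [i for i, lab in enumerate(labels) if lab == pop]
        distances[pop] = float(np.linalg.norm(coords[idx].mean(axis=0) - target))
    return min(distances, key=distances.get)


def profile(block: int) -> np.ndarray:
    """Hypothetical 24-SNP allele-count pattern characteristic of one population."""
    v = np.zeros(24)
    v[block * 6:(block + 1) * 6] = 2
    return v


references = {pop: [profile(i)] * 5 for i, pop in enumerate(["POP_A", "POP_B", "POP_C", "POP_D"])}
unknown = profile(1).copy()
unknown[6] = 1  # one SNP differs from the POP_B pattern
print(nearest_reference(unknown, references))  # POP_B
```

With only 24 SNPs, such an assignment gives a broad continental-scale indication, which is consistent with the case using it for prioritization rather than exclusion.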

With the legal inquiry in mind, NFC decided together with the officer in charge of the murder investigation that this murder case could be used in the pilot case study. The decision was based on the availability of relevant DNA traces, the extent to which available forensic DNA tools had already been used to try to solve the case, and the case circumstances in general, which together met the criteria justifying a test of IGG in this specific case.

The case had already been discussed, and used as a reference case, during the development of the methodology proposed in the legal inquiry. The methods used for the DNA analysis, database searches and genealogy are described in detail in the next section. After the work was completed, NFC requested that the genealogy database companies delete all uploaded DNA data files, account information, etc. Both database companies involved did so and confirmed the deletion within a couple of days.

An evaluation report was written after the end of the pilot case. It presented the work together with the experience gained, conclusions and suggestions for continued work. It was concluded that, under the right circumstances and conditions, IGG can be an extremely powerful tool for Swedish criminal investigations, although it must be used with great care. With the goal of a high degree of transparency, the evaluation report, written in Swedish, was made publicly available in November 2020 []. The pilot case study was the product of successful cooperation between different parts of the Swedish Police Authority, including the Legal Affairs Department, region Öst and the National Forensic Centre, together with expertise from the National Board of Forensic Medicine, an external laboratory and a contracted genealogist.

4. Results & discussion

The Results & discussion section focuses mainly on analytical and technical aspects encountered during the pilot case work.

The DNA concentrations of DNA extract 1 and DNA extract 2 were measured at 0.9 ng/µl and 10.8 ng/µl, respectively. Analysis of DNA integrity showed that the DNA was heavily degraded in both extracts (Supplementary Fig. 1). Complete STR profiles were nevertheless obtained for both DNA extracts, and the STR analysis indicated a single contributor. Genotypes for 129 of the 131 SNPs in the MPS-based targeted SNP assay met the quality criteria, and the genotype calls showed high coverage and were well balanced (Supplementary Fig. 2).

The first WGS analysis resulted in much lower coverage and a higher duplication rate than expected (Table 1), and only 155,000 SNPs met the quality criteria for genotype calling (Table 2). Because the DNA was highly degraded, the median insert size was only around 60 bp, so the 150 bp reads extended beyond the DNA fragments and only partly contained the DNA of interest. Standard Chelex-based extraction methods generate single-stranded DNA [Casquet J. Thebaud C. Gillespie R.G. Chelex without boiling, a rapid and easy technique to obtain stable amplifiable DNA from small amounts of ethanol-stored spiders., Simon N. Shallat J. Williams Wietzikoski C. Harrington W.E. Optimization of Chelex 100 resin-based extraction of genomic DNA from dried blood spots.], but a proportion of the DNA is expected to renature after extraction, resulting in a mixture of double- and single-stranded DNA. As library preparation starts from double-stranded DNA, this may have affected the performance of the WGS. Impurities in the Chelex extract may also have influenced the process [Sidstedt M. Radstrom P. Hedman J. PCR inhibition in qPCR, dPCR and MPS-mechanisms and solutions., Walsh P.S. Metzger D.A. Higuchi R. Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material.].
Furthermore, a relatively high genotype error rate was estimated when this first WGS SNP dataset was cross-validated against the genotypes obtained from the targeted SNP assay. However, this estimate was interpreted with great caution, since only seven of the 131 SNPs met the quality criteria in the WGS SNP dataset. One of these seven genotypes was an allelic drop-out in the WGS SNP dataset (A/G in the targeted assay, A/A in the WGS SNP dataset). The conclusion from this analysis was that the WGS SNP dataset, and any results from database searches based on it, should be handled with care. The GEDmatch search with this dataset yielded only very distant relatives (less than 30 cM total shared segment length for the top hits), and only limited genealogy work was performed on them.
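Why an error estimate based on seven SNPs has to be interpreted with caution can be made concrete with a binomial confidence interval. The sketch below assumes, as the text suggests, one discordant genotype out of seven compared, and uses the Clopper-Pearson exact interval as an illustrative choice; the paper does not state how its error rates were computed.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Root of a function that is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(errors: int, n: int, alpha: float = 0.05):
    """Point estimate and exact two-sided (1 - alpha) confidence interval for an error rate."""
    # Lower bound solves P(X >= errors | p) = alpha / 2; this tail probability increases with p.
    lower = 0.0 if errors == 0 else bisect_increasing(lambda p: 1 - binom_cdf(errors - 1, n, p) - alpha / 2)
    # Upper bound solves P(X <= errors | p) = alpha / 2; binom_cdf decreases with p.
    upper = 1.0 if errors == n else bisect_increasing(lambda p: alpha / 2 - binom_cdf(errors, n, p))
    return errors / n, lower, upper


estimate, lower, upper = clopper_pearson(errors=1, n=7)
print(f"{estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

For one error in seven comparisons, the point estimate is about 14%, but the 95% interval runs from roughly 0.4% to 58%, so the data are compatible with both a nearly clean and a severely error-prone dataset.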

Table 1. Summary statistics from the WGS runs.

Table 2. Summary of the established dataset and database searches.

Genotype imputation was applied to the dataset in order to increase the number of genotypes. Approximately 6 million genotypes were called and used as the observed genotypes from which the missing SNP genotypes were imputed. After imputation, the number of genotypes increased from roughly 155,000 to 864,000 (Table 2). Interestingly, the estimated genotype error rate simultaneously decreased to approximately 6%. Despite the much larger number of SNPs, the GEDmatch search resulted in a matching pattern similar to that of the initial dataset (e.g. less than 30 cM total shared segment length for the top hits).

Next, a new WGS run was performed on a replicate library preparation (DNA library 1.2). The output was similar to that of the first WGS run with regard to coverage and duplication rate (Table 1). The output reads were therefore merged with the first dataset and a new round of imputation was performed. This resulted in a slightly increased number of genotypes (Table 2), with an overall genotype error rate of approximately 4%. However, the GEDmatch search still resulted in a matching pattern similar to that of the initial dataset (e.g. less than 30 cM total shared segment length for the top hits). Despite the absence of close relatives, the match lists were analyzed by the contracted genealogist, who discovered a cluster of potential distant relatives originating from northern Germany. This trail proved hard to investigate further, and other actions were eventually taken to move the investigation forward. It should be noted that no German origin was observed in the subsequent and final analyses. This illustrates a certain risk that genealogy searches may lead an investigation in the wrong direction, and for future work it is important to establish quality parameters and thresholds for the genealogy data analysis.

From this later dataset, a separate SNP dataset was established and sent to FTDNA for evaluation. However, this dataset did not pass FTDNA’s internal quality evaluation and was therefore not used in any search.

Due to the absence of useful hits, it was decided to start over with the analyses using a completely new DNA extract. In contrast to the previous attempts, the new WGS analysis (DNA library 2.1) resulted in a high-coverage dataset with a low duplication rate (Table 1). Genotypes were called for approximately 1.3 million and 1.9 million SNPs (GEDmatch and FTDNA SNPs, respectively), and no genotype errors were detected when compared with the 131 targeted SNPs. Searches were performed in GEDmatch and FTDNA. Despite the now apparently good WGS SNP dataset, still no close relatives were found in GEDmatch and, as with previous searches, all top hits had less than 30 cM total shared segment length. The FTDNA search, however, was more fruitful and yielded several hits that were used for family pedigree building and genealogy. In total, 890 hits were obtained in the first search, of which the top 28 individuals were used for the genealogy analyses (the top two hits shared about 60–100 cM with the unknown perpetrator and later turned out to be a 2nd cousin once removed and a 3rd cousin once removed of the perpetrator). Family pedigrees were built back to the late 18th century in search of common ancestors, and matching DNA segments were mapped to look for triangulation. During the process, 15 volunteers with known origins in a specific part of Sweden (which had emerged as being of high interest from the genealogy work) provided DNA samples to FTDNA, and one of them turned out to be a closer match (sharing about 347 cM in total) []. From the subsequent mapping of descendants of the common ancestors, combined with investigative information such as year of birth, a pair of brothers remained as candidates for the unknown perpetrator. Following a prosecutor’s decision, buccal swabs were subsequently obtained from both brothers, and routine comparative STR profiling confirmed that one of the brothers matched the crime scene sample.
The suspect confessed and was later convicted of the double murder.
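To relate the reported shared cM values to genealogical relationships, the expected amount of sharing can be approximated from the coefficient of relationship r. The sketch below assumes full cousins (descending from an ancestral couple), no double IBD sharing, and an approximate sex-averaged autosomal map length of 3500 cM (`genome_cm`, an assumed value); expected half-identical sharing is then taken as 2 × r × `genome_cm`. It deliberately ignores segment detection thresholds, which in practice bias observed totals for distant relatives.

```python
def relationship_coefficient(n_cousin: int, removed: int, full: bool = True) -> float:
    """Coefficient of relationship r for nth cousins `removed` times removed."""
    meioses = 2 * (n_cousin + 1) + removed  # meioses linking the pair via one shared ancestor
    shared_ancestors = 2 if full else 1  # full cousins descend from an ancestral couple
    return shared_ancestors * 0.5**meioses


def expected_shared_cm(n_cousin: int, removed: int, full: bool = True, genome_cm: float = 3500.0) -> float:
    """Expected total shared (half-identical) cM, approximated as 2 * r * genome_cm.

    genome_cm is an assumed autosomal map length; real totals scatter widely
    around this expectation, especially for distant relatives.
    """
    return 2 * relationship_coefficient(n_cousin, removed, full) * genome_cm


for label, n, m in [("1st cousin", 1, 0), ("2nd cousin once removed", 2, 1), ("3rd cousin once removed", 3, 1)]:
    print(f"{label}: ~{expected_shared_cm(n, m):.0f} cM")
```

Under these assumptions the expectations are about 875, 109 and 27 cM, respectively. The observed sharing of about 60–100 cM for the two top FTDNA hits, a 2nd cousin once removed and a 3rd cousin once removed, illustrates how far actual sharing can deviate from expectation, particularly for the more distant relationship.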

In this case report we describe how IGG was used to obtain conclusive investigative leads that led to the arrest and conviction of the perpetrator of a double murder cold case. In many respects this was not a trivial case, not only because of the legal and ethical challenges but also because it required DNA analysis methodologies not normally used in forensic genetic analysis.

From a technical point of view, there were two main obstacles that had to be addressed to take the investigation forward, both of which stemmed from the limited quality and low quantity of the DNA. The first was the large number of SNPs with missing genotypes. When searching genealogy databases with a limited number of SNPs, there is a risk that the output does not match the pattern expected from a search with a complete SNP dataset. A low SNP density could decrease the total shared segment length, since some shared segments will not meet the SNP density criteria and will thus not be included in the total shared segment length estimate. Consequently, true close relatives may be estimated as more distant relatives and therefore be disregarded (i.e. considered not worth further investigation) by the genealogist, and distant relatives may go completely undetected. Conversely, false positives could appear if the SNP density threshold is set too low, since falsely shared segments may then be added to the total shared segment length [Forensic genealogy-a comparison of methods to infer distant relationships based on dense SNP data.]. These issues were discussed with the genealogist in the team, and database searches were performed with this in mind so that the obtained shared cM values were interpreted with care.

The second obstacle was the presence of genotype errors in the established datasets. As noted above, since we could cross-validate the WGS SNP datasets against the genotypes from the targeted SNP assay, we could obtain a rough estimate of the proportion of genotype errors and of the error types (e.g. allelic drop-out [false homozygous], allelic drop-in [false heterozygous] etc.). We believe that the ability to assess the quality of the established WGS SNP dataset in this way is crucial for making informed decisions about its intended use.
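The cross-validation described above, classifying each discordant WGS genotype against the targeted-assay genotype by error type, can be sketched as follows. This is an illustration only: genotypes are assumed to be two-letter allele strings, the SNP identifiers and calls are hypothetical, and the targeted assay is treated as the trusted reference.

```python
from collections import Counter
from typing import Dict


def classify_call(reference: str, test: str) -> str:
    """Compare a test genotype (e.g. 'AA') with a trusted reference genotype (e.g. 'AG')."""
    ref_alleles, test_alleles = set(reference), set(test)
    if sorted(reference) == sorted(test):
        return "concordant"
    if len(ref_alleles) == 2 and len(test_alleles) == 1 and test_alleles <= ref_alleles:
        return "drop-out (false homozygous)"
    if len(ref_alleles) == 1 and len(test_alleles) == 2 and ref_alleles <= test_alleles:
        return "drop-in (false heterozygous)"
    return "other discordance"


def error_profile(reference_calls: Dict[str, str], test_calls: Dict[str, str]) -> Counter:
    """Tally concordance and error types over SNPs called in both datasets."""
    shared = sorted(reference_calls.keys() & test_calls.keys())
    return Counter(classify_call(reference_calls[s], test_calls[s]) for s in shared)


# Hypothetical SNP identifiers and calls: targeted assay vs WGS dataset.
targeted = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "AG"}
wgs = {"snp1": "AA", "snp2": "CC", "snp3": "CT", "snp4": "GA", "snp5": "GG"}
print(error_profile(targeted, wgs))
```

Only SNPs called in both datasets are compared, which is why the number of overlapping, quality-passed SNPs limits how precise such an error estimate can be.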
When it comes to the impact of genotype errors on segment sharing estimates, different types of errors have different effects. A larger proportion of false heterozygous genotypes may create false matching segments and/or matching segments that are too long, and thus increase the risk of false positive hits. False homozygous genotypes, on the other hand, may prematurely terminate shared segments and could cause false negatives, similarly to the low SNP density situation. In our case, the majority of the errors consisted of allelic drop-outs, which is expected when dealing with forensic samples of low quality and quantity [, Tvedebrink T. Eriksen P.S. Mogensen H.S. Morling N. Estimating the probability of allelic drop-out of STR alleles in forensic genetics.]. However, most segment analysis algorithms allow mismatches of this type to a certain degree. In GEDmatch, for example, the one-to-one tool lets the user define the mismatch threshold manually depending on the quality of the dataset. If allelic drop-outs are not taken into account, close relatives will appear to share shorter segments than expected, because matching segments break down where allele sharing is lost. For the same reason, some of the shared segments may become shorter than the cutoff for a single segment to be included in the total shared segment length, so that a close relative may appear more distant than expected. Similarly, distant relatives may go undetected. As previously noted, searches were made with this in mind and discussed with the genealogist beforehand.

Since no useful hits were obtained in the first three rounds of searches in GEDmatch, we could not, at the time of the searches, exclude that relatives had gone undetected because of the low SNP density or the observed genotype error rate.
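How a single allelic drop-out and a low SNP density can each hide a shared segment can be sketched with a generic half-identical segment scan. Two relatives can share an allele IBD at every SNP except where both are homozygous for different alleles, so runs without such opposite homozygotes are candidate segments. This is not GEDmatch's algorithm: the per-run mismatch tolerance, the 12 cM and 300-SNP thresholds, and the simulated genotypes are all hypothetical illustration values.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call


def opposite_homozygous(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Both calls homozygous for different alleles: no allele can be shared IBD here."""
    return a[0] == a[1] and b[0] == b[1] and a[0] != b[0]


def half_identical_segments(
    pos_cm: Sequence[float],
    g1: Sequence[Genotype],
    g2: Sequence[Genotype],
    min_cm: float,
    min_snps: int,
    max_mismatches: int = 0,
) -> List[Tuple[float, float, int]]:
    """Return (start_cm, end_cm, n_snps) for runs without opposite homozygotes.

    Up to `max_mismatches` opposite-homozygous sites per run are tolerated (e.g. to
    absorb suspected allelic drop-outs). Missing calls are skipped: they neither
    break a run nor count towards `min_snps`.
    """
    segments: List[Tuple[float, float, int]] = []
    run: List[float] = []
    mismatches = 0

    def keep(run_cm: List[float]) -> None:
        # Report the run only if it passes both the SNP-count and the length threshold.
        if run_cm and len(run_cm) >= min_snps and run_cm[-1] - run_cm[0] >= min_cm:
            segments.append((run_cm[0], run_cm[-1], len(run_cm)))

    for cm, a, b in zip(pos_cm, g1, g2):
        if a is None or b is None:
            continue  # no information at this SNP
        if opposite_homozygous(a, b):
            if mismatches < max_mismatches:
                mismatches += 1
                run.append(cm)  # tolerated mismatch: the run continues through it
            else:
                keep(run)  # the run ends at the previous informative SNP
                run, mismatches = [], 0
            continue
        run.append(cm)
    keep(run)
    return segments


positions = [i * 0.02 for i in range(1000)]  # hypothetical 1000 SNPs spanning about 20 cM
relative = [("A", "G")] * 1000
relative[500] = ("G", "G")
sample = [("A", "G")] * 1000
sample[500] = ("A", "A")  # simulated drop-out of the G allele at one SNP
sparse = [g if i % 5 == 0 else None for i, g in enumerate(sample)]  # about 80% missing

print(half_identical_segments(positions, sample, relative, min_cm=12, min_snps=300))  # []
print(half_identical_segments(positions, sample, relative, min_cm=12, min_snps=300, max_mismatches=1))
print(half_identical_segments(positions, sparse, relative, min_cm=12, min_snps=300, max_mismatches=1))  # []
```

In this toy example, one drop-out splits a roughly 20 cM segment into two pieces of about 10 cM that both fall below the cutoff. Tolerating one mismatch restores the segment, whereas removing most SNPs makes it fail the SNP-count threshold even with that tolerance. Lowering `min_snps` would recover it, at the cost of admitting more false segments.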
However, since no useful hits were found with the final SNP dataset either, we conclude that the absence of hits was most probably due to the absence of relatives in the GEDmatch database, and not to low SNP density or genotype errors in the WGS datasets. The absence of useful relatives in GEDmatch highlights another important aspect: the content and size of the databases available for law enforcement searches. The GEDmatch database is heavily weighted towards individuals living in the US [Kling D. Phillips C. Kennett D. Tillmar A. Investigative genetic genealogy: current methods, knowledge and practice.], and the number of individuals who have “opted in” is currently (December 2020) only around 325,000 []. Verogen, the owner of GEDmatch, has recently released a portal dedicated to law enforcement, GEDmatch Pro (https://pro.gedmatch.com/). Verogen states that this new portal “…separates police comparisons of GEDmatch data from standard genealogy activities and offers a range of tools most relevant to help further investigations.” The portal is also likely to include measures addressing the previously identified security breaches [, ]. Hopefully, the transformation of GEDmatch will also attract new users, some of whom will allow law enforcement searches.
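The dependence of a useful hit on database coverage can be illustrated with a simple independence model. This is an assumption made for illustration and is not the model used by Erlich and colleagues: each of a person's detectable relatives is assumed to be in the database independently with probability `db_fraction`, and the number of such relatives is a hypothetical input.

```python
import math


def p_any_relative_in_db(n_relatives: int, db_fraction: float) -> float:
    """P(at least one of n_relatives is in the database), assuming independent membership."""
    return 1.0 - (1.0 - db_fraction) ** n_relatives


def relatives_needed(db_fraction: float, target: float = 0.9) -> int:
    """Smallest number of relatives giving at least `target` probability of one database hit."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - db_fraction))


print(relatives_needed(0.01))  # 230
print(relatives_needed(0.001))
```

Under this model, a database covering 1% of a population gives at least a 90% chance of one hit only for persons with about 230 or more detectable relatives, and a tenfold smaller coverage requires roughly ten times as many. This is one way to see why a database of a few hundred thousand opted-in users, weighted towards the US, may contain no useful relatives of a Swedish perpetrator.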
