Should results of HLA haplotype frequency estimations be normalized?

Dear Editor,

we are writing in response to the comment by Nunes (Nunes, 2021) on our publication ‘Estimating HLA haplotype frequencies from homozygous individuals’ (Seitz et al., 2021).

The only difference between the approach preferred by Nunes and our analysis is that we normalized the estimated haplotype frequencies (HF), that is, we multiplied each frequency by a constant factor chosen so that the frequency sum equals 1. So, the question is whether it is appropriate to normalize an HF set obtained from a corresponding estimation procedure.
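The normalization step itself is trivial to implement; the following is a minimal sketch (the function name `normalize` and the dict layout are ours, purely for illustration):

```python
def normalize(hf):
    """Rescale an estimated haplotype-frequency (HF) set so that the
    frequencies sum to 1.

    hf: dict mapping haplotype -> estimated frequency.
    The relative proportions of the haplotypes are unchanged; the raw
    estimate is only multiplied by the constant factor 1 / sum.
    """
    s = sum(hf.values())
    return {h: f / s for h, f in hf.items()}
```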

We think there may be no universal answer to this question, but that it depends on what the frequencies are intended to be used for. As we mentioned in the introduction of our original paper, we are particularly interested in questions in the context of stem cell donor registries such as what proportion of patients of a given ethnicity will find an HLA-matched donor in a registry of defined size and ethnic composition. This question is usually (Beatty et al., 1995; Müller et al., 2003; Schmidt et al., 2014) answered via a two-step procedure: First, one estimates population-specific HF from appropriate samples of HLA-genotyped individuals. Then, the HF obtained are used as input for the determination of matching probabilities (MP) by registry size. In the simplest scenario (all donors and patients are from the same population), this is done using the formula $p(n) = \sum_i g_i \left[1 - (1 - g_i)^n\right]$ (Müller et al., 2003). Here, $p(n)$ is the MP, $n$ is the registry size, and the $g_i$ are the genotype frequencies (GF) of the population under consideration that are derived from the HF determined in step 1 under the assumption of Hardy–Weinberg equilibrium (HWE).
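The two-step procedure can be sketched as follows. The function names are ours; the GF are derived from the HF under HWE as $g_{kk} = h_k^2$ for homozygotes and $g_{jk} = 2 h_j h_k$ for heterozygotes:

```python
def genotype_freqs_from_hf(hf):
    """Genotype frequencies under HWE from a list of haplotype
    frequencies: h_j**2 for homozygotes, 2*h_j*h_k for heterozygotes."""
    return [hf[i] * hf[j] * (1 if i == j else 2)
            for i in range(len(hf)) for j in range(i, len(hf))]

def matching_probability(gf, n):
    """p(n) = sum_i g_i * (1 - (1 - g_i)**n), the Mueller et al. (2003)
    matching probability: the chance that a random patient finds at
    least one genotype-matched donor in a registry of size n."""
    return sum(g * (1.0 - (1.0 - g) ** n) for g in gf)
```

For a normalized HF set the GF sum to 1 and $p(n)$ approaches 1 as the registry grows.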

We will now analyze the implications of using non-normalized HF sets for MP estimation with the help of the frequency sets from our original paper. Let $S = \sum_k h_k$ denote the sum of the estimated HF $h_k$ without normalization; the values of $S$ for the 4-, 5-, and 6-locus scenarios can be easily calculated from data given in the Supplementary Information of our original paper. It is straightforward to deduce that $\lim_{n \to \infty} p(n) = \sum_i g_i = \left(\sum_k h_k\right)^2 = S^2$: as the registry grows, $(1 - g_i)^n \to 0$ for every $g_i > 0$, and under HWE the GF sum to the square of the HF sum. In our three scenarios, $S^2$ lies well above 1 for the 4- and 5-locus sets but reaches only 0.857 for the 6-locus set.
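The limit can be checked numerically. The HF set below is made up for illustration (it is not one of the sets from our paper); its sum is $S = 0.9$, so $p(n)$ can never exceed $S^2 = 0.81$:

```python
# Illustrative, made-up non-normalized HF set with sum S = 0.9.
hf = [0.4, 0.3, 0.2]
S = sum(hf)

# GF under HWE: h_j**2 for homozygotes, 2*h_j*h_k for heterozygotes.
gf = [hf[i] * hf[j] * (1 if i == j else 2)
      for i in range(len(hf)) for j in range(i, len(hf))]

# The GF sum to S**2, so p(n) = sum_i g_i*(1-(1-g_i)**n) is capped at S**2.
p = sum(g * (1.0 - (1.0 - g) ** 10**6) for g in gf)
print(S**2, p)  # both approximately 0.81
```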

This means that even in a setting with identical donor and patient populations and arbitrary registry growth, one can never achieve an MP greater than 0.857 in the 6-locus scenario. In the other two scenarios, on the other hand, one obtains MP values well above 1 – a value impossible for a probability. These unreasonable results provide, in our view, a strong argument that normalized HF sets are the appropriate outcome of HF estimation for our purposes. As stated above, one may reach different conclusions in other contexts, although it might generally be difficult to interpret a frequency from a non-normalized HF set whose frequency sum deviates considerably from 1.

It should be noted that the question of HF set normalization arises generally, not only in HF estimation based on homozygous individuals. When analyzing the original data set with the expectation-maximization (EM) algorithm (Excoffier & Slatkin, 1995) using our Hapl-o-Mat software (Sauter et al., 2018; Schäfer et al., 2017), the sum of all HF $h_k \ge \frac{1}{2n}$ (i.e., at least the frequency corresponding to a unique occurrence in a sample of $n$ genotyped individuals) ranged from 0.993 (6-locus scenario) to 0.997 (4-locus scenario). The question of whether to normalize such an HF set is obviously less pressing than for the significant deviations of the HF sums from 1 that we obtained without normalization when estimating HF from homozygous individuals. This is another piece of evidence for the general superiority of the EM algorithm over HF estimation from homozygous donors, a superiority we had already clearly stated in our original paper.
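For readers unfamiliar with the EM approach, the core idea can be illustrated with a toy re-implementation. This is our own minimal sketch, not the Hapl-o-Mat code; the function names and the data layout are assumptions. Note that by construction the estimated frequencies sum to 1 here; the sub-1 sums quoted above arise when only the frequencies above a threshold are summed after estimation.

```python
from collections import defaultdict
from itertools import product

def em_haplotype_freqs(genotypes, n_iter=200):
    """Toy EM (in the spirit of Excoffier & Slatkin, 1995) for haplotype
    frequencies from unphased multi-locus genotypes.

    genotypes: list of genotypes, each a tuple with one (allele1, allele2)
               pair per locus, e.g. (('A', 'a'), ('B', 'b')).
    Returns a dict mapping haplotype tuple -> estimated frequency.
    """
    def expansions(geno):
        # Enumerate all haplotype pairs compatible with this genotype.
        pairs = set()
        for assignment in product(*[[(a, b), (b, a)] for a, b in geno]):
            h1 = tuple(x for x, _ in assignment)
            h2 = tuple(y for _, y in assignment)
            pairs.add(tuple(sorted((h1, h2))))
        return list(pairs)

    expanded = [expansions(g) for g in genotypes]
    haplos = {h for exp in expanded for pair in exp for h in pair}
    freqs = {h: 1.0 / len(haplos) for h in haplos}  # uniform start
    n = len(genotypes)

    for _ in range(n_iter):
        counts = defaultdict(float)
        for exp in expanded:
            # E-step: posterior weight of each compatible phase under HWE.
            weights = [freqs[h1] * freqs[h2] * (1.0 if h1 == h2 else 2.0)
                       for h1, h2 in exp]
            total = sum(weights)
            for (h1, h2), w in zip(exp, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts, divided by 2n chromosomes.
        freqs = {h: c / (2.0 * n) for h, c in counts.items()}
    return freqs
```

With two unambiguous homozygotes and one double heterozygote, the EM resolves the ambiguous phase in favor of the haplotypes already observed unambiguously.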

For much smaller – and probably more common – sample sizes, however, the question of whether estimated HF sets should be normalized becomes relevant for the EM algorithm as well. To demonstrate this, we determined HF from a random sample of the original data set using the Hapl-o-Mat software. The sum of all frequencies corresponding to at least one occurrence in the sample ranged from 0.772 (6-locus scenario) to 0.905 (4-locus scenario). Thus, if one wants to use such an HF set as input for MP estimation and to avoid unreasonable results like those above, one has the choice to (a) include frequencies in the calculation whose underlying haplotypes are presumably not included in the sample at all; (b) normalize the estimated HF; or (c) perform a combination of these two approaches. Indeed, the latter is what we have done in the past (Schmidt et al., 2020): We included, starting with the largest HF, all estimated frequencies – including those below the single-occurrence threshold $\frac{1}{2n}$ – up to a cumulative frequency of 0.995, and then normalized this HF set to 1. However, this is a merely pragmatic approach. To our knowledge, there is no standard way to generate input to the MP calculation from the output of the EM algorithm, let alone a mathematically proven optimal approach. We think it would be a worthwhile, though probably non-trivial, scientific effort to define one.
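The pragmatic approach (c) can be sketched as follows; the function name and cutoff handling are ours, not taken from any published code:

```python
def truncate_and_normalize(hf, cutoff=0.995):
    """Keep haplotypes in descending order of estimated frequency until
    their cumulative sum first reaches `cutoff`, then rescale the kept
    frequencies so that they sum to 1.

    hf: dict mapping haplotype -> estimated frequency.
    """
    kept, cum = {}, 0.0
    for h, f in sorted(hf.items(), key=lambda kv: kv[1], reverse=True):
        kept[h] = f
        cum += f
        if cum >= cutoff:
            break
    return {h: f / cum for h, f in kept.items()}
```

If the raw frequencies sum to less than the cutoff – as in the small-sample case above – the sketch keeps the whole set and simply normalizes it, i.e., it falls back to option (b).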

The authors declare no conflict of interest.
