The presentation of results from studies in clinical haematology

Research is based on trying to find answers to specific questions or to test hypotheses. Studies are thus undertaken to generate data which, with appropriate statistical methods, will help to determine the validity of the science under investigation.

The aim of this paper is not to prescribe which statistical methods to use, but to suggest the best ways of presenting the results of appropriately analysed data. Presentation is key: however well conducted and analysed a study may be, incorrect or inappropriate presentation of the findings will severely hamper its publication potential. We should nonetheless be pragmatic. The reader must be able to relate to the clinical or scientific findings in a meaningful manner, and so some flexibility in adhering to strict statistical analysis norms should be tolerated. For example, a study investigating possible prognostic factors in a survival outcome setting could be limited to a multivariate analysis alone, but this would deny the reader the chance of ‘getting a feel’ for the data; presenting univariate analyses as well would enhance understanding of the dataset under investigation. This paper therefore covers the fundamentals required in the presentation of study objectives, population selection, description of characteristics and missing values, unadjusted analyses, multivariate regression models and matched pair analyses.

Studies have to be justified, and so the description of the objectives should include a review of pertinent recent publications to set the scene. The type of study (randomized or non-randomized trial, prospective or retrospective non-interventional study) should be clearly specified, as should the source of the data. It should also be stated that the study follows international guidelines for the specific design being utilised.

Patient inclusion and exclusion criteria should be presented in such a way as to be reproducible in an independent fashion. It is imperative that numbers of patients are consistent throughout the manuscript and that all missing/unknown values are accounted for.

For a clinical trial, it is mandatory to present patient selection in a figure, the CONSORT flow diagram (https://www.consort-statement.org/consort-statement/flow-diagram). Such a representation is equally valuable in any study as a clear presentation of the inclusion and exclusion criteria (Fig. 1).

When reading a manuscript from a reviewer's point of view, an easy check on whether correct analyses have been undertaken is simply to count the number of patients each time a pertinent reference is made. Omitted patients, unless justified, always raise the suspicion that inappropriate analyses may have been carried out.
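The counting check described above can be sketched in a few lines of Python. This is only an illustration: the function name `patient_flow`, the exclusion reasons and all counts are hypothetical and are not taken from the CONSORT statement or from any study.

```python
def patient_flow(n_assessed, exclusions):
    """Return (reason, n_excluded, n_remaining) for each exclusion step, in order.

    `exclusions` is a list of (reason, n_excluded) pairs; all names and numbers
    used with this helper are illustrative assumptions.
    """
    remaining = n_assessed
    steps = []
    for reason, n_excluded in exclusions:
        # A step cannot remove a negative number or more patients than remain.
        if n_excluded < 0 or n_excluded > remaining:
            raise ValueError(f"Inconsistent count at step: {reason}")
        remaining -= n_excluded
        steps.append((reason, n_excluded, remaining))
    return steps


# Hypothetical numbers for a flow diagram.
steps = patient_flow(
    250,
    [
        ("Did not meet inclusion criteria", 30),
        ("Declined to participate", 12),
        ("Missing outcome data", 8),
    ],
)
n_analysed = steps[-1][2]

for reason, n_excluded, n_remaining in steps:
    print(f"{reason}: excluded {n_excluded}, remaining {n_remaining}")
```

The final value of `n_analysed` is the number that every table and analysis in the manuscript should then add up to, with any difference explained.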

A. Patient population description (Table 1)

Descriptive, prognostic or exploratory variables can be divided into categorical, ordinal and continuous. In a descriptive table, categorical and ordinal data should be presented as frequencies, with percentages of the total. Treating and coding ordinal categorical variables as continuous covariates is inappropriate. For example, stage of disease (1, 2, 3, 4) should be reported with frequencies and percentages, not with a median and range.
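As a minimal sketch of such a descriptive row, the following Python tabulates an ordinal variable as frequencies and percentages of the total, with missing values shown as their own category. The helper `frequency_table` and the stage values are hypothetical.

```python
from collections import Counter


def frequency_table(values, levels):
    """Return (level, count, percent of total) for each level, in the given order.

    Missing values (None) are counted under the label "Missing" so that the
    rows always add up to the full study population.
    """
    labelled = ["Missing" if v is None else v for v in values]
    counts = Counter(labelled)
    total = len(labelled)
    return [
        (level, counts[level], round(100 * counts[level] / total, 1))
        for level in levels
    ]


# Illustrative disease stages for 10 patients, one of them unknown.
stages = [1, 1, 2, 2, 2, 3, 3, 4, None, 2]
table = frequency_table(stages, [1, 2, 3, 4, "Missing"])

for level, count, percent in table:
    print(f"Stage {level}: {count} ({percent}%)")
```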

Continuous variables should be described with a mean or median as appropriate, together with a measure of variability. The standard deviation provides an estimate of the variability in the study sample, but a mean and standard deviation should only be reported for variables with a normal distribution. Although the inter-quartile range (IQR) appears to be gaining prominence in academic papers, it requires interpretation (the IQR defines the central 50% of the population) and should always be provided together with the range. The range in fact provides vital information relating to the inclusion criteria for a study. If the methods section states that only adult patients were enrolled (e.g. >18yr) and age is described as median 45yr (IQR 35–65; range 15–83), then a reviewer would question either the entry criteria or whether a typographical error had been introduced.
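A short Python sketch of this kind of summary, using only the standard library, is given here. The ages are invented for illustration, the helper names are hypothetical, and the "inclusive" quartile method is one of several conventions, so the IQR may differ slightly from that reported by other software.

```python
import statistics


def summarise_continuous(values):
    """Return median, quartiles and range for a list of numeric values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


def violates_minimum(summary, strict_lower_bound):
    """True if the observed range contradicts an entry criterion of 'value > bound'."""
    return summary["min"] <= strict_lower_bound


# Illustrative ages (years); the stated entry criterion is age > 18.
ages = [15, 22, 35, 41, 45, 52, 65, 70, 83]
summary = summarise_continuous(ages)
flag = violates_minimum(summary, 18)

print(
    f"median {summary['median']}yr "
    f"(IQR {summary['q1']}-{summary['q3']}; range {summary['min']}-{summary['max']})"
)
print("Range conflicts with entry criterion:", flag)
```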

If continuous variables have been grouped by means of cut-off values, these need to be justified. The choice of cut-off values should ideally be based on previously published scientific or clinical evidence, or on quartiles derived from the population under study. Cut-off values should not, however, be chosen in order to produce groups that achieve a satisfactory statistical outcome [1]. This would be an example of a post hoc analysis, where there may be a desire to produce positive statistical results solely to justify the purpose of a project. Such an approach would invalidate the interpretation of the hypothesis tests.
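One way to derive quartile-based groups without reference to the outcome is sketched here in Python. The values and the helper `quartile_groups` are hypothetical, and the "inclusive" quantile method is an assumption; the point is only that the cut-offs come from the distribution of the variable itself and are fixed before any outcome analysis.

```python
import bisect
import statistics
from collections import Counter


def quartile_groups(values):
    """Return (cut_offs, groups): quartile cut-offs and a group index 0-3 per value."""
    cut_offs = statistics.quantiles(values, n=4, method="inclusive")
    # bisect_right places a value equal to a cut-off in the higher group.
    groups = [bisect.bisect_right(cut_offs, v) for v in values]
    return cut_offs, groups


# Illustrative measurements for 8 patients.
values = [1, 2, 3, 4, 5, 6, 7, 8]
cut_offs, groups = quartile_groups(values)
group_sizes = Counter(groups)

print("Cut-offs:", cut_offs)
print("Group sizes:", dict(group_sizes))
```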

Outcomes studied for haematological diseases can occur at varying times during patient follow-up. The most appropriate statistical methods to analyse such data are survival analyses, which encompass a variety of methods. The first step in such an analysis is to define an appropriate time scale and time origin for the data. A misspecification of the time origin can lead to biased estimates of all the outcome probabilities of interest. For example, when studying response to a treatment given for relapse, the time origin must be the treatment start date and not the date of relapse, because the time from relapse to treatment could introduce a bias.

In the analysis of time-to-event outcomes, patients who are alive without experiencing the event of interest are censored at last contact. A complication to this simple scenario is that alternative outcomes can preclude the occurrence of the event of interest. Thus, in trying to estimate the probability of relapse, a patient may die before relapsing, and so this becomes a competing risk setting and requires the use of specific methods.
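The derivation of a follow-up time and status from dates can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the status codes (0 = censored, 1 = relapse, 2 = death without relapse) and the dates are all hypothetical, and the time origin passed in is assumed to be the treatment start date.

```python
from datetime import date


def time_and_status(origin, last_contact, relapse=None, death=None):
    """Return (days from origin, status) for one patient.

    Status codes are an assumption of this sketch:
    0 = censored at last contact, 1 = relapse, 2 = death without prior relapse.
    """
    if relapse is not None:
        # Relapse is the event of interest and takes precedence.
        return (relapse - origin).days, 1
    if death is not None:
        # Death before relapse is a competing event, not a censoring.
        return (death - origin).days, 2
    return (last_contact - origin).days, 0


start = date(2021, 3, 1)  # hypothetical treatment start date (time origin)

censored = time_and_status(start, last_contact=date(2021, 3, 31))
relapsed = time_and_status(start, last_contact=date(2021, 3, 31), relapse=date(2021, 3, 11))
died = time_and_status(start, last_contact=date(2021, 3, 21), death=date(2021, 3, 21))

print(censored, relapsed, died)
```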

The Kaplan-Meier estimator [2] is commonly used to estimate survival probabilities, and it allows the survival curve to be displayed clearly over time. The appropriate method to summarize endpoints with elements of competing risk is the cumulative incidence curve [3].

a. The survival curve using the Kaplan-Meier method

Survival curves show, for each point in time t, the probability that the event of interest has not occurred before t. Such curves start at 100% (there is a 100% chance that the event of interest has not occurred at time 0) and decrease over time. For a given survival curve, one can assess whether the 50% probability line is crossed. If it is not, the median survival has not been reached, and survival cannot be described in terms of a median survival time. An established time point then needs to be chosen that is recognised by the scientific community as being of clinical relevance (e.g. 2 or 3 years after the start of the study) (Fig. 2, Study 1). If the survival curve does cross the 50% probability line (Fig. 2, Study 2), then an estimate of the median survival time is available. The choice of descriptive statistic still has to be carefully considered, though it is generally appropriate to use a probability of survival at an established time point (Table 2).
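A plain-Python sketch of the Kaplan-Meier product-limit calculation is given here to make the two summaries concrete: the survival probability at a chosen time point, and the median survival time when the curve crosses 50%. The data and function names are hypothetical, times are taken to be in months, and in practice an established statistical package would be used instead.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= (n_at_risk - n_events) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before the first event)."""
    probability = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        probability = s
    return probability


def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the median is not reached."""
    for event_time, s in curve:
        if s <= 0.5:
            return event_time
    return None


# Illustrative follow-up times (months) for two hypothetical studies of 10 patients.
times = [6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
curve_1 = kaplan_meier(times, [1, 0, 0, 1, 0, 0, 0, 0, 0, 0])  # few events
curve_2 = kaplan_meier(times, [1, 0, 1, 1, 0, 1, 0, 1, 0, 0])  # more events

print("Study 1 median:", median_survival(curve_1), "S(36):", round(survival_at(curve_1, 36), 3))
print("Study 2 median:", median_survival(curve_2), "S(36):", round(survival_at(curve_2, 36), 3))
```

In the first hypothetical study the curve never reaches 50%, so only the probability at a fixed time point can be reported; in the second, both summaries are available.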

As a general rule of thumb, continuing a survival curve with fewer than 5 patients at risk is not advisable, as any subsequent event will reduce the survival probability by at least 20% of its current value, a step which is unlikely to be a true reflection of the actual probability estimate. If there were only 1 evaluable patient left, and that patient experienced the event of interest, the curve would drop to 0%, irrespective of whether the previous probability was 80% or 20%, so clearly such an estimate would not be of value.
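The arithmetic behind this rule of thumb can be written out from the Kaplan-Meier product step. The symbols are introduced here for illustration: n_j is the number of patients at risk just before event time t_j, and d_j is the number of events at t_j.

```latex
\hat{S}(t_j) = \hat{S}(t_{j-1})\left(1 - \frac{d_j}{n_j}\right)
\quad\Longrightarrow\quad
\frac{\hat{S}(t_{j-1}) - \hat{S}(t_j)}{\hat{S}(t_{j-1})} = \frac{d_j}{n_j}
```

With a single event (d_j = 1), the relative drop is 1/5 = 20% when 5 patients are at risk, 1/4 = 25% when 4 are at risk, and 100% when only 1 is at risk, whatever the previous value of the curve.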

Censored patients can be indicated on the survival curve with tick marks (Fig. 2), but these provide only limited information. The most useful additional information to provide alongside the survival curve is the number of patients at risk over the time period under observation (Fig. 3). The number of events occurring over the time period can also add valuable information.
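A table of numbers at risk and cumulative events at regular time points can be computed as sketched here. The data, the time grid and the helper name are hypothetical; the convention assumed is that a patient is at risk at time t if follow-up lasts at least until t.

```python
def at_risk_table(times, events, grid):
    """Return (t, number at risk at t, cumulative events up to t) for each grid time."""
    rows = []
    for t in grid:
        # At risk: follow-up has not ended before t.
        n_at_risk = sum(1 for x in times if x >= t)
        # Cumulative events: events observed at or before t.
        n_events = sum(1 for x, e in zip(times, events) if e == 1 and x <= t)
        rows.append((t, n_at_risk, n_events))
    return rows


# Illustrative follow-up times (months) and event indicators (1 = event, 0 = censored).
times = [6, 12, 18, 24, 30, 36, 42, 48, 54, 60]
events = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
rows = at_risk_table(times, events, grid=[0, 12, 24, 36, 48, 60])

for t, n_at_risk, n_events in rows:
    print(f"{t:>3} months: at risk {n_at_risk}, events {n_events}")
```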

The Y-axis should be labelled as a probability (of survival, of event-free survival, or of whichever event is being studied), and the X-axis should be labelled as time, with its units, from the start of treatment, randomisation or study start (Fig. 2).

b. Cumulative incidence curves

Cumulative incidence curves show the opposite of survival curves: for each point in time t, they give the probability of having had the event of interest before t. These curves start at 0 and will not reach 100%, even with complete follow-up, if the competing event has occurred for some patients. To interpret cumulative incidence curves, the curves for the primary outcome and for the competing event should be inspected together in order to understand the underlying mechanisms. For example, for relapse and death after stem cell transplantation, when a category of patients has a small risk of relapse, one needs to examine whether this means that they have a good prognosis or that they died too early from complications to experience a relapse (shown by a high non-relapse mortality curve) (Fig. 4).
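The following plain-Python sketch illustrates how the cumulative incidence of relapse and of non-relapse mortality can be estimated side by side in a competing risk setting. It uses the standard non-parametric estimator, in which each increment is the probability of being event-free just before t multiplied by the cause-specific hazard at t. The data, the cause codes (0 = censored, 1 = relapse, 2 = non-relapse mortality) and the function name are hypothetical, and an established statistical package would normally be used.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Return [(t, CIF(t))] at each time an event of the given cause occurs.

    `causes` uses 0 for censored and positive integers for event types
    (a coding assumed for this sketch).
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any cause just before t
    incidence = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_interest = n_any = n_leaving = 0
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                n_any += 1
            if cause == cause_of_interest:
                n_interest += 1
            n_leaving += 1
            i += 1
        if n_interest > 0:
            incidence += event_free * n_interest / n_at_risk
            curve.append((t, incidence))
        # Any event, of either cause, removes patients from the event-free state.
        event_free *= (n_at_risk - n_any) / n_at_risk
        n_at_risk -= n_leaving
    return curve


# Illustrative follow-up (months) for 10 hypothetical transplant recipients.
times = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
causes = [2, 1, 2, 0, 1, 2, 0, 1, 0, 0]

relapse = cumulative_incidence(times, causes, cause_of_interest=1)
nrm = cumulative_incidence(times, causes, cause_of_interest=2)

print("Relapse at end of follow-up:", round(relapse[-1][1], 3))
print("Non-relapse mortality at end of follow-up:", round(nrm[-1][1], 3))
```

Reporting both curves together shows whether a low incidence of relapse in a group is accompanied by a high incidence of the competing event.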
