To evaluate the literature on HDI, we first divided clinical descriptions into two general subtypes:
Case reports, which describe interactions observed in a specific patient.
Clinical studies, in which the interaction is investigated in a panel of participants under standardized conditions.
While other types of informative articles exist, such as in vitro or animal studies, assessing them requires considering many specific parameters to define a proper assessment scale. We decided to focus only on clinical data, as adapting the scale to in vitro and animal data would require more than a simple textual analysis of the article, which is beyond the scope of this work.
Severity score
The first and clearest piece of information we considered important for our scale was the severity of the observed event. To define this severity, we decided to use a separate scale based on the clinical consequences and the duration of the symptoms. This grading is based on the previous work of De Smet and on the Common Terminology Criteria for Adverse Events (CTCAE v5.0) [14]. While the latter provides a complete and precise description of ADRs that makes severity grading consistent, we judged it too time-consuming for our purpose. We therefore decided to use a general seven-grade system based on that of De Smet and to adapt it to HDI descriptions to establish a “severity score”. The first part of the score grades the gravity of the event and relates to the clinical manifestations observed in the described cases. It is divided into 7 grades (0 to 6) and ranges from absence of symptoms (grade 0) to death (grade 6). The severity can be assessed using the flowchart described in Fig. 2.
Fig. 2 Flowchart used to assess the clinical risk when an HDI is described in a case report or clinical study. This flowchart was tested in the rounds with the scorers and validated.
Imputability/generalization score
Having noted the limitations of the literature describing HDI, we also considered it appropriate to set up an “imputability/generalization score” based on key points that were more or less precisely described in the assessed publications. This score reflects the level of confidence in the link between the observed events and the herb-drug combination, given the information available in the article, and the extent to which that link can be generalized to further cases.
To investigate imputability and generalization in case reports and clinical studies, experts defined four key points that were judged necessary to assess an HDI:
Information about the herb, including the kind of preparation, specifications and/or composition.
The proper use of the herb: if consumption fails to follow recommendations by exceeding recommended doses or by exposing a specific population (e.g. a pediatric population), or if the use is recreational or specific to a geographic area, the interpretation is harder to generalize.
The mechanism of the HDI: the PK/PD/clinical parameters described, pharmacogenetics, and chronological and statistical considerations.
The description of the patient (for case reports) or cohort (for clinical studies), covering the clinical status of the subjects and, for clinical studies, the composition of the cohort (with the possible and classic biases observed).
These four points were investigated to identify the questions that needed to be answered when analyzing the interaction in order to score it. Using both the severity and imputability/generalization scores described above, we established a final score to quickly inform health practitioners.
Herb
As mentioned above, in order to assess imputability and generalization, we first analyzed the elements that describe the herbal product used. The criteria used to evaluate the description of herbal products in case reports or clinical studies are shown in Table 1 (A) and Table 2 (A). One criterion concerns the number of herbs taken simultaneously by the patient. We decided to use this information because interpreting an HDI becomes hazardous when multiple herbs are taken at the same time. If 3 or more herbs are taken at the same time, we consider the HDI not interpretable, in keeping with our aim of a general and user-friendly scale. This choice was made considering the literature on drug-drug interactions [15]. In that case, the complete herb section is considered not interpretable and its score is 0. Consequently, the scorer does not have access to the other questions in the section.
Table 1 Criteria for scoring imputability and generalization in case reports describing an HDI. The values for each of the lines were assigned after the first round. These values were discussed and adapted during the different rounds to reach a consensus. Answers with a * automatically deactivate the subsequent questions
Table 2 Criteria for scoring imputability and generalization in clinical studies describing an HDI. The values for each of the lines were assigned after the first round. These values were discussed and adapted during the different rounds to reach a consensus. Answers with a * automatically deactivate the subsequent questions
The next criterion considers the herb’s description. If a herb is referenced in an article by the name of a herbal food supplement that contains it (criterion n°2), we can be rather confident about the correctness of the description. Yet herbal food supplements are not as strictly controlled as herbal medicinal products, and variations in dosage may occur, leading to potential differences between batches. An article corresponding to this case is assigned a score of 2 for this section, which is 1 point below the maximum. If the study concerns a single molecule only (criterion n°3), the evidence is weaker, as the corresponding herb is a completely different matrix containing a complex combination of molecules, which may lead to different clinical effects. For this reason, we decided to assign a 0 in this case. The three following criteria concern herbs as such. The first of these ensures that the herb’s binomial Latin name is specified (criterion n°4). Even though the vernacular name could be sufficient in some cases, the binomial Latin name avoids confusion. For example, “ginseng” can refer, among others, to Panax ginseng C. A. Mey., Panax quinquefolius L. or Panax notoginseng (Burkill) F. H. Chen ex C. Y. Wu & K. M. Feng (Araliaceae), but also to unrelated species used as adaptogens.
Another crucial piece of information is the part of the herb used (criterion n°5), which qualitatively and/or quantitatively influences the chemical composition of the herb. The last criterion is the extraction method (criterion n°6), which also greatly influences the final composition of the herbal product, depending on the physico-chemical properties of the herbal components. Since these last three criteria are interdependent in providing a satisfactory description, we decided to base their score on this dependence. A satisfied criterion gives 1 point, while an unsatisfied one counts for −1. Thus, if all criteria are fulfilled, we get the highest possible score, as we have the best degree of description available through an article, at least without diving into a complex description of composition that would require specific knowledge to interpret. Yet, if at least one of these criteria is not fulfilled, the score drops drastically. Notably, refined specifications were not considered, given the profiles of the potential users of this scale. Other quality issues that can hardly be assessed (e.g. fraud) were not considered either. In any case, clinical studies rarely provide specifications or control steps.
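The herb-section rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scale's actual implementation: the class and field names are hypothetical, the order in which the rules are checked is an assumption (the text does not say which rule wins when several apply), and the authoritative values remain those of Table 1 (A) and Table 2 (A).

```python
from dataclasses import dataclass


@dataclass
class HerbDescription:
    """Hypothetical container for the answers to the herb-section questions."""
    n_herbs: int                           # herbs taken simultaneously
    named_as_food_supplement: bool = False  # criterion n°2
    single_molecule_only: bool = False      # criterion n°3
    latin_binomial_given: bool = False      # criterion n°4
    plant_part_given: bool = False          # criterion n°5
    extraction_method_given: bool = False   # criterion n°6


def herb_section_score(d: HerbDescription) -> int:
    """Score the herb section following the rules stated in the text."""
    # 3 or more herbs: the section is not interpretable and scores 0.
    if d.n_herbs >= 3:
        return 0
    # Study of a single molecule rather than the herb: 0.
    # (Checking this before the supplement rule is an assumption.)
    if d.single_molecule_only:
        return 0
    # Herb referenced by the name of a food supplement: 2 (one below the maximum).
    if d.named_as_food_supplement:
        return 2
    # Otherwise: +1 per satisfied criterion, -1 per unsatisfied one.
    flags = (d.latin_binomial_given, d.plant_part_given, d.extraction_method_given)
    return sum(1 if ok else -1 for ok in flags)
```

With these assumptions, a fully described herb scores 3, while a single missing criterion among the last three already brings the score down to 1, which illustrates the sharp drop mentioned in the text.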
Interaction mechanism
The “Mechanism” section considers pharmacological clues about the interaction. It aims to assess how far the event can be expected to be linked to the concomitant use of the herb and the drug, given their known PK and PD behaviors. The first three criteria concern the nature of the event. The article can describe the absence of clinical effect when the herb and the drug are given concomitantly (Table 1 (B) and Table 2 (B) criterion n°1), a PD event (Table 1 (B) and Table 2 (B) criterion n°2) or a PK event (Table 1 (B) and Table 2 (B) criterion n°3). If no event occurred, as considered in criterion n°1, we can be reasonably confident about the interpretation of the study, as there are no pharmacological characteristics of an interaction to investigate. In this case, we decided to give the section a good score (4 points). In other cases, we decided to give 5 points for PD events, as they are usually easier to predict, and 3 points for PK events, which are much harder to interpret owing to the number of influencing factors. The next criterion asks whether the interaction involves a single or multiple enzymes or transporters (Table 1 (B) criterion n°4 and Table 2 (B) criteria n°4 and n°5). We consider it important to identify the enzymes or transporters involved in the interaction, as this is a crucial step in assessing the link between the products and the described event. If multiple pathways are known, it becomes much harder to determine the mechanism of interaction. Pharmacogenetics may also influence the interaction. In particular, the involvement of CYP2C9, CYP2C19 and CYP2D6 can lead to large interindividual variations and must therefore be considered (Table 1 (B) criterion n°5 and Table 2 (B) criterion n°6), because it makes potential clinical events harder to generalize [16].
The chronology of the event is also a key factor in assessing the potential interaction. An event occurring directly after a single intake of the herb is a stronger clue than one occurring after some days or weeks, or after multiple intakes (Table 1 (B) criterion n°6 and Table 2 (B) criterion n°7). Yet we cannot completely penalize prolonged periods of intake, as some events are known to occur only after some days or weeks (for example, enzymatic induction is known to take some weeks to develop) [17]. The presence of a dechallenge or rechallenge, i.e. an improvement when a member of the interaction is withdrawn and/or a reappearance when it is reintroduced (Table 1 (B) criterion n°7), is also a strong clue that the drug/herb considered is involved in the event, though this information is rarely available. Finally, for clinical studies, the appropriateness of the statistical analysis, the relevance of the tests used and statistical significance must be considered (Table 2 (B) criterion n°8).
Patient or cohort key information
An important section concerns the characteristics of the patients, defined in two separate tables: one for a single patient in a case report (or case series) (Table 1 (C)), and one for the cohorts of clinical studies (Table 2 (C)). The first characteristic to be considered is the age of the patients (Table 1 (C) criterion n°1 and Table 2 (C) criterion n°4). Given that children and elderly patients might respond differently from adults to the same herb/drug combination, this information should be considered when trying to infer the risk in the general population. Closely related are the health status and specific conditions of patients (Table 1 (C) criteria n°1 and n°2 and Table 2 (C) criteria n°5 and n°6). Two cases are to be distinguished here: the treatment involved can be related to the condition (e.g. an interaction involving immunosuppressants in a transplanted patient) or not (e.g. an interaction involving an anti-platelet drug in a transplanted patient). In the first case, the interaction is worth noting, as we can expect other patients with the same condition to suffer from the same events, which should increase our vigilance about this interaction. In the second case, a potential event might be partially due to the specific condition of the patient and be less informative for the general population. Even though such a case is still informative in specific situations that should not be overlooked, we considered it less relevant to our goal of defining a generalized scale. In case reports, we also wanted to ensure that the patient’s treatment was well balanced before introduction of the herb (Table 1 (C) criterion n°3). If this is not the case, determining whether the event is linked to the treatment or to an interaction is more hazardous. In the case of clinical studies, a crucial factor is the cohort composition. 
This includes the size of the cohorts (Table 2 (C) criteria n°1 and n°2), ethnicity (Table 2 (C) criterion n°3), sex and age (Table 2 (C) criterion n°4) and health status (Table 2 (C) criteria n°5 and n°6).
Proper use
The last criterion we decided to include to establish event causality is the proper use of the herb. We scanned the literature about those events, especially case reports, and found that herbal products are often consumed in unpredictable ways. For example, there are cases of patients suffering from adverse events while consuming two liters of St John’s wort (Hypericum perforatum L.) tea a day [18], or of death potentially related to recreational use of kratom (Mitragyna speciosa Korth.) [19]. To distinguish cases in which herbal products are used as recommended by manufacturers or guidelines from those in which they are not, we defined general criteria for different herbal preparations. When the use does not respect recommendations, the score is reduced by 2 points. If usage is not described, it is reduced by 1 point, reflecting the lack of information without scoring the use as abusive. In other cases, one point is added to the score.
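As a minimal sketch, the proper-use adjustment can be written as a lookup applied to a running score. The enum, the dictionary and the function name are hypothetical, and the starting value of the score being adjusted is left to the caller, since the text does not specify it.

```python
from enum import Enum


class HerbUse(Enum):
    """Hypothetical labels for the three situations described in the text."""
    AS_RECOMMENDED = "as_recommended"
    NOT_RECOMMENDED = "not_recommended"
    NOT_DESCRIBED = "not_described"


# Point adjustments as stated in the text.
PROPER_USE_POINTS = {
    HerbUse.AS_RECOMMENDED: 1,    # use follows recommendations
    HerbUse.NOT_RECOMMENDED: -2,  # use does not respect recommendations
    HerbUse.NOT_DESCRIBED: -1,    # usage is not described in the article
}


def apply_proper_use(score: int, use: HerbUse) -> int:
    """Return the score after applying the proper-use adjustment."""
    return score + PROPER_USE_POINTS[use]
```

Keeping the adjustments in a single table makes the asymmetry explicit: undocumented use is penalized, but less heavily than use that clearly departs from recommendations.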
Assessing warning level of an event
Given the scores defined above, we can now assign a warning level to the event. This warning level should consider the reliability of the case described as well as the severity of the event and the extent to which it can be extrapolated to the general population. To take both severity and reliability into account, we decided to use a matrix to generate the final warning level. This matrix is composed of the severity of the event (Fig. 1) on the first dimension and of the reliability level on the second dimension (Tables 1 and 2). The reliability level is composed of the scores of all sections (Table 1 (A), (B), (C) and (D), Table 2 (A), (B) and (C)) synthesized into a single score (Table 3). The final matrix is shown in Table 4.
Table 3 Thresholds for the synthesized section scores. These scores are obtained by adding the points obtained for each of the lines, and 3 brackets have been designed to define whether we consider the descriptions to be good, just correct (average) or poor
Table 4 Final risk assessment score matrix
Table 4 summarizes the results obtained by combining the severity scores and the imputability/generalization scores.
The final reliability level is defined by synthesizing the scores of all sections. The synthesized score of a section ranges from 1 (bad) to 3 (good) and represents the quality of description of the given item in the article. Each of these scores is based on the thresholds shown in Table 3. These thresholds are arbitrary and based on the minimum and maximum scores obtainable in each section. The final reliability index is obtained by averaging the synthesized scores of all sections and rounding the result down. By using a simple arithmetic mean, we give the same importance to all sections and thus penalize articles in which any section is poorly described. This choice was motivated by our belief that any event in which one of these sections lacks information greatly reduces confidence in its conclusion, whatever the section concerned.
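The averaging step can be illustrated with a short sketch. It assumes that each section has already been mapped to its synthesized score (1, 2 or 3) using the thresholds of Table 3, which are not reproduced here; the function name is hypothetical.

```python
def reliability_index(section_levels: list[int]) -> int:
    """Average the synthesized section scores (1 to 3) and round down."""
    if not section_levels:
        raise ValueError("at least one section score is required")
    if any(level not in (1, 2, 3) for level in section_levels):
        raise ValueError("synthesized section scores must be 1, 2 or 3")
    # Integer floor division gives the arithmetic mean rounded down.
    return sum(section_levels) // len(section_levels)
```

For instance, three well-described sections and one average section (3, 3, 3, 2) have a mean of 2.75, which rounds down to 2. Because of the rounding, a single weaker section is enough to keep an article out of the top reliability level.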
These matrices can be accessed directly from the website. To simplify the scorer’s interpretation, a color code is used to help the user to make a quick decision.