A Review of the Scoring and Assessment of Keratosis Pilaris

Disease severity assessment tools play a large part in evaluating skin conditions in dermatology. Currently, there is no validated assessment tool for keratosis pilaris (KP), a benign yet highly prevalent follicular disorder. A range of proposed scoring tools has been used in different clinical trials for the assessment of potential treatments for KP. A literature review of the current scoring systems used for KP shows a lack of consistency, with most studies using varying versions of unvalidated investigator global assessment (IGA) scores and quartile grading systems. A review of these studies shows that current methods of evaluating KP in clinical trials are subjective, unreliable, and inconsistent. A standardised and validated scoring system would be significant as it could be used in clinical trials to advance the current knowledge of KP.

© 2023 S. Karger AG, Basel

Introduction

Keratosis pilaris (KP) is a common, benign follicular disorder characterised by small keratotic papules in a folliculocentric distribution with varying degrees of surrounding erythema (shown in Fig. 1). The lesions are typically located on the extensor surfaces of the upper arms, thighs, and buttocks and may also be present on the cheeks and trunk [1]. It is the most common follicular disorder in children and affects approximately 40% of adults. Most patients with KP are asymptomatic, and some may be unaware of their condition. However, many patients do seek treatment as they find the condition to be cosmetically disfiguring, which can lead to psychological distress and affect quality of life [2]. Although the focus of this review is on general KP, there are many subtypes of KP, which will be briefly covered as well.

Fig. 1.

Appearance of KP on the extensor upper arm. Papules, erythema, and follicular plugging with keratin are seen.

Various treatment modalities have been trialled for KP, from emollients and exfoliants to laser treatments, which have become increasingly common. Despite the range available, there are still limited data on the response and effectiveness of these treatments [3]. An appropriate disease assessment tool would be essential for monitoring the progress of these therapies. Importantly, a standardised score would provide a more objective measure of disease severity, making it easier to interpret and compare treatment outcomes.

Therefore, this review will explore the literature on the current state of assessing and evaluating KP, discuss the significance of a standardised score to measure disease, and demonstrate the need for a standardised and validated assessment tool for KP. This is the first literature review on this topic; it is therefore significant in summarising the current information and identifying unknown areas for future research. Through this review, future research can be targeted to directly address these issues, which may impact clinical activity by changing the future of KP treatment and providing a deeper understanding of the disease processes.

Epidemiology

KP is found in 50% of children and 40% of adults. As KP is an underreported condition, the actual prevalence may be much higher. It has no known gender or racial predilection and is reported to begin in childhood [4]. In a survey of 49 British patients aged 2–40 years, incidence was highest in the first decade of life and decreased with age [5]. Notably, this 1994 survey was only conducted in the UK, and further recent studies would need to validate this. Conversely, many report that KP may manifest in an individual at any age and that cases can persist throughout adulthood [2]. Some patients in the study also reported an improvement in the summer and worsening KP in the winter when the conditions are drier [5].

KP can occur more frequently in patients with dry and scaly skin. It can be associated with atopy, obesity, and conditions like ichthyosis vulgaris [2]. Atopic dermatitis (AD) is closely linked with KP due to overlaps in certain genetic mutations, with the barrier abnormalities caused by KP possibly enabling the development of AD. However, the genes are currently unknown, and despite being a very common disorder, there is less literature on KP in comparison to other dermatological conditions such as acne vulgaris or AD. From a research and epidemiological perspective, there is limited documentation on the incidence, patterns, exacerbations, and seasonal trends of KP [6].

Aetiology and Pathophysiology

KP is classified as a papulosquamous disease, a heterogeneous group of disorders characterised by scaly papules and plaques, with an unknown aetiology [7]. KP is thought to be inherited in an autosomal dominant pattern with variable penetrance [8]. The pathophysiology of KP is not completely understood, and there are numerous hypotheses regarding it. The most widely accepted theory suggests that the keratotic infundibular plug found in KP is the result of abnormal keratinisation of follicular epithelium [9, 10]. Thomas et al. [2] propose that KP is not a primary disorder of keratinocytes but rather a hair shaft disorder, in which coiled hair shafts rupture the follicular epithelium, causing defective follicular keratinisation and inflammation. This hypothesis originated from the observation that hair shafts extracted with a needle from KP patients retained their coiled shape even after being removed from the follicle. However, this study had a small sample size of 25 patients, and further research would be required to validate it. Additionally, Gruber et al. [11] postulated that both abnormal keratinisation and hair shaft abnormalities can be explained by the absence of sebaceous glands in an early step of KP pathogenesis.

Clinical Presentation and Diagnosis

KP is a common follicular skin disorder that presents as spiny, keratotic papules approximately 1 mm in size that often contain fine-coiled, brittle hair. Papules may be grouped or scattered and may involve subtle perifollicular erythema (shown in Fig. 2). They may cause post-inflammatory damage such as hyperpigmentation and pitted scars. In cases where surrounding erythema is marked, the term keratosis pilaris rubra (KPR) may be used. In other cases where the papules are grayish-white, without erythema, the term keratosis pilaris alba is used. Another less common subtype, keratosis pilaris atrophicans (KPA), describes a condition succeeded by atrophy as the pinhead follicular plugs of KP are shed [12]. Pitted scars and alopecia, particularly in the eyebrows, may be observed in patients with KPA [13, 14]. KP is characteristically found on the extensor aspect of the upper arms, thighs, and buttocks, although it may also affect the face and trunk. The stippled appearance is colloquially termed “gooseflesh” or “chicken skin” [8].

Fig. 2.

KP papules with perifollicular erythema on the upper thigh.

Diagnosis involves a clinical history and physical exam, although dermoscopy can be used to aid the diagnosis and monitor response to treatments. Dermoscopic findings include hyperpigmentation, perifollicular erythema, scaling, dermal vascular ectasia, widened follicular orifices, coiled or twisted vellus hairs, and keratin plugs [9]. KP is usually asymptomatic but may be pruritic and cosmetically displeasing. This may prompt patients to seek treatment for the appearance of the lesions rather than symptoms from the lesions themselves [10].

Impact of KP

Though patients’ reactions and tolerance to the rough texture and cosmetic appearance may vary considerably, KP can have a significant social and psychological impact. The psychosocial impact of KP is complex and can be associated with developmental issues of body image, socialisation, and sexuality, particularly in the younger adolescent population. There is little documentation and literature covering the extent of these issues, and hence the impact of KP may be underestimated. Whilst some studies include a Dermatology Life Quality Index (DLQI) questionnaire, the outcomes are not reported [15]. Kootiratrakarn et al. [16] conducted a prospective, randomised clinical trial in Thailand and found that more than 40% of those with KP reported a significant effect on self-image and quality of life. This is congruent with an improvement in anxiety, depression, and body satisfaction following effective treatments. Notably, it was a single-centre study, which limits its external validity.

Keratosis pilaris rubra is a common but rarely reported variant, presenting with marked erythema in addition to KP. Similarly, it can be socially disturbing, and patients tend to trial numerous therapies with limited effectiveness in reducing erythema. Schoch et al. [17] presented a case study of a 14-year-old boy with persistent KPR of his bilateral cheeks. The boy was self-conscious of the erythematous appearance and experienced stinging discomfort. He purposely missed out on physical education classes at school due to his embarrassment and discomfort associated with his condition. This example provides more information about the extent to which KP and its subtypes can impact patients; however, more research is required to further investigate these issues.

Evaluating KP

Despite the significant impact KP can have on patients, there is a lack of research into the treatment of KP [8]. In clinical trials, a range of proposed scoring tools has been used to assess disease severity [3]. There has been no other recorded instance where a clinical assessment tool has been used to assess KP outside of treatment. A literature search was conducted on medical databases including MEDLINE, PubMed, Embase, and Scopus. This was performed with the keywords “keratosis pilaris” AND “scoring OR scoring system OR grading OR severity OR validation OR scale OR index OR outcome assessment OR IGA.” All final included papers were selected through a search strategy as shown in Fig. 3. A summary of the relevant literature is shown in Table 1.

Fig. 3.

Search strategies for articles relevant to the evaluation of KP.

Table 1.

Summary of papers related to current evaluations of KP and their limitations

| Authors; year; country | Type of study | Journal | No. of patients | Female:male | FP skin type/ethnicity | Mean age or range, years | Scoring system used | Limitations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Clark et al., 2000 [12]; UK | Prospective study | Journal of Cutaneous Laser Therapy | 12 | 6:6 | N/A; Caucasian | 11 | Skin erythema estimated with erythema metre (mean reading). Skin roughness analysed with micrometre evaluation of a skin surface biopsy. A smooth paste of adhesive was applied to the mapped area. A microscope slide was placed over the paste and left to set. The slide was removed, resulting in a skin surface imprint. These were analysed by computer micrometre to calculate the roughness profile which estimated the average skin roughness and the average roughness depth. | Small sample size; Used specific equipment that would be hard to replicate |
| Breithaupt et al., 2011 [20]; USA | Double-blind, bilateral paired comparison study | Paediatric Dermatology | 27 | N/A | N/A | 2–16 | Assessed using an IGA score at weeks 4 and 6. Evaluated hyperkeratosis, erythema, follicular prominence, papules, and pustules, with 1 = mild, 2 = moderate, 3 = severe. The IGA score was the sum of these categories. QOL questionnaire was administered at the end of the study regarding symptoms of KP (and treatment efficacy). | Assessors not trained |
| Park et al., 2011 [23]; Korea | Pilot study | Annals of Dermatology | 12 | 6:6 | N/A; Asian | 26.3 | Photograph taken before each treatment session and 2 weeks after the last treatment. Clinical improvement was measured 1 month after last treatment using photographical data. Two independent, experienced dermatologists evaluated using a quartile grading scale/GIS 1–4 on skin texture (including decrease in papules) and dyspigmentation (erythema and hyperpigmentation) | Small number of patients; 2D photography; Assessors not trained |
| Lee et al., 2013 [24]; Korea | Retrospective study | Journal of Cosmetic and Laser Therapy | 26 | 21:5 | IV; Asian | 28.3 | Photographs assessed by two blinded dermatologists by comparing baseline and photos from 3 months after treatment in nonchronological order. GIS grade 1–4 was used. | Assessors not trained; High mean age, likely children were not included in this study |
| Ciliberto et al., 2013 [22]; USA | Pilot study | Journal of Drugs in Dermatology | 10 | 9:1 | I-III | 29 | Scale of 1–3 (mild, moderate, severe), measuring redness and skin texture roughness. Measured a month later. | Assessors not trained; No blinding; Small sample size |
| Saelim et al., 2013 [28]; Thailand | RCT | Journal of Dermatological Treatment | 18 | 8:10 | III-IV; Asian | 15–42 | Digital photographs taken at baseline and 4 weeks after last treatment. Blinded assessments of photographs by three unbiased dermatologists. GIS (−4 to 4) used to rate changes in global appearance, erythema, number of keratotic papules. | Assessors not trained |
| Park et al., 2014 [25]; Korea | Pilot study | Journal of Dermatological Treatment | 16 | 12:4 | N/A; Asian | 21.7 | Mexameter used to measure erythema index and melanin index. Standardised digital photographs were taken using identical camera settings and lighting conditions. Evaluating investigator used a GIS on skin texture (including decrease in papules) and dyspigmentation (erythema and hyperpigmentation) | Does not account for worsening responses; 2D photography; Assessors not trained |
| Eyler et al., 2015 [15]; USA | Pilot study | Cellular Immunology and Serum Biology | 12 | 8:4 | N/A | N/A | Keratosis Pilaris Severity Index (KPSI), a tool used to measure KP severity and degree of body surface area involvement. DLQI completed at each visit. Outcomes were not discussed. | No information on KPSI; Does not mention training or validation |
| Ibrahim et al., 2015 [32]; USA | RCT | JAMA Dermatology | 18 | 15:3 | I-III | 18–65 | Rated redness and roughness/bumpiness on a scale of 0–3 (least to most severe) for a maximum score of 6 per patient per arm. The score was not validated but scorers were trained prior by rating archival skin images. They then reconciled these ratings through face-to-face forced agreement. This process was repeated until concordance was achieved between ratings and their separately rated scores were consistently equivalent. | Inter-rater reliability; Process of forced agreement is not outlined |
| Kootiratrakarn et al., 2015 [16]; Thailand | Prospective, RCT | Dermatology Research and Practice | 50 | N/A | N/A; Asian | N/A | Three independent dermatologists evaluated percentage of improvement. Categories graded include change in signs of papulae and post-inflammatory pigment alteration. | Assessors not trained |
| Vachiramon et al., 2016 [29]; Thailand | RCT/Prospective, randomised, single-blinded intraindividual comparative | Biomed Research International | 20 | 8:12 | III-IV; Asian | 27 | Standard digital photographs taken at baseline and after last treatment. Two dermatologists who did not perform the laser procedure evaluated the response through digital images. GIS: keratotic papules, hyperpigmentation, and erythema were evaluated from −4 to 4 (minimal, moderate, good, excellent) | Small sample size; 2D photography; Asian patients with FP III-V; Assessors not trained |
| Sobhi et al., 2020 [27]; Egypt | Prospective, intraindividual comparative study | Lasers in Medical Science | 18 | 18:0 | III-IV; Arabic | 16–43 | Photographed before each session and 1 month after the last session. Subjective assessment was done by two unblinded and two blinded investigators. They scored with GIS from 1 to 4 for overall improvement of follicular prominence/keratotic papules and pigmentation. | 2D photography; Limited Fitzpatrick skin types; Small sample size; Assessors not trained; Female patients only |
| Maitriwong et al., 2020 [31]; Thailand | RCT, double blind | Lasers in Surgery and Medicine | 23 | 11:13 | III-IV; Asian | 21–23 | Subjective evaluation with GIS (−4 to 4) to rate changes in skin roughness, erythema, hyperpigmentation, and global appearance in week 16. | Small sample size; Limited Fitzpatrick skin types |
| Ismail et al., 2020 [30]; Egypt | Prospective study | Journal of Cosmetic Dermatology | 60 | 60:0 | III-IV; Arabic | 19.7 | IGA 0–3 score: absent, mild, moderate, severe. GIS assessed keratotic papules, hyperpigmentation, and erythema (−4 to 4). Prior to study, raters were trained on the use of the scale by rating archival skin images. The two independent scorers rated images separately and then reconciled their ratings through face-to-face forced agreements. | Only female patients; Limited Fitzpatrick skin types |
| Siadat et al., 2020 [26]; Iran | RCT, non-blinded | Iranian Journal of Dermatology | 10 | 10:0 | III-IV | N/A | Standard digital photographs were taken at baseline, 4 weeks and 8 weeks after the last treatment. Two dermatologists who did not perform the laser procedure evaluated the response through images. GIS on keratotic papules, roughness, hyperpigmentation, and erythema were evaluated. Grade 1–4 for improvement. | Assessors not trained; Only female patients |
| Tatavarthi et al., 2022 [21]; India | Prospective hospital-based interventional study | Journal of Medical Sciences | 60 | 34:26 | N/A | 18.9 | IGA scale at baseline and end of treatment assessing follicular prominence, erythema and scaling. IGA score 1 = mild, 2 = moderate, 3 = severe. | Assessors not trained |

Clark et al. [12] from the UK conducted a prospective study to evaluate the response of KPA to pulsed tunable dye laser in 12 patients. They evaluated skin erythema and roughness objectively using tools. An erythema metre was used on a specific area three times, with the mean taken. For skin roughness, a microscope slide was applied over a smooth paste on the skin to create a skin surface imprint. This was then analysed by a computer micrometre to calculate the roughness profile, estimating average skin roughness and depth. Although the study was able to objectively evaluate KPA, it requires specific technology, therefore limiting its reproducibility.
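The objective measurements described above can be illustrated with a short sketch. The formulas used by Clark et al. are not given in this review, so the code assumes the conventional surface-metrology definitions: average roughness (Ra) as the mean absolute deviation of the profile from its mean line, and average roughness depth (Rz) as the mean peak-to-valley height over five equal segments of the profile. The function names, the five-segment split, and all numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def average_roughness(profile: Sequence[float]) -> float:
    """Ra (assumed definition): mean absolute deviation from the mean line."""
    centre = mean(profile)
    return mean([abs(height - centre) for height in profile])


def average_roughness_depth(profile: Sequence[float], segments: int = 5) -> float:
    """Rz (assumed definition): mean peak-to-valley height over equal segments."""
    size = len(profile) // segments
    if size < 2:
        raise ValueError("profile too short for the requested number of segments")
    depths = []
    for i in range(segments):
        part = profile[i * size:(i + 1) * size]
        depths.append(max(part) - min(part))
    return mean(depths)


# Hypothetical imprint profile heights (arbitrary units).
profile = [0, 2, 0, 2, 0, 2, 0, 2, 0, 2]
ra = average_roughness(profile)
rz = average_roughness_depth(profile)

# Hypothetical erythema readings: three readings of one area, mean taken.
erythema_readings = [12.0, 13.0, 14.0]
mean_erythema = mean(erythema_readings)
```

Even under these assumptions, the calculation itself is simple; the barrier to reproducing the method lies in obtaining the imprint and the profile data, which is the equipment limitation noted above.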

Investigator global assessment (IGA) scores are commonly used in clinical trials to evaluate severity of diseases and assess the effectiveness of treatments [18]. The Food and Drug Administration (FDA) recommends using a 5-point IGA in many dermatology clinical trials. Notably, the FDA prefers non-validated IGAs over validated disease-specific tools for skin diseases [19]. Currently, there are no existing, validated scores for KP. Thus, studies have been using their own 3-point and 4-point variations, as discussed below, which makes it difficult to compare the effectiveness of different treatments.

Breithaupt et al. [20] from the USA and Tatavarthi et al. [21] from India evaluated KP treatment sites using the same 3-point IGA score at baseline and after treatment. Hyperkeratosis, erythema, follicular prominence, papules, and pustules were graded with 1 = mild, 2 = moderate, and 3 = severe. The composite scores were an average of these categories. Notably, the score was not validated, and assessors were not trained prior to use. Thus, the reliability and accuracy of the IGA score used remain uncertain. Likewise, Ciliberto et al. [22] in the USA designed a pilot study for the use of photopneumatic therapy, a treatment that combines light-based therapy with a pneumatic component to treat follicular plugging and inflammation, and used the same 1–3 scale to score redness and skin texture roughness of KP at baseline and 1 month after treatment. This study has similar limitations, as well as a small sample size (n = 10) and a lack of blinding, which makes recall bias possible.
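A composite of this kind can be sketched in a few lines. The five categories and the 1–3 grades come from the description above; the function name and the example grades are hypothetical. Because the summary table describes the composite as a sum while the text describes an average, the sketch returns both rather than assuming one.

```python
from statistics import mean
from typing import Dict, Tuple

CATEGORIES = (
    "hyperkeratosis",
    "erythema",
    "follicular prominence",
    "papules",
    "pustules",
)
GRADES = {1: "mild", 2: "moderate", 3: "severe"}


def composite_iga(grades: Dict[str, int]) -> Tuple[int, float]:
    """Return (sum, mean) of the per-category grades for one treatment site."""
    missing = [c for c in CATEGORIES if c not in grades]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    values = [grades[c] for c in CATEGORIES]
    for value in values:
        if value not in GRADES:
            raise ValueError(f"grade must be 1, 2, or 3, got {value}")
    return sum(values), mean(values)


# Hypothetical grades for a single site.
example = {
    "hyperkeratosis": 2,
    "erythema": 3,
    "follicular prominence": 2,
    "papules": 1,
    "pustules": 1,
}
total, average = composite_iga(example)
```

The two composites rank sites identically when every category is graded, but their ranges differ (5–15 for the sum, 1–3 for the mean), which is one reason unstandardised IGA variants are hard to compare across studies.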

A pilot study by Eyler et al. [15] in the USA used a Keratosis Pilaris Severity Index (KPSI), a tool to measure KP severity and degree of body surface area involvement, in 12 patients with a baseline classification of “moderate severity.” This was taken at each treatment visit to assess overall improvement. However, there is no further information available on the components of the KPSI tool, the scores, and the severity classifications. Attempts to contact the authors were unsuccessful. Thus, given the insufficient information, it can only be concluded that this is a non-validated score used only by Eyler et al. [15].

Another common way of assessing the severity of KP is through a global improvement scale (GIS). Park et al. [23] in 2011 and Lee et al. [24] in 2013 both studied the effectiveness of different laser treatments in Korean patients, and in another 2014 Korean study, Park et al. [25] trialled combination peels for KP. The methodology used for scoring was identical and replicated in more recent studies in Iran [26] and Egypt [27]. Two independent dermatologists evaluated clinical improvement based on the digital photographs taken at baseline and 1 month after the last treatment. This was evaluated using a quartile grading scale where grade 1 = <25%, grade 2 = 25–50%, grade 3 = 51–75%, and grade 4 = >75% improvement on two categories: skin texture and dyspigmentation. Additionally, Park et al. [25] used a mexameter to objectively measure erythema and melanin index. GIS clinical assessments are limited as they only measure improvement and do not account for any worsening response to treatment. The 2D photography makes it difficult to properly evaluate skin texture, and the assessors were not trained on grading. Moreover, the sample size for these studies was small, they only represented one ethnicity, and three studies only included female patients [23, 26, 27].
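The quartile grading described above can be written as a small mapping. This is a minimal sketch using the grade boundaries exactly as quoted; the function name is hypothetical, and the handling of fractional values that fall between the stated bands (for example 50.5%) is an assumption.

```python
def quartile_grade(percent_improvement: float) -> int:
    """Map a percentage improvement (0-100) to a GIS grade of 1-4."""
    if not 0 <= percent_improvement <= 100:
        # The 1-4 scale has no grade for worsening.
        raise ValueError("the 1-4 scale only covers 0-100% improvement")
    if percent_improvement < 25:
        return 1
    if percent_improvement <= 50:  # assumption: values up to 50 stay in grade 2
        return 2
    if percent_improvement <= 75:
        return 3
    return 4


grades = [quartile_grade(p) for p in (10, 25, 50, 51, 75, 76)]
```

The `ValueError` for negative input mirrors the limitation noted above: a patient whose KP worsens cannot be recorded on this scale.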

A variation of the GIS is one ranging from −4 to 4, where in addition to the standard definitions outlined above, grade −4 = >75% worsening, grade −3 = 51–75% worsening, grade −2 = 26–50% worsening, grade −1 = 1–25% worsening, grade 0 = no change. This was employed by Saelim et al. [28], Kootiratrakarn et al. [16], Vachiramon et al. [29], Ismail et al. [30], and Maitriwong et al. [31] using digital photographs. This more extensive GIS accounts for worsening of disease severity and is typically used to score erythema, number of keratotic papules, and hyperpigmentation. Furthermore, the prospective study conducted by Ismail et al. [30] trained the two independent assessors. They rated archival images separately, then reconciled their ratings through face-to-face agreement until concordance was achieved before rating patient images. The scale used was able to measure an improvement seen in patients treated with fractional CO2 laser in lesions on the thighs. Although the GIS remains non-validated, this method helps to increase accuracy and reliability.
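The extended scale can be sketched in the same way. The band boundaries are those quoted in the text, including the slight asymmetry between the improvement bands (grade 2 starts at 25%) and the worsening bands (grade −2 starts at 26%); the function name and the convention that positive values mean improvement are assumptions.

```python
def signed_gis_grade(percent_change: float) -> int:
    """Map a percentage change to a GIS grade from -4 to 4.

    Positive values are improvement, negative values are worsening,
    and zero is no change (sign convention assumed for this sketch).
    """
    if percent_change == 0:
        return 0
    magnitude = abs(percent_change)
    if percent_change > 0:
        # Improvement bands as quoted: <25, 25-50, 51-75, >75.
        if magnitude < 25:
            return 1
        if magnitude <= 50:
            return 2
        if magnitude <= 75:
            return 3
        return 4
    # Worsening bands as quoted: 1-25, 26-50, 51-75, >75.
    if magnitude <= 25:
        return -1
    if magnitude <= 50:
        return -2
    if magnitude <= 75:
        return -3
    return -4


signed_grades = [signed_gis_grade(p) for p in (0, 10, 25, 60, 80, -25, -26, -60, -80)]
```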

In a randomised controlled trial testing the treatment of KP with 810-nm diode laser in the USA, Ibrahim et al. [32] rated redness and roughness/bumpiness of KP in a treated site and control site on the arms of 20 patients, assessed by two blinded dermatologists. The score suggests that the 810-nm diode laser was highly effective in treating the roughness and bumpiness of the skin but was not effective against erythema. Thus, it would be promising to use in the nonerythematous variants of KP such as KPA. They noted there was no relevant validated scale, and they used a 4-point IGA score (0 least severe – 3 most severe), also used by Ismail et al. [30], that had not been validated. However, both raters received training on using the scale by thorough rating of archival images and then discussing these ratings face-to-face until their separately rated scores were consistently equivalent. Limitations of this study include the patient inclusion criteria, which were restricted to participants of Fitzpatrick skin types I–III. Furthermore, during patient evaluations, forced agreement was used to reconcile blinded ratings between the two raters; however, the study does not outline how often this happened, and there is no further information on inter-rater reliability.
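To illustrate what reporting inter-rater reliability could look like, the sketch below computes Cohen's kappa for two raters scoring the same sites on the 0–3 scale before any forced agreement. Ibrahim et al. do not report such a statistic; Cohen's kappa is used here only as one common choice, and the function name and the ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical independent ratings of ten sites (0 = least, 3 = most severe).
rater_1 = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1]
rater_2 = [0, 1, 2, 3, 2, 2, 0, 3, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
```

Because the 0–3 scale is ordinal, a weighted kappa that penalises larger disagreements more heavily might be more appropriate; the unweighted form is shown only for brevity.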

Conclusion

Essentially, until other objective parameters are developed, scoring systems will remain essential in assessing the severity of dermatological conditions. This review identified a noticeable gap in the understanding and clinical assessment of KP. Furthermore, current methods of evaluating KP in clinical trials are unreliable and inconsistent. Different studies and clinical trials use different scoring systems to measure KP, which makes comparisons across treatment options difficult. Thus, a standardised and validated scoring system is needed, as it is required in clinical trials to advance the approval of effective, safe treatments for KP.

Statement of Ethics

All patient photos were obtained and used with consent. Written informed consent was obtained from participants prior to the study. This study protocol was reviewed and approved by the Bellberry Limited Ethics Committee, approval number 2022-04-376.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Funding Sources

This research did not receive any funding. M.A. Wang conducted the project as her Honours thesis with D.F. Murrell at UNSW.

Author Contributions

Professor Dedee Murrell conceived of the presented idea. Madeline Wang performed the literature search and manuscript preparation. Dr Anna Wilson was involved in the editing of the manuscript. Finally, all authors were involved with reviewing and approving the final manuscript.

