Rethinking Liver Fibrosis Staging in Patients with Hepatocellular Carcinoma: New Insights from a Large Two-Center Cohort Study

Introduction

Hepatocellular carcinoma (HCC) remains a common malignancy and a leading cause of cancer-related death worldwide, and is closely related to background chronic liver disease.1 Liver resection remains the mainstay of curative treatment, especially in China.2 The preoperative assessment of the severity of concomitant liver fibrosis burden in patients with HCC is urgently needed to formulate management strategies that include preoperative hepatectomy planning, choice of therapeutic modalities, and risk assessment for postoperative liver failure.3

Liver biopsy is regarded as the gold standard for the accurate pathological evaluation of hepatic fibrosis and inflammation in chronic liver disease. Unfortunately, major concerns regarding the reliability of liver biopsy include subjective interpretations of histopathologic findings and misclassification of fibrosis due to sampling errors,4 thus limiting its wide application in clinical practice. Liver biopsy has been reserved for a very small minority of patients with uncertain HCC diagnoses, and has been performed infrequently for preoperative liver fibrosis staging.5

Assays of serum indices have been studied for the non-invasive evaluation of liver fibrosis burden in chronic liver disease of various etiologies.6 Unfortunately, numerous serum indices and derived cutoff values have yielded highly variable results that have precluded their generalized clinical application. Multiple radiological modalities have also been studied to meet the clinical need; these include ultrasound (US) transient elastography, computed tomography (CT) and magnetic resonance (MR) imaging-based techniques, and molecular imaging probes.7 The most promising methods are liver surface nodularity (LSN) scoring using CT, liver stiffness measurement obtained by US elastography,8 and MR elastography.9,10 However, these are confounded by interinstitutional variations in imaging protocols and requirements for additional expertise and equipment. Consequently, sufficient accuracies have not yet been validated, and routine clinical use is still unrealistic.7 Data on liver fibrosis staging in HCC patients are lacking. Several small cohort studies have used liver stiffness measurement to stage liver fibrosis in specific HCC cohorts;11–14 however, conflicting results have precluded robust and generally acceptable conclusions.

Consequently, clinicians may resort to lessons derived from experiences in chronic liver disease, presuming that tests used to diagnose HCC may have utility to stage liver fibrosis. Given the ambiguity, limitations, and conflicting results of currently used serum indices and radiological modalities for the diagnosis of chronic liver disease, and uncertainties regarding the possible effects of tumor-specific factors on the interpretation of test results, there is a clear and urgent unmet clinical need to accurately assess background liver fibrosis burden in patients with HCC, to thereby inform HCC management decision-making.

In view of this research gap, we conducted a large, two-center retrospective cohort study to develop a new predictive model for differentiating background liver fibrosis severity in patients with HCC. We also investigated the performance of serum indices used frequently to evaluate chronic liver disease.

Materials and Methods

Study Population

A Chinese two-center cross-sectional study of patients undergoing hepatectomy indicated for HCC was conducted. Cases were identified from a two-institutional prospectively compiled HCC database. The institutional review boards of both medical centers approved the study (Hunan Provincial People’s Hospital [HPPH] approval no: 2021–045; Affiliated Hospital of North Sichuan Medical College [NSMC] approval no: 2021–015). The requirement for written informed consent was waived because of the retrospective study design. Patient privacy was ensured, and data were anonymized or maintained with confidentiality. All procedures were performed in accordance with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

A total of 900 consecutive patients undergoing initial hepatectomy for HCC at HPPH were enrolled between March 2015 and September 2020 as the Derivation Cohort. The External Validation Cohort comprised 344 individuals at NSMC between January 2019 and January 2021. Inclusion criteria were: 1) postoperative histopathologic confirmation of HCC; and 2) complete histopathologic data on liver fibrosis stage. The study was conducted in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement for reporting multivariable prediction model derivation and validation.15

Preoperative Evaluation

Demographic data, routine assessments including laboratory investigations, and postoperative pathological results were retrieved from a prospectively implemented institutional database and analyzed retrospectively. For multiple measurements, the data obtained nearest to the operation day were used. Preoperative CT or MRI images were obtained from Picture Archiving and Communication Systems databases. Preoperative HCC diagnosis was based on clinical and radiological imaging features or histopathology of liver biopsy specimens according to AASLD guidelines.16 Portal vein tumor thrombosis and hepatic vein tumor thrombosis were graded according to published criteria.17 Bile duct tumor thrombosis was classified as reported previously.18 Frequently used fibrosis-associated serum indices were calculated from preoperative biochemical and baseline demographic data by using published formulas19–26 presented in Supplementary Table 1.

Evaluation of Clinically Significant Portal Hypertension (PH)

Clinically significant PH was defined by the presence of esophagogastric varices on imaging and/or endoscopy or platelet count <100,000/mm3 and splenomegaly (largest diameter in transverse plane on CT >12 cm).27 PH severity was classified as we reported previously: none, mild (slight esophagogastric varices), moderate (obvious esophagogastric varices without “red wale” signs), and severe (obvious esophagogastric varices with “red wale” signs).28,29
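The definition above can be written out as a small decision rule. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and it assumes that the "or ... and" in the definition groups as varices OR (low platelet count AND splenomegaly). The source does not specify how slight varices with "red wale" signs would be graded, so the sketch treats them as mild.

```python
def clinically_significant_ph(varices_present: bool,
                              platelets_per_mm3: float,
                              spleen_diameter_cm: float) -> bool:
    """Return True if the stated criteria for clinically significant PH are met.

    Assumed grouping: varices OR (platelets < 100,000/mm3 AND spleen > 12 cm).
    """
    low_platelets_with_splenomegaly = (
        platelets_per_mm3 < 100_000 and spleen_diameter_cm > 12
    )
    return varices_present or low_platelets_with_splenomegaly


def ph_severity(varices: str, red_wale: bool) -> str:
    """Map varices findings to the four severity classes described in the text.

    `varices` is one of "none", "slight", "obvious" (hypothetical encoding).
    """
    if varices == "none":
        return "none"
    if varices == "slight":
        return "mild"  # assumption: red wale status is ignored for slight varices
    if varices == "obvious":
        return "severe" if red_wale else "moderate"
    raise ValueError(f"unknown varices category: {varices!r}")
```

For example, a patient without varices but with a platelet count of 90,000/mm3 and a 13 cm spleen would be flagged by `clinically_significant_ph`, whereas the same platelet count with an 11 cm spleen would not.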

Construction of Liver Fibrosis Severity Score (LFSS)

LFSS was based on the combination of preoperative radiologic findings and the intraoperative observation of liver surface regenerative nodules. A scoring system was derived by assigning numerical values to each factor. We scored preoperative radiologic morphometric evaluation on a scale from 0 to 2, with 0 corresponding to no visible contour abnormality, 1 to hepatic morphologic changes (liver segment hypertrophy/atrophy) and no confirmed liver regenerative nodules, and 2 to radiologically confirmed regenerative nodules or signs of PH. Liver surface regenerative nodules were scored on a scale from 0 to 2 based on intraoperative findings, with 0 corresponding to no apparent nodules, 1 to multiple regenerative micronodules (< 3 mm in diameter), and 2 to regenerative macronodules (regenerative nodules > 3 mm in diameter).30 The color of liver parenchyma was scored as: 0 (normal ruddy), and 1 (greyish-red). Total scores for LFSS were further classified as 0–1 (None); 2 (Mild); 3 (Moderate) and ≥4 (Severe) (Table 1). Representative pictures of LFSS construction are shown in Figure 1.
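The scoring rule described above can be summarized in a few lines of code. This is a minimal Python sketch under stated assumptions: the function names and the integer encoding of each component are hypothetical, and the component scores are assumed to have been assigned already by the radiologist and surgeon as described in the text.

```python
def lfss_total(radiologic: int, nodules: int, color: int) -> int:
    """Sum the three LFSS components.

    radiologic: 0 = no contour abnormality, 1 = morphologic change without
                confirmed regenerative nodules, 2 = confirmed nodules or PH signs
    nodules:    0 = none, 1 = micronodules, 2 = macronodules (intraoperative)
    color:      0 = normal ruddy, 1 = greyish-red
    """
    if radiologic not in (0, 1, 2):
        raise ValueError("radiologic score must be 0, 1, or 2")
    if nodules not in (0, 1, 2):
        raise ValueError("nodule score must be 0, 1, or 2")
    if color not in (0, 1):
        raise ValueError("color score must be 0 or 1")
    return radiologic + nodules + color


def lfss_category(total: int) -> str:
    """Classify the total score: 0-1 None, 2 Mild, 3 Moderate, >=4 Severe."""
    if not 0 <= total <= 5:
        raise ValueError("total LFSS must lie between 0 and 5")
    if total <= 1:
        return "None"
    if total == 2:
        return "Mild"
    if total == 3:
        return "Moderate"
    return "Severe"
```

Applied to the representative cases in Figure 1, component scores of (1, 1, 0) give a total of 2 (Mild) and scores of (2, 2, 1) give a total of 5 (Severe), the maximum possible score.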

Table 1 Score Criteria Used for Construction of LFSS

Figure 1 Images to illustrate construction of LFSS.
Case 1: A 24-year-old woman with a large right lobe HCC. Preoperative axial multiphasic CT images (A–C) showing typical radiological features of HCC, normal liver morphology, and regular margin. (A) unenhanced CT phase; (B) arterial phase; (C) portal venous phase. Intraoperative photography (D) showed normal liver morphology, absence of regenerative nodules, and ruddy liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 0 (normal morphology), intraoperative observation: 0 (no regenerative nodules), liver gross appearance color: ruddy (0), thus total score of LFSS was 0 (None). (E) Histological evaluation of paracancerous parenchyma showed G1 and S1 (HE, 100×).
Case 2: A 37-year-old woman with spontaneous rupture of HCC in segment IV. Preoperative axial multiphasic CT images (A–C) showing hypertrophy of the lateral segments of the left liver lobe and regular hepatic margins. (A) unenhanced phase; (B) arterial phase; (C) portal venous phase. Notes: Filling defect caused by tumoral thrombosis was noted in the sagittal part of left portal vein. Intraoperative photography (D) showed multiple micro-regenerative nodules and red liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 1 (morphological change), intraoperative observation: 1 (multiple regenerative micronodules), liver gross appearance color: red (0), thus total score of LFSS was 2 (Mild). (E) Histologic evaluation of paracancerous parenchyma showed G3 and S2 (HE, 100×).
Case 3: A 47-year-old man with HCC in segment VIII. Preoperative axial multiphasic CT images (A–C) showing liver morphologic change (atrophy of the posterior segments of the right liver lobe) and regular margin. (A) unenhanced phase; (B) arterial phase; (C) portal venous phase. Notes: Anterior branch of right portal vein and distal branch of middle hepatic vein invaded by tumor. Intraoperative photography (D) showed multiple regenerative micronodules and greyish-red liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 1 (morphological change), intraoperative observation: 1 (multiple regenerative micronodules), liver gross appearance color: greyish-red (1), thus total score of LFSS was 3 (Moderate). (E) Histologic evaluation of paracancerous parenchyma showed G3 and S3 (HE, 100×).
Case 4: A 57-year-old man with HCC in segment VIII. Preoperative axial multiphasic MR images (A–C) showing liver morphologic changes (atrophy of the posterior segments of the right lobe and medial segments of the left lobe, hypertrophy of the lateral segments of the left lobe) and irregular hepatic margins. (A) precontrast phase; (B) hepatic arterial phase. Notes: Enhancement of portal vein branches but not of hepatic vein branches in hepatic arterial phase; (C) portal venous phase. Notes: Middle and right hepatic veins were compressed by tumor. Intraoperative photography (D) showed multiple regenerative nodules and red liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 1 (morphologic change), intraoperative observation: 2 (multiple regenerative macronodules), liver gross appearance color: red (0), thus total score of LFSS was 3 (Moderate). (E) Histologic evaluation of paracancerous parenchyma showed G3 and S4 (HE, 100×).
Case 5: A 61-year-old man with HCC in segment VII. Preoperative axial multiphasic CT images (A–C) showing pronounced liver morphologic changes and irregular margins. (A) unenhanced phase; (B) arterial phase; (C) portal venous phase image; (D) coronal view shows signs of clinically significant PH (engorged and tortuous paraesophageal varices and splenomegaly). Intraoperative photography (E) showed uneven distribution of multiple regenerative micro- and macronodules and greyish-red liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 2 (confirmed liver cirrhosis and/or imaging features of PH), intraoperative observation: 2 (multiple macro-regenerative nodules), liver gross appearance color: 1 (greyish-red), thus total LFSS score was 5 (Severe). (F) Histologic evaluation of paracancerous parenchyma showed G4 and S4 (HE, 100×).
Case 6: A 48-year-old man with HCC of the right lobe. Preoperative axial multiphasic CT images (A–C) showing pronounced liver morphologic changes, ascites and splenomegaly. (A) unenhanced phase; (B) arterial phase; (C) portal venous phase. Notes: Neither the right hepatic nor the right portal veins were visualized; (D) coronal image shows compression of the retrohepatic inferior vena cava by tumor and splenomegaly. Notes: This patient received a preoperative diagnosis of intrahepatic cholangiocarcinoma with concomitant PH and cirrhosis. However, postoperative pathology confirmed a diagnosis of HCC. Intraoperative photography (E) showed morphologic changes, no regenerative nodules, and ruddy liver parenchyma. According to our LFSS criteria, preoperative radiological evaluation score: 2 (confirmed cirrhosis and/or imaging features of PH), intraoperative observation: 0 (no regenerative nodules), liver gross appearance color: red (0), thus total LFSS score was 2 (Mild, in contrast to the preoperative assessment as Severe with indicators of PH). (F) Histologic evaluation of paracancerous parenchyma showed G1 and S0 (HE, 100×).

Abbreviations: LFSS, liver fibrosis severity score; HCC, hepatocellular carcinoma; CT, computed tomography; HE, hematoxylin and eosin; MR, magnetic resonance; PH, portal hypertension.

Surgical Procedure

Surgery was performed by experienced hepatic specialists. The operative procedures have been described in detail in our previous reports.28,29

Tumor Staging and Pathological Analysis

The AJCC-TNM staging system (8th edition) was used for HCC staging.31 Resected specimens were fixed in 10% buffered formalin, embedded in paraffin, and studied with hematoxylin and eosin (HE), Masson’s trichrome, and reticulin stains. The Edmondson-Steiner classification was used to determine the degree of tumor differentiation.32 Only the area of noncancerous liver parenchyma distant from the tumor was evaluated for histological grading of inflammation (G0-G4) and staging of liver fibrosis (S0-S4) following the Scheuer scoring system.33 S0–1 indicated no or mild fibrosis, S2 significant fibrosis, S3 severe fibrosis, and S4 early cirrhosis. G0 referred to no inflammation, G1 to mild inflammation with no necrosis, and G2, G3, and G4 to mild, moderate, and severe inflammation, respectively. Two senior histologists blinded to the clinical data independently determined the grades of hepatic fibrosis and inflammation of each specimen. Disagreements between the histologists were resolved by consensus.

Statistical Analysis

Continuous variables were expressed as median and range and compared by using the Mann–Whitney test. Categorical variables were expressed as frequencies and percentages and compared with the chi-square (χ2) test or Fisher exact test when appropriate. Univariate and multivariate ordinal logistic regressions were used to select the optimal predictors associated with different liver fibrosis S stages. Variables with a P value <0.05 in the univariate analysis were entered into the multivariate ordinal logistic regression analysis through the stepwise forward selection method. Independent predictors (P < 0.05) in the multivariate ordinal logistic regression analysis were included in the nomogram construction.

Nomograms were validated in the external validation cohort. Internal validation was performed with 500 bootstrap resamples in each of the derivation and external validation cohorts. Model performance for predicting each endpoint (S1, S2, S3, S4, and S3+S4) was evaluated by the area under the receiver operating characteristic curve (AUROC) and the concordance index (C-index). C-index was calculated as described previously.34 Calibration belt curves were plotted to assess the calibration of the proposed model, accompanied by the Hosmer-Lemeshow (HL) test, which identifies significant deviations from the ideal calibration, as well as the direction of the variation.
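To make the discrimination metrics concrete, the following Python sketch computes an AUROC for a binary endpoint (for example, S4 versus non-S4) and a percentile bootstrap confidence interval from 500 resamples. It is an illustration only, not the authors' R workflow: the function names, the seed, the toy data, and the choice of a simple percentile interval are all assumptions. For a binary endpoint the C-index coincides with the AUROC, so the same function serves for both.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(scores, labels, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (resampling patients with replacement)."""
    auroc(scores, labels)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data (hypothetical predicted probabilities and true binary labels)
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
toy_auc = auroc(toy_scores, toy_labels)
toy_ci = bootstrap_auroc_ci(toy_scores, toy_labels)
```

The pairwise definition used here is equivalent to the Mann–Whitney U statistic divided by the number of positive-negative pairs; it is quadratic in the sample size, which is acceptable for cohorts of the size studied here.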

Selected serum indices were evaluated for their performance in liver fibrosis staging in terms of AUROC. Non-parametric Spearman correlation was utilized to examine the correlation between liver fibrosis S stage and liver inflammatory G grade. AUROC >0.9 was considered excellent, 0.8–0.9 good, 0.7–0.8 fair,35 and 0.5–0.7 poor. All statistical analyses were performed with the rms package in R version 3.4.0 (http://www.r-project.org/). A two-sided P value < 0.05 was considered statistically significant.
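The qualitative AUROC bands stated above can be expressed as a simple lookup. In this Python sketch the function name is hypothetical, and the handling of exact boundary values (0.8 and 0.7 assigned to the higher band) and of values below 0.5 are assumptions, since the text does not specify them.

```python
def auroc_band(auroc_value: float) -> str:
    """Map an AUROC to the qualitative label used in this study."""
    if not 0.0 <= auroc_value <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if auroc_value > 0.9:
        return "excellent"
    if auroc_value >= 0.8:  # assumption: 0.8 itself counts as good
        return "good"
    if auroc_value >= 0.7:  # assumption: 0.7 itself counts as fair
        return "fair"
    if auroc_value >= 0.5:
        return "poor"
    return "below chance"  # not defined in the text; labeled here for completeness
```

For instance, an AUROC of 0.850 falls in the "good" band and 0.648 in the "poor" band.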

Results

Demographic and Clinicopathologic Characteristics

The demographic and clinicopathologic characteristics of the Derivation and External Validation cohorts are listed in Table 2. Given the small number of S0 patients (n = 17), we pooled S0 and S1 patients into a group with “no significant fibrosis” (S0-S1). No G0 patients were identified.

Table 2 Demographic and Clinicopathological Characteristics of the Derivation and External Validation Cohorts

Hepatitis B infection was the most prevalent etiology of liver disease and HCC in both the Derivation (85.7%) and External Validation (86.9%) cohorts. Among Derivation Cohort patients with Child-Pugh grade A (n = 860), liver fibrosis stages were S1 in 9.0% (n = 77), S2 in 39.8% (n = 342), S3 in 25.1% (n = 216), and S4 in 26.2% (n = 225); among those with Child-Pugh grade B (n = 40), the stages were S1 in 5.0% (n = 2), S2 in 17.5% (n = 7), S3 in 25.0% (n = 10), and S4 in 52.5% (n = 21). Liver inflammation grades in patients with Child-Pugh grade A were G1 in 8.4% (n = 72), G2 in 66.7% (n = 574), G3 in 22.7% (n = 195), and G4 in 2.2% (n = 19); in patients with Child-Pugh grade B, they were G1 in 0% (n = 0), G2 in 55.0% (n = 22), G3 in 35.0% (n = 14), and G4 in 10.0% (n = 4).

Univariate and Multivariate Ordinal Logistic Regression Analyses

Results of the univariate ordinal logistic regression analysis for liver fibrosis S stages in the Derivation Cohort are listed in Table 3. Multivariate logistic regression analysis demonstrated that LFSS, PH severity, plateletcrit (PCT) and model for end-stage liver disease-sodium (MELD-Na) were independent predictors of pathological liver fibrosis stages in HCC patients (Table 4).

Table 3 Univariate Ordinal Logistic Regression for Liver Fibrosis Stage in the Derivation Cohort

Table 4 Multivariate Ordinal Logistic Regression for Liver Fibrosis Stage in the Derivation Cohort

Predictive nomograms were developed from the multivariate analysis. For S stage prediction, total points were calculated by adding the points for each of the four predictors, and the corresponding probability of the S stage was read from the bottom axis.

Performance of Established Predictive Models

S1

A predictive nomogram for S1 is presented in Figure 2A. The AUROCs of the nomogram to predict S1 in the Derivation and External Validation cohorts were 0.850 (95% CI, 0.8132–0.8870) and 0.919 (95% CI, 0.8474–0.9407), respectively, indicating good and excellent performance for distinguishing between S1 and non-S1 patients (Figure 2B and C).

Figure 2 Establishment and validation of S1 predictive nomogram. (A) A predictive nomogram for S1. (B and C) The AUROCs of the nomogram to predict S1 in the Derivation (B) and External Validation (C) cohorts. AUCs are shown in the figure and reported with 95% CIs in the text together with the c-index. (D) Bootstrap analysis for internal validation in the Derivation Cohort. (E) GiViTI calibration plot shows good consistency between the observed frequency and predicted probability for S1 in the Derivation Cohort. Calibration plots (black lines) show fitted polynomial logistic function curves of the relationship between the logit transformation of the predicted probabilities and empirical outcomes (shaded yellow, 95% CI). Ideal reference lines are red. HL chi-square test value is reported in the Results section. (F) Bootstrap analysis for internal validation performed with the External Validation Cohort data. (G) The favorable calibration of the nomogram in the External Validation Cohort was further confirmed by the GiViTI calibration plot. HL chi-square test value is reported in the Results section.

For internal validation of Derivation Cohort data, we obtained AUROCs for S1 prediction of 0.845 (95% CI, 0.7850–0.8929) for the development dataset (D-set) and 0.782 (95% CI, 0.7490–0.8819) for the validation dataset (V-set), with a C-index of 0.823 (95% CI, 0.783–0.864) (Figure 2D). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S1 with the Hosmer-Lemeshow (HL) test (χ2 =5.872; P=0.662), indicating no departure from good fit (Figure 2E).
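As a worked check on the calibration statistics, the P value of a Hosmer-Lemeshow chi-square statistic can be reproduced from the chi-square survival function. The Python sketch below assumes 8 degrees of freedom (the conventional 10 risk groups minus 2); the source does not state the degrees of freedom, but the reported P values are consistent with this assumption. The function name is hypothetical, and the closed-form expression used is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int = 8) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# HL statistic reported for S1 in the Derivation Cohort
p_s1_derivation = chi2_sf_even_df(5.872, df=8)
```

Under the 8-degrees-of-freedom assumption, a statistic of 5.872 gives a P value of approximately 0.662, in agreement with the reported value; a large P value indicates no evidence of departure from good calibration.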

Internal validation of the External Validation Cohort data yielded AUROCs for S1 prediction of 0.786 (95% CI, 0.7350–0.8374) for the D-set and 0.800 (95% CI, 0.7449–0.8773) for the V-set, with a C-index of 0.833 (95% CI, 0.712–0.895) (Figure 2F). The favorable calibration of the nomogram was further confirmed by the GiViTI calibration plot and the Hosmer-Lemeshow test (χ2 =2.291; P = 0.971) (Figure 2G).

S2

A predictive nomogram for S2 is presented in Figure 3A. The AUROCs of the nomogram to predict S2 in the Derivation and External Validation cohorts were 0.726 (95% CI, 0.6926–0.7601) and 0.806 (95% CI, 0.7710–0.8521) respectively, indicating that the nomogram had fair and good performance for distinguishing between S2 and non-S2 patients (Figure 3B and C).

Figure 3 Establishment and validation of S2 predictive nomogram. (A) A predictive nomogram for S2. (B and C) The AUROCs of the nomogram to predict S2 in the Derivation (B) and External Validation (C) cohorts. AUCs are shown in the figure and reported with 95% CIs in the text together with the c-index. (D) Bootstrap analysis for internal validation in the Derivation Cohort. (E) A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability for S2 in the Derivation Cohort. Calibration plots are as defined in the Figure 2 legend. HL chi-square test value is reported in the Results section. (F) Bootstrap analysis for internal validation performed with the External Validation Cohort data. (G) The favorable calibration of the nomogram in the External Validation Cohort was further confirmed by the GiViTI calibration plot. HL chi-square test value is reported in the Results section.

Bootstrap analysis for internal validation of the Derivation Cohort data obtained AUROCs for S2 prediction of 0.708 (95% CI, 0.6789–0.7476) for the D-set and 0.722 (95% CI, 0.6814–0.7508) for the V-set, with a C-index of 0.713 (95% CI, 0.678–0.747), indicating that the nomogram had fair performance for distinguishing between S2 and non-S2 patients (Figure 3D). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability for S2, with an HL chi-squared test value of 8.219 (P = 0.412) (Figure 3E).

Internal validation in the External Validation Cohort achieved AUROCs for S2 prediction of 0.814 (95% CI, 0.7479–0.8801) for the D-set and 0.798 (95% CI, 0.7395–0.8566) for the V-set, with a C-index of 0.791 (95% CI, 0.743–0.839) (Figure 3F). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S2, with an HL chi-squared test value of 3.706 (P = 0.883) (Figure 3G).

S3

A predictive nomogram for S3 is presented in Figure 4A. AUROCs of the nomogram to predict S3 in the Derivation and External Validation cohorts were 0.648 (95% CI, 0.6078–0.6892) and 0.698 (95% CI, 0.6387–0.7582), respectively, indicating poor differentiation of S3 and non-S3 classes (Figure 4B and C).

Figure 4 Establishment and validation of S3 predictive nomogram. (A) A predictive nomogram for S3. (B and C) AUROCs to predict S3 in the Derivation (B) and External Validation (C) cohorts. (D) Bootstrap analysis for internal validation in the Derivation Cohort. (E) GiViTI calibration plot showed good consistency between the observed frequency and predicted probability for S3 in the Derivation Cohort. Calibration plots are as defined in the Figure 2 legend. HL chi-square test value is reported in the Results section. (F) Bootstrap analysis for internal validation performed with the External Validation Cohort data. (G) The favorable calibration of the nomogram in the External Validation Cohort was further confirmed by the GiViTI calibration plot. HL chi-square test value is reported in the Results section.

For the Derivation Cohort, bootstrap analysis for internal validation obtained AUROCs for S3 prediction of 0.637 (95% CI, 0.5894–0.6848) for the D-set and 0.556 (95% CI, 0.4788–0.6523) for the V-set, with a C-index of 0.616 (95% CI, 0.575–0.658), indicating poor performance for distinguishing between S3 and non-S3 patients (Figure 4D). Nevertheless, a GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S3, with an HL chi-square test value of 9.050 (P = 0.338) (Figure 4E).

Bootstrap analysis for internal validation of the External Validation Cohort data achieved AUROCs for S3 prediction of 0.681 (95% CI, 0.6105–0.7519) for the D-set and 0.601 (95% CI, 0.4632–0.7385) for the V-set, with a C-index of 0.666 (95% CI, 0.603–0.728) (Figure 4F). Here too, a GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S3, with an HL chi-squared test value of 7.210 (P = 0.514) (Figure 4G).

S4

A predictive nomogram for S4 is presented in Figure 5A. AUROCs of the nomogram to predict S4 in the Derivation and External Validation cohorts were 0.812 (95% CI, 0.7817–0.8413) and 0.824 (95% CI, 0.7770–0.8705), respectively, indicating that the model discriminated S4 well in both the Derivation and External Validation cohorts (Figure 5B and C).

Figure 5 Establishment and validation of S4 predictive nomogram. (A) Predictive nomogram for S4. (B and C) AUROCs to predict S4 in the Derivation (B) and External Validation (C) cohorts. (D) Bootstrap analysis for internal validation in the Derivation Cohort. (E) GiViTI calibration plot showed good consistency between the observed frequency and predicted probability for S4 in the Derivation Cohort. Calibration plots are as defined in the Figure 2 legend. HL chi-square test value is reported in the Results section. (F) Bootstrap analysis for internal validation performed with the External Validation Cohort data. (G) The favorable calibration of the nomogram in the External Validation Cohort was further confirmed by the GiViTI calibration plot. HL chi-square test value is reported in the Results section.

For Derivation Cohort data, bootstrap analysis for internal validation obtained AUROCs for S4 prediction of 0.822 (95% CI, 0.7889–0.8557) for the D-set and 0.753 (95% CI, 0.6866–0.8202) for the V-set, with a C-index of 0.804 (95% CI, 0.769–0.835) (Figure 5D). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S4, with an HL chi-square test value of 5.149 (P = 0.742) (Figure 5E).

Bootstrap analysis for internal validation of the External Validation Cohort data revealed AUROCs for S4 prediction of 0.805 (95% CI, 0.7458–0.8654) for the D-set and 0.734 (95% CI, 0.6207–0.8476) for the V-set, with a C-index of 0.792 (95% CI, 0.740–0.843) (Figure 5F). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S4, with an HL chi-squared test value of 10.488 (P = 0.232) (Figure 5G).

S3+S4 (≥S3)

A predictive nomogram for S3+S4 is presented in Figure 6A. The AUROCs of the nomogram to predict S3+S4 in the Derivation and External Validation cohorts were 0.806 (95% CI, 0.7770–0.8346) and 0.840 (95% CI, 0.7976–0.8815), respectively, indicating that the model discriminated S3+S4 well in both cohorts (Figure 6B and C).

Figure 6 Establishment and validation of S3+S4 predictive nomogram. (A) Predictive nomogram for S3+S4. (B and C) AUROCs to predict S3+S4 in the Derivation (B) and External Validation (C) cohorts. (D) Bootstrap analysis for internal validation in the Derivation Cohort. (E) GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S3+S4 in the Derivation Cohort. Calibration plots are as defined in the Figure 2 legend. HL chi-square test value is reported in the Results section. (F) Bootstrap analysis for internal validation performed with the External Validation Cohort data. (G) The favorable calibration of the nomogram in the External Validation Cohort was further confirmed by the GiViTI calibration plot. HL chi-square test value is reported in the Results section. The calibration curves for S1, S2, S3, S4 and S3+S4 were ideal-matched with a 45° reference line, indicating optimal agreement between the nomogram-predicted probabilities on the X-axis, and the actual rates on the Y-axis. HL chi-square calibration values for each cohort are reported in the Results section.

Bootstrap analysis for internal validation of the Derivation Cohort data obtained AUROCs for S3+S4 prediction of 0.800 (95% CI, 0.7478–0.8553) for the D-set and 0.774 (95% CI, 0.7133–0.8342) for the V-set, with a C-index of 0.795 (95% CI, 0.765–0.824) (Figure 6D). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S3+S4, with an HL chi-square test value of 3.626 (P = 0.889) (Figure 6E).

Bootstrap analysis for internal validation performed with the External Validation Cohort data achieved AUROCs for S3+S4 prediction of 0.812 (95% CI, 0.7597–0.8643) for the D-set and 0.805 (95% CI, 0.7514–0.88581) for the V-set, with a C-index of 0.811 (95% CI, 0.766–0.856) (Figure 6F). A GiViTI calibration plot showed good consistency between the observed frequency and predicted probability of S3+S4, with an HL chi-squared test value of 8.304 (P = 0.404) (Figure 6G).

Overall, the calibration curves for our proposed predictive models for S1, S2, S3, S4 and S3+S4 closely matched the 45° reference line, indicating good agreement between the nomogram-predicted probabilities on the X-axis and the observed rates on the Y-axis.

Performance of Selective Serum Indices in Liver Fibrosis Staging

Table 5 and Figure 7A–G show the performance of selective serum indices, LFSS alone, or PH severity for identifying S1 (7A), S2 (7B), S3 (7C), S4 (7D), S1+S2 (7E), S2+S3 (7F) and S3+S4 (7G) in the Derivation Cohort. None of the indices had meaningful diagnostic value in identifying liver fibrosis stages S1 or S2. Most indices had poor ability to diagnose S3, with AUROCs less than 0.6. Several indices, such as Lok index (grade), King’s score grade, ALBI grade, MELD (grade) and MELD-Na (grade), cannot be used to diagnose S3.

Table 5 ROC Curve Analysis of Selective NITs, LFSS and PH Severity for Different Liver Fibrosis Stage

Figure 7 ROC analysis of selective serum indices, LFSS, and PH severity. (A) S1 diagnosis; (B) S2 diagnosis; (C) S3 diagnosis; (D) S4 diagnosis; (E) S1+S2 diagnosis; (F) S2+S3 diagnosis; (G) S3+S4 diagnosis. Results are shown in detail in Table 5.

Although AUROCs of all indices reached statistical significance (P<0.05) for the diagnosis of S4, LFSS achieved the highest AUROC (0.775), and all other indices had AUROCs <0.7. The selective serum indices thus failed to demonstrate robust diagnostic ability for S4.

Correlations Between Inflammation G Grade and Fibrosis S Stage

Univariate (Table 3) and multivariate (Table 4) ordinal logistic regression analyses did not confirm an association between inflammation activity grade and fibrosis stage. Details of different liver fibrosis stages and inflammation grades in the Derivation Cohort are summarized in Table 6. G grades did not always parallel S stages, and the distribution of liver fibrosis stages was unequal across inflammation G grades. The non-parametric Spearman correlation test (Table 7) showed that only G1 demonstrated a weak positive correlation with S1 (r = 0.574, P < 0.001), which lacked clinically significant relevance, while the other r coefficients revealed no correlation between G grades and S stages.
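For readers less familiar with the Spearman coefficient used here, it is the Pearson correlation computed on ranks, with tied values (which are unavoidable for ordinal grades such as G and S) receiving their average rank. The Python sketch below is a minimal illustration; the function names are hypothetical, and it is not the authors' R code.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5
```

Because many patients share the same G grade or S stage, the tie-corrected form above is preferable to the shortcut formula based on squared rank differences, which assumes no ties.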

Table 6 Summary of Inflammation Grade Across Different Liver Fibrosis Stage Groups in the Derivation Cohort

Table 7 Correlation Between Fibrosis Stage and Inflammatory Grade in the Derivation Cohort

Discussion

To the best of our knowledge, we report the first large cohort study to identify predictive factors and to construct models to facilitate liver fibrosis staging in patients with HCC. We also found that liver fibrosis stages varied within each Child-Pugh grade, suggesting that Child-Pugh grade cannot accurately reflect liver fibrosis burden.

Multiple issues must be addressed during the evaluation of liver fibrosis burden in patients with HCC: 1) serum markers are easily influenced by clinical interventions and fail to provide reliable quantitative assessments of fibrosis; 2) imaging modalities have been developed to assess various aspects of liver parenchyma that include morphometric features, mechanical properties, and metabolic and functional characteristics. However, routine clinical use has been confounded by insufficient accuracy, complex modifications of typical imaging protocols, and requirements for additional institutional expertise and equipment;8–10 and 3) the effects of tumor-related factors (size, number, vascular invasion) on liver fibrosis staging have not been fully determined. Thus, unmet clinical needs and wide research gaps for accurate fibrosis staging persist. On the other hand, because CT and MRI are the primary radiologic modalities for HCC diagnosis and staging,3 their application to liver fibrosis staging would be expedient if proven to be accurate.

Increased parenchymal stiffness and presence of liver regenerative nodules are meaningful indicators of significant or advanced fibrosis and cirrhosis. The preoperative diagnosis of regenerative nodules by routine CT or MR imaging is difficult during the early stages of liver fibrosis.7 Consequently, liver stiffness measurement using US and MR elastography has been considered a promising method, although potentially confounded by multiple factors.36 Intrinsic discrepancies exist between the areas sampled by US or MR elastography and the anatomic locations of tumors.37 Previous studies have yielded conflicting results that are inapplicable to clinical practice.11–14 Thus, liver stiffness measurements should be used with caution as surrogate markers of liver fibrosis in patients with HCC.

CT-LSN scores may discern subtle changes of hepatic parenchymal architecture and have been suggested as valuable in liver fibrosis staging,38,39 and may be associated with post-hepatectomy liver failure among HCC patients.40 However, the use of different imaging parameters and variable thresholds10,38–43 precludes inter-study comparisons and general reproducibility. Whether CT-LSN scores can evaluate background liver fibrosis in patients with HCC is unclear. In view of this point, and restricted by the unavailability of liver stiffness measurement using US or MR elastography and of CT-LSN scores in our two centers, we hypothesized that the introduction of more expedient predictive models for liver fibrosis staging may surmount the intrinsic limits of CT-based LSN scores.

The intraoperative classification of the gross severity of cirrhosis was proposed as a rapid staging method to determine the extent of hepatic resection.44 However, that single-center study used the generic term “cirrhosis”, omitted external validation, and failed to present convincing evidence of liver fibrosis staging.44 Our LFSS, which incorporates more indices than the previously reported approach that relied on regenerative nodules alone as a morphological staging index,44 unexpectedly showed only fair, rather than superior, ability to diagnose S4. Most importantly, our results showed that not all regenerative nodules of varying size prove to be cirrhotic on postoperative pathological examination.

Liver parenchymal color was unexpectedly identified as a meaningful indicator. Although the mechanisms underlying color change during fibrosis still require elucidation, we suggest that color change may be a meaningful complementary factor to evaluate liver fibrosis; to our knowledge, this observation has not been reported previously. Two patients (Cases 2 and 3) with similar morphological changes and multiple micronodules differed in gross color appearance (Figure 1B and C). If this factor had been ignored, both patients would have been classified as mild by LFSS. In actuality, Case 3 was reclassified as moderate, and postoperative histopathology revealed stage S3. The gross appearance of parenchymal color may reflect key pathogenic events during the course of progressive fibrosis, and highlights the value of further research to explore possible surrogate markers of liver fibrosis burden.

Morphologic changes that complicate HCC may also result from hemodynamic consequences of local tumor effects, such as macrovascular invasion or tumor thrombosis. Reliance on morphologic features to diagnose or exclude cirrhosis should be undertaken with caution. Thus, in our LFSS criteria, we assigned 1 point for morphological changes regardless of other potentially confounding factors. Such criteria minimized the risk of underestimation. Even so, the LFSS proposed in the current study, which includes liver morphometric measurement, unexpectedly had little value in the diagnosis of S1 or S2. The LFSS demonstrated poor diagnostic performance for S3 and only fair performance in the diagnosis of S4, although it achieved significant improvement in AUROC and outperformed all single indices. Our results raise the question of whether, and which, additional factors can be used to improve diagnostic accuracy, and also suggest that CT- or MRI-based LSN scores may not be as accurate as previously thought.

PH was identified as a complementary factor, and was incorporated into the predictive model. Clinically significant PH can signify stages of severe fibrosis and cirrhosis in chronic liver disease.45 The gold standard method of evaluating PH severity is hepatic vein pressure gradient measurement, which is adopted infrequently in our evaluation of patients with HCC. The absence of radiological signs cannot exclude the diagnosis of PH.46 We also observed that splenomegaly with thrombocytopenia does not necessarily accompany radiologic or endoscopic signs of PH, as illustrated by Case 6 in Figure 1. Preoperative axial multiphasic CT showed remarkable hepatic morphologic changes, ascites, and splenomegaly, whereas intraoperative observations included ruddy liver parenchyma and the absence of liver regenerative nodules. We suggest that the characteristics of PH in patients with HCC may differ from those in patients without HCC, since tumor-related effects can accumulate over a relatively short time interval, compared with the gradual progression of chronic liver disease without HCC. A superimposition of tumor mass effect and/or vascular-associated factors (eg, macrovascular invasion, portal vein [tumor] thrombosis) may precipitate or aggravate portal hypertension without apparent formation of collateral shunts. Even so, our results demonstrated that PH severity alone had little diagnostic value for S3 (advanced liver fibrosis) and poor diagnostic ability for S4 (cirrhosis).

The LFSS proposed in the current study overcomes the shortcomings of preoperative CT or MRI in the diagnosis of incipient or early liver fibrosis. The criteria are simple and easily applied after open laparotomy or laparoscopic exploration, before hepatectomy for HCC is undertaken.

Preoperative PCT and MELD-Na can facilitate clinical screening for hepatic fibrosis. MELD-Na and other MELD-based models have advantages for the accurate prediction of clinical outcomes in patients with cirrhosis,47–49 and may be particularly useful in patients for whom hepatectomy may be contraindicated by severe cirrhosis. In our study, MELD (grade) and MELD-Na (grade) showed poor diagnostic ability for S4 with similar AUROCs, while only MELD-Na remained significant in multivariate analysis, and improved the performance of our model.

Reduced PCT is an indicator of liver fibrosis in HCV-,50,51 and HBV-related chronic liver disease,52 alcoholic liver cirrhosis, and nonalcoholic fatty liver disease.53 Platelet count is an important index in several serum tests. Although PCT and platelet count had similar performance in fibrosis staging in our study, only PCT remained as an independent predictor by multivariate analysis. A similar finding was reported in HBV-related chronic liver disease,52 and deserves further investigation.

We also evaluated the performance of selected serum indices. None could be used for S1 or S2 diagnosis. For S3, most indices performed poorly, with AUROCs below 0.6. For S4, the Lok index, King’s score (grade), APRI, and FIB4 (grade) demonstrated improved performance, while the other indices did not show significant improvement. These results emphasize that frequently used serum tests were not useful for differentiating background liver fibrosis burden in patients with HCC, suggesting that the use of their cutoffs should be discouraged in clinical practice.

Tumor-associated factors cannot be ignored during liver fibrosis staging in patients with HCC. Liver stiffness measurement values may vary with distance from the tumor boundary, and can be affected by tumor location.11,13,53 However, at least one study suggests that tumors do not significantly influence liver stiffness measurement results.54 We explored heterogeneous characteristics of tumor-related factors for possible associations with liver fibrosis stage. Tumor maximum diameter (MD) cutoff values were statistically significant in univariate analysis, but did not remain significant in multivariate analysis. Tumor aggressiveness, as reflected by AJCC-TNM stage, also showed no association with any liver fibrosis stage. Although an association between tumor-related factors and liver fibrosis stage seems intuitively plausible, our results did not support this hypothesis, and no tumor-related factors were included in our prediction model for fibrosis staging. We suggest that architectural changes caused by HCC are relatively acute compared to the gradual progression of hepatic fibrosis. Given the inability to determine possible relationships between tumor characteristics and liver fibrosis stage, we should consider that liver stiffness measurement values may not reflect liver fibrosis burden accurately in patients with HCC.

Conflicting results have been reported regarding the influence of inflammation on liver stiffness measurement-based liver fibrosis staging of chronic liver disease.55,56 Whether such an association is present in patients with HCC remains unknown. Of note, our uni- and multivariate analyses did not identify the grade of liver inflammation as an indicator of liver fibrosis S stage in patients with HCC. Furthermore, we observed no significant correlation between liver fibrosis S stage and inflammation G grade in patients with HCC, with the exception of a weak positive correlation between G1 and S1. Taken together, these results suggest that our proposed liver fibrosis staging nomogram is independent of hepatic inflammation.

Due to disequilibrium and asynchronous progression of liver fibrosis,46 the demarcation between the adjacent stages S2, S3, and S4 is often challenging. The diagnostic accuracy of our model was only fair in identifying S2 and poor for S3 in both the derivation and validation cohorts. These findings imply overlap of adjacent fibrosis stages, each representing a heterogeneous group with some patients having S3-S4 and others having S2-S3 fibrosis. As shown in Table 5, when S1 was combined with S2, or S2 with S3, as a single category, none of the indices had clinical diagnostic utility by AUROC analysis. In terms of clinical relevance and the possible roles of liver fibrosis staging, classifying S3 together with S4 as a single entity (≥S3) may be a reasonable and acceptable strategy, although whether S3 fibrosis is in fact closer to S4 remains unresolved. Our previous study investigated background liver fibrosis stage (≥S3) as one of the risk factors associated with early HCC recurrence after curative hepatectomy.28 In addition, our proposed predictive nomogram performs well with good to excellent accuracy, suggesting its potential clinical value. Whether precise discrimination of intermediate stage S3 is necessary in clinical practice deserves further study.

Several major strengths of our study must be highlighted. We used a real-world HCC dataset that featured a relatively large sample size and consecutive enrolment, and is thus likely to represent HCC characteristics in clinical practice adequately. Our study design facilitated the exploration of factors associated with different liver fibrosis stages. The development of our model and its external validation, together with the similarity of results between the two institutions, indicate the reproducibility of our results regardless of differences in baseline characteristics of study populations. These results can be generalized and applied to similar contexts.

The use of resected specimens as the gold standard for evaluating liver fibrosis may have obviated the conventional pitfalls of histological interpretation based on percutaneous liver biopsy, and may have ensured the high reliability of our conclusions. The histopathologic consistency between sampling sites clarified the association of tumoral factors and liver fibrosis staging, thus overcoming shortcomings of elastography, because resection of the area examined by elastography was not always possible.

Our predictive model based on multivariate analysis differs from those reported in studies that used dichotomized endpoints such as significant/advanced fibrosis or cirrhosis compared with healthy counterparts, or that evaluated liver disease of different etiologies. Our model showed robust evidence of reasonably accurate diagnostic performance. Whether our proposed model can also indicate functional hepatic reserve would be a noteworthy extension of our study.

Several limitations of our study should be mentioned. First, our observational retrospective study design might have allowed selection bias, causing unbalanced numbers of patients in different fibrosis stages. However, the large cohort size and the analysis accounting for validated predictors of liver fibrosis burden reduced possible selection bias. Second, our study was conducted in only two centers in China. We have planned further research to identify additional markers that might further improve the predictive ability of our proposed models, and have initiated the validation of the clinical applicability of our models in oth
