Diagnostic Accuracy of Non-Invasive Diagnostic Tests for Nonalcoholic Fatty Liver Disease: A Systematic Review and Network Meta-Analysis

Introduction

Due to the implementation of effective vaccines and potent antiviral therapies, the causes and patterns of the prevalence and incidence of end-stage liver diseases (ESLDs, eg, cirrhosis and HCC) have changed. Nonalcoholic fatty liver disease (NAFLD) significantly contributes to the prevalence of ESLDs.1,2 A recent study in China involving 5,757,335 check-up participants estimated that the prevalence of liver steatosis was 44.39%, while the prevalence of advanced fibrosis and cirrhosis was 2.85% and 0.87%, respectively.3 NAFLD poses a substantial threat to patient health and imposes a significant burden on healthcare systems and socio-economic structures. Early diagnosis and effective management of NAFLD are essential. The diagnosis of NAFLD has relied on liver tissue biopsy as the gold standard, which is invasive, carries risks, and is limited by sampling bias and subjective interpretation. With advancements in medical technology and growing patient demand for comfort, non-invasive tests (NITs) have garnered attention and widespread application. Particularly in early-stage disease or asymptomatic phases, these NITs enhance diagnostic accuracy and reduce the need for traditional liver tissue biopsy, thereby minimizing patient discomfort and healthcare costs. Numerous NITs have been developed to diagnose NAFLD, but they have yet to be widely adopted in clinical practice due to several challenges.
First, more long-term validation, large-sample studies, and longitudinal research are needed to confirm their clinical efficacy.4 Second, NITs lack standardization and their thresholds differ, with varying recommendations for the same indicator, such as the FIB-4 score.5 Several guidelines recommend <1.3 as the lower cutoff to rule out fibrosis and >2.67 as the upper cutoff to rule in fibrosis,6–10 whereas the AASLD recommends <1.45 as the lower cutoff and >3.25 as the upper cutoff.11 Third, many NITs do not meet generally accepted cost-effectiveness thresholds, which limits their use.12 Fourth, the pathophysiology of NAFLD is complex, involving lipid accumulation, inflammation, oxidative stress, and fibrosis, yet most NITs assess only fibrosis or fat content, which limits their ability to evaluate the disease comprehensively.13,14 In addition, the implementation and routine use of NITs in NAFLD face multiple barriers, including a lack of societal awareness about NAFLD, skill gaps in interpreting test results, insufficient skills in using NITs to screen patients for timely referral, a misperception that NITs are not helpful in the absence of medical treatments, and limited predictive value for certain NITs.15 The diagnostic accuracy and target populations of different NITs still need to be clarified, and guidelines and expert opinions lack consensus, which further complicates clinicians’ decision-making when selecting appropriate diagnostic tools.16 Therefore, it is essential to clarify the diagnostic performance of NITs, rank them, and explore their potential value in various clinical contexts. This study assesses the sensitivity, specificity, and overall diagnostic value of existing NITs for NAFLD diagnosis through a systematic review and network meta-analysis, ranking their performance.
The goal is to provide evidence-based support for clinical practice while identifying potential reasons for differences in diagnostic performance among NITs, offering insights for future improvements and optimization.
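To make the threshold problem described above concrete, the following is a minimal sketch of how one FIB-4 value can be classified differently under the two cutoff pairs quoted in the text. It uses the standard FIB-4 formula, (age × AST) / (platelet count × √ALT). The function names and the patient values are hypothetical and chosen purely for illustration; they are not taken from any included study.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index: (age x AST) / (platelet count x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


def classify(score, lower, upper):
    """Dual-cutoff rule: below `lower` rules out, above `upper` rules in."""
    if score < lower:
        return "rule out"
    if score > upper:
        return "rule in"
    return "indeterminate"


# Hypothetical patient, invented for illustration only.
score = fib4(age_years=56, ast_u_per_l=30, alt_u_per_l=36, platelets_1e9_per_l=200)

# The two cutoff pairs quoted in the Introduction.
print(round(score, 2))                          # 1.4
print(classify(score, lower=1.3, upper=2.67))   # indeterminate
print(classify(score, lower=1.45, upper=3.25))  # rule out
```

The same patient falls in the indeterminate zone under one recommendation and is ruled out under the other, which is the kind of inconsistency that complicates test selection.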

Materials and Methods

Protocol and Guidance

This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extensions for Diagnostic Test Accuracy (PRISMA-DTA) and Network Meta-Analyses (PRISMA-NMA)17,18 and was registered in PROSPERO (CRD42024582871).

Inclusion Criteria

We included diagnostic studies evaluating the predictive value of NITs in diagnosing NAFLD, with or without liver fibrosis or cirrhosis. The publication language was restricted to Chinese or English. Studies had to use liver histopathology as the reference standard and adhere to established diagnostic criteria. The participants included patients with suspected or confirmed NAFLD, irrespective of age, sex, or ethnicity. Coexisting metabolic syndrome was allowed. NITs included both serum-based and imaging-based biomarkers. Studies were included if they reported data on true positive (TP), true negative (TN), false positive (FP), and false negative (FN), or metrics such as sensitivity (Se), specificity (Sp), accuracy, positive predictive value (PPV), negative predictive value (NPV), diagnostic odds ratio (DOR), and area under the receiver operating characteristic curve (AUC). No restrictions were imposed on the cut-off values used for diagnostic measures such as Se, Sp, PPV, and NPV; we included all reported results with their cut-off values to reflect the real-world variation in diagnostic thresholds. This approach aimed to capture the breadth of diagnostic performance across diverse clinical contexts. Variations in cut-off values and their potential impact on diagnostic performance were explored qualitatively in the discussion. Because this manuscript covers numerous non-invasive diagnostic tests, the abbreviations for NITs are listed in Supplemental eTable 1.

Exclusion Criteria

We excluded studies involving participants with fatty liver attributed to factors such as excessive alcohol consumption, genetic predispositions, or other secondary causes. Studies were excluded if they lacked full-text availability, were duplicate publications, or were conference abstracts without sufficient methodological details. To ensure the quality and reliability of included studies, we excluded those with unclear diagnostic criteria, sample sizes smaller than 30 participants, or insufficient data to calculate diagnostic accuracy measures. Only studies with observational designs (cross-sectional, case-control, or cohort) that specifically focused on the diagnostic performance of NITs were included.

Outcomes

The primary outcomes included absolute and relative Se and Sp, DOR, and AUC with their respective 95% confidence intervals, which reflect the overall diagnostic accuracy of NITs in diagnosing NAFLD. Secondary outcomes included the superiority index (S index) and the results of subgroup analyses (eg, by disease severity or population characteristics). AUC values above 0.7 were considered indicative of acceptable diagnostic performance, while absolute Se and Sp values ≥0.8 were deemed clinically meaningful. To handle different reporting formats, we extracted or converted available data to calculate true positives, false positives, true negatives, and false negatives, ensuring consistent metric derivation across studies.
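As an illustration of how these outcome measures relate to a single 2×2 table, the sketch below computes Se, Sp, and the DOR with a Wald-type 95% confidence interval on the log scale. This is the textbook per-study calculation, not the pooled model used in this review. The function name, the example counts, and the 0.5 continuity correction for zero cells are assumptions made for illustration, and the sketch assumes both the diseased and non-diseased groups are non-empty.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn, z=1.96):
    """Per-study Se, Sp, and DOR with a Wald 95% CI on the log-DOR scale."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)

    # Assumption: add 0.5 to every cell if any cell is zero, so the DOR
    # and its standard error stay finite.
    a, b, c, d = tp, fp, tn, fn
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    dor = (a * c) / (b * d)
    se_log_dor = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(dor) - z * se_log_dor)
    upper = math.exp(math.log(dor) + z * se_log_dor)
    return {"Se": se, "Sp": sp, "DOR": dor, "DOR_CI": (lower, upper)}


# Hypothetical counts, invented for illustration only.
m = diagnostic_metrics(tp=86, fp=18, tn=182, fn=14)
print(m["Se"], m["Sp"], round(m["DOR"], 2), m["DOR_CI"])
```

With very small cell counts the interval becomes extremely wide, which is worth keeping in mind when reading DOR estimates that rest on a single small study.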

Search Strategy

We searched seven databases (PubMed, Embase, Cochrane Library, SinoMed, CNKI, VIP, and Wanfang) for diagnostic studies up to April 28, 2024. The search strategy, including specific search terms, Medical Subject Headings (MeSH), and Boolean operators, is detailed in Supplementary eTable 2. In addition, we performed manual searches of reference lists from relevant articles to identify additional studies. Efforts were also made to include grey literature by reviewing conference proceedings, dissertations, and unpublished data when available, aiming to minimize publication bias. Published journal articles and grey literature were both considered for inclusion in the analysis if they met the eligibility criteria.

Study Selection and Data Extraction

Two investigators independently screened the study titles and abstracts to identify studies meeting the criteria for full-text evaluation. Any disagreements at the title and abstract stage were discussed and resolved through consensus. Unresolved discrepancies were adjudicated by a senior third investigator. All investigators underwent standardized training before the screening process, including calibration exercises, to ensure consistency in applying the inclusion and exclusion criteria. We used EndNote 21 software to record study selection decisions. Reasons for exclusion were systematically recorded at each stage (title/abstract and full-text screening) to ensure transparency and reproducibility. Diagnostic data (TP, FP, TN, FN), age, sex, disease condition, sample size, and research center information were extracted. In cases where TP, TN, FP, or FN were not reported, they were calculated from known variables (Se, Sp, NPV, or PPV).
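The back-calculation of a 2×2 table mentioned above can be sketched as follows for the simplest case, where a study reports Se, Sp, and the number of participants with and without the target condition. The function name and the example values are hypothetical. Rounding to whole participants is an assumption, and a study that reports only PPV and NPV would need a different rearrangement.

```python
def reconstruct_2x2(se, sp, n_diseased, n_non_diseased):
    """Recover TP, FN, TN, FP from Se, Sp, and the two group sizes."""
    tp = round(se * n_diseased)      # Se = TP / (TP + FN)
    fn = n_diseased - tp
    tn = round(sp * n_non_diseased)  # Sp = TN / (TN + FP)
    fp = n_non_diseased - tn
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical study: Se 0.86, Sp 0.91, 100 diseased, 200 non-diseased.
print(reconstruct_2x2(0.86, 0.91, 100, 200))
# {'TP': 86, 'FP': 18, 'TN': 182, 'FN': 14}
```

Because reported Se and Sp are usually rounded, the recovered counts can differ from the true counts by one or two participants.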

Quality Assessment and Publication Bias

The methodological quality of the included studies was assessed by two independent investigators using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) framework.19 The QUADAS-2 tool evaluates four key domains: patient selection, index test, reference standard, and flow and timing. Each domain was assessed for risk of bias, and the first three domains were also evaluated for concerns regarding applicability. Predefined signaling questions were used to guide judgments for each domain, with disagreements resolved through discussion or consultation with a third investigator. Studies were categorized as “low”, “moderate”, or “high” quality based on the level of risk of bias and applicability concerns identified in the QUADAS-2 assessment. Studies rated as having a high risk of bias in multiple domains were classified as low-quality. Low-quality studies were included in the primary analysis to ensure a comprehensive synthesis of the available evidence. However, their potential impact on the robustness of the results was systematically addressed. Publication bias was assessed using funnel plots in STATA 17.0.

Data Synthesis

In the pairwise meta-analysis, we assessed Se, Sp, PPV, NPV, DOR, and AUC with their corresponding 95% confidence intervals (CIs). Head-to-head analyses were conducted only when more than ten studies were included. Cochran’s Q test and the inconsistency index (I²) were used to assess statistical heterogeneity. When little to no heterogeneity was present between studies (I² < 25%), a fixed-effect model was used for data pooling. If substantial heterogeneity was present (25% < I² < 95%) and clinical heterogeneity was deemed acceptable, we applied a random-effects model. When I² > 50%, meta-regression or sensitivity analysis was conducted to explore the sources of heterogeneity. If I² > 95% or if significant clinical heterogeneity was present, we refrained from pooling quantitative data. The funnel plot method was used to assess publication bias. The pairwise meta-analysis and meta-regression were performed using STATA software, version 17.0. Only studies with complete data or data that could be manually calculated were included, so no missing data were present in this work.
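The heterogeneity rule described above can be written out as a small decision function. This is a sketch only: it uses the usual definition I² = max(0, (Q − df) / Q) × 100 with df = k − 1, and the function names are hypothetical. The text does not say how values of exactly 25% or 95% are handled, so assigning both boundaries to the random-effects branch is an assumption. The clinical-heterogeneity judgment is not modeled.

```python
def i_squared(q, k):
    """I^2 (%) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def pooling_plan(i2):
    """Return (model, explore_heterogeneity) following the stated thresholds."""
    explore = i2 > 50.0  # meta-regression or sensitivity analysis
    if i2 > 95.0:
        return ("no quantitative pooling", explore)
    if i2 < 25.0:
        return ("fixed-effect", explore)
    # Assumption: the boundary values 25% and 95% fall in this branch.
    return ("random-effects", explore)


# Hypothetical example: Q = 40 across 11 studies.
i2 = i_squared(q=40.0, k=11)
print(i2, pooling_plan(i2))  # 75.0 ('random-effects', True)
```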

The network meta-analysis was conducted within a Bayesian framework using R 4.0.3. Data preprocessing included calculating true positives, true negatives, patients correctly diagnosed by the gold standard, and those excluded. Diagnostic test names were replaced with numeric codes. Preprocessed data were analyzed using an ANOVA random-effects model.20,21 The model assumed a common-effect parameter across comparisons for diagnostic odds ratios (DOR). Rankings based on DOR values indicated each diagnostic test’s discrimination capability, with higher values indicating better discriminatory performance.22 The primary outcomes included DOR, sensitivity, specificity, AUC, and 95% CIs. Bayesian calculations were performed using the Rstan package, specifying parameters like MCMC chain count, iteration count, and operational cycles (initially set at four chains, 50,000 iterations, with a 5-interval random-effects model to mitigate initial value impact). Default non-informative priors provided by the Rstan package were used for the Bayesian analysis, ensuring minimal prior influence on the parameter estimates. Model convergence was assessed using the potential scale reduction factor (R-hat). R-hat values below 1.1 were considered indicative of good convergence. Any convergence issues identified were resolved by increasing the number of iterations or adjusting model priors to improve stability. Network plots and inconsistency evaluation were not conducted for the NMA because all NITs were compared only with liver biopsy; consequently, no closed loops, which are essential for generating network plots and evaluating the consistency between direct and indirect evidence, could be formed.
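To illustrate the convergence criterion, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter from several equal-length chains. It is a simplified stand-in: current Stan releases report a split, rank-normalized variant, so values will not match Stan output exactly. The function name and the simulated chains are hypothetical.

```python
import random
import statistics


def r_hat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean within-chain variance; B: between-chain variance scaled by n.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    b = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(0)

# Four simulated chains sampling the same distribution: R-hat close to 1.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]

# Two of four chains stuck around a different value: R-hat well above 1.1.
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]

print(r_hat(mixed) < 1.1)  # True
print(r_hat(stuck) < 1.1)  # False
```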

Subgroup Analyses

For the different stages of NAFLD, we performed several subgroup analyses based on the steatosis stage, fibrosis stage, NAFLD with fibrosis, NASH, and high-risk NASH. We defined “significant fibrosis” as ≥ S2 on the Scheuer and METAVIR scales or ≥ S3 on the Ishak scale, and “advanced fibrosis” as ≥ S3 on the Scheuer and METAVIR scales or S4 on the Ishak scale.23–26 Meta-regression was conducted to explore potential sources of heterogeneity, considering factors such as sample size, publication year, number of centers, funding, and the four QUADAS-2 domains: patient selection, index test, reference standard, and flow and timing.
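The staging definitions above can be expressed as a small lookup, sketched below. The names are hypothetical. The text gives the Ishak threshold for advanced fibrosis as "S4" without a "≥" sign; reading it as stage 4 or higher is an assumption. Because advanced fibrosis is a subset of significant fibrosis, the function returns only the most severe label that applies.

```python
# Minimum stage for each category, per scoring system.
SIGNIFICANT_MIN = {"scheuer": 2, "metavir": 2, "ishak": 3}
# Assumption: "S4 on the Ishak scale" is read as stage >= 4.
ADVANCED_MIN = {"scheuer": 3, "metavir": 3, "ishak": 4}


def fibrosis_category(scale, stage):
    """Map a histological stage to the review's fibrosis categories."""
    key = scale.lower()
    if key not in SIGNIFICANT_MIN:
        raise ValueError(f"unknown scoring system: {scale}")
    if stage >= ADVANCED_MIN[key]:
        return "advanced"
    if stage >= SIGNIFICANT_MIN[key]:
        return "significant"
    return "below significant"


print(fibrosis_category("METAVIR", 2))  # significant
print(fibrosis_category("Ishak", 2))    # below significant
print(fibrosis_category("Scheuer", 3))  # advanced
```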

Results

Eligible Studies and Study Characteristics

We identified 15,877 studies, and after removing duplicates, 4,213 studies underwent screening (Figure 1). A full-text review of 738 studies resulted in the inclusion of 180 studies evaluating 153 NITs. All studies used liver biopsy as the gold standard comparator. Geographically, most studies were conducted in China (62 studies) and the USA (33 studies). The characteristics and reference citations of the included studies are detailed in Supplemental eTable 3. To enhance clarity, NITs were categorized into serological biomarker models and imaging-based tests. Among the serological models, the FIB4 index was the most studied (52 studies, 64 arms), followed by the NAFLD Fibrosis Score (NFS, 39 studies, 47 arms) and the APRI (34 studies, 45 arms). For imaging-based tests, transient elastography (TE) had the most studies (43 studies, 100 arms), followed by PDFF (12 studies, 18 arms) and MRE (11 studies, 19 arms). Details are shown in Figure 2. Additionally, 94 NITs were evaluated in only one study each.

Figure 1 Flow diagram showing the inclusion of studies from the literature review.

Figure 2 Evidence Map of NITs (studies or multi-arm > 10) for the diagnosis of NAFLD. Evidence map displaying the distribution of NITs based on their modality and literature size. The X-axis distinguishes between imaging-based NITs (left) and serum-based NITs (right). The Y-axis represents the number of studies.

Pairwise Meta-Analysis for DTA

A direct meta-analysis was conducted to evaluate the diagnostic accuracy of 11 NITs for NAFLD across various disease stages. The analysis included data from 101 studies and focused on detecting steatosis (S1 and S2), significant fibrosis (≥ F2), advanced fibrosis (≥ F3), and high-risk NASH. Imaging-based markers, such as MRE, TE, ARFI, and PDFF, generally outperformed serum-based markers in diagnosing fibrosis stages, while simpler indices (APRI, AAR, and BARD) showed lower diagnostic accuracy. Details are shown in Table 1. No significant publication bias was identified (Figure 3). The SROC curves are shown in Figure 4 and the forest plots in Figure 5.

Table 1 Pairwise Meta-Analysis of Diagnostic Accuracy for NAFLD

Figure 3 Deeks’ Funnel plots for diagnostic comparisons.

Abbreviations: AAR, Aspartate Aminotransferase to Alanine Aminotransferase Ratio score; APRI, Aspartate Aminotransferase to Platelet Ratio Index; F3, Fibrosis Stage 3; F2, Fibrosis Stage 2; NFS, NAFLD Fibrosis Score; MRE, Magnetic Resonance Elastography; TE, Transient Elastography; ARFI, Acoustic Radiation Force Impulse Imaging; PDFF, Proton Density Fat Fraction.

Figure 4 SROC Curve Comparing Various NITs for Diagnosing NAFLD in Direct Comparison.

Figure 5 Forest Plot Comparing Various NITs for Diagnosing NAFLD in Direct Comparison.

For advanced fibrosis (≥ F3), MRE achieved the highest accuracy (AUC 0.93, 95% CI 0.90–0.95), with Se 0.86 (95% CI 0.80–0.90), Sp 0.91 (95% CI 0.86–0.94), and DOR 60 (95% CI 31–119). Among serum markers, FIB-4 (AUC 0.84, 95% CI 0.81–0.87) and NFS (AUC 0.82, 95% CI 0.78–0.85) performed better than APRI (AUC 0.79, 95% CI 0.75–0.82). For significant fibrosis (≥ F2), among imaging tests, MRE showed strong diagnostic utility, with a DOR above 31 and an AUC of 0.92. Serum markers, such as FIB-4 (AUC 0.80, 95% CI 0.76–0.83) and PRO-C3 (AUC 0.83, 95% CI 0.80–0.86), exhibited moderate performance. For steatosis (S1 and S2), PDFF achieved excellent diagnostic accuracy with an AUC of 0.95 (95% CI 0.93–0.96) for S1 and 0.89 (95% CI 0.86–0.92) for S2. TE also demonstrated high accuracy for S1 with an AUC of 0.92 (95% CI 0.89–0.94). Meta-regression analyses suggested that the heterogeneity in the diagnostic accuracy of NITs for NAFLD may be due to differences in study design, threshold settings, study populations, and performance indicators across studies. Details are shown in Figure 6.

Figure 6 Meta-regression in Direct Comparison of NITs for Diagnosing NAFLD. The meta-regression analysis includes the following covariates: Size (sample size), Year (year of paper publication), Patient (selection bias in patient inclusion), Gold (implementation of the gold standard), Se (sensitivity), Center (number of research centers), Test (bias in test implementation or interpretation), and Funding (research funding sources).

Abbreviations: AAR, Aspartate Aminotransferase to Alanine Aminotransferase Ratio score; APRI, Aspartate Aminotransferase to Platelet Ratio Index; ARFI, Acoustic Radiation Force Impulse Imaging; F3, Fibrosis Stage 3; F2, Fibrosis Stage 2; MRE, Magnetic Resonance Elastography; NFS, NAFLD Fibrosis Score; S1, Steatosis Stage 1; TE, Transient Elastography; 95% CI, 95% Confidence Interval.

Network Meta-Analysis for DTA

A total of 180 studies were available for NMA. Seven subgroup analyses were conducted using disease stage as the grouping criterion, including steatosis ≥ S1, ≥ S2, NAFL vs NASH, NAFLD with fibrosis, NAFLD ≥ F3, NAFLD ≥ F2, and high-risk NASH.

For steatosis (≥ S1), H-MRS achieved the highest diagnostic accuracy (DOR 15,745,657.6, 95% CI 17.2–1,014,063.59), followed by NLV (DOR 515,080.28, 95% CI 3.05–295,953.65). Among serum markers, ALT exhibited the best performance (DOR 722.13, 95% CI 7.48–3,133.23), followed by TyG (DOR 241.24, 95% CI 2.56–1,229.06). For steatosis (≥ S2), HRI (DOR 127.23, 95% CI 7.69–594.44) and MRS (DOR 116.52, 95% CI 5.21–595.46) outperformed PDFF (DOR 28.88, 95% CI 11.98–56.75) and TE (DOR 10.7, 95% CI 3.72–22.69). Details are provided in Tables 2 and 3.

Table 2 27 NITs Diagnostic for Steatosis ≥S1 ANOVA Model Analysis Results

Table 3 4 NITs Diagnostic for Steatosis ≥S2 ANOVA Model Analysis Results

For distinguishing NAFL from NASH, MRE showed the highest performance (DOR 4,360,974.03, 95% CI 210.74–3,589,007.06), followed by HsCRP (DOR 942,765.03, 95% CI 4.88–154,229.3) and IL-1RA (DOR 1,234.67, 95% CI 10.55–7,459.82). Other NITs, such as miRNA-99a, A1AT, PC, FNI, SAN, and NI-NASH-DS, demonstrated moderate accuracy (DORs ranging from 282.72 to 42.95). Details are provided in Table 4.

Table 4 40 NITs Diagnostic for NAFL Vs NASH ANOVA Model Analysis Results

For NAFLD with any stage of fibrosis, sH2a showed the best performance (DOR 40.64, 95% CI 0.64–244.34), followed by IgA (DOR 27.38, 95% CI 0.78–157.05), sICAM-1 (DOR 23.18, 95% CI 0.45–152.53), AAR score (DOR 18.24, 95% CI 0.33–108.01), NIS (DOR 10.13, 95% CI 0.34–59.72), and TE (DOR 9.98, 95% CI 1.7–32.59). Details are shown in Table 5.

Table 5 15 NITs Diagnostic for NAFLD With Fibrosis ANOVA Model Analysis Results

For significant fibrosis (≥F2), imaging-based NITs demonstrated the highest diagnostic accuracy, with HRI leading (DOR 80.94, 95% CI 6.46–391.41), followed by MRE (DOR 22.36, 95% CI 13.17–36.15) and 2D-SWE (DOR 21.72, 95% CI 5.59–61.32). Among serum-based NITs, HSI showed the best performance (DOR 63.75, 95% CI 13.13–196.85), followed by SHG B-index (DOR 40.57, 95% CI 4.84–160.84), FLI (DOR 30.75, 95% CI 6.37–93.26), Ali2021 (DOR 28.01, 95% CI 4.85–89.58), and LRM (DOR 25.45, 95% CI 6.99–64.38). Details are shown in Table 6.

Table 6 40 NITs Diagnostic for NAFLD ≥F2 ANOVA Model Analysis Results

For advanced fibrosis (≥F3), serum-based NITs showed the highest diagnostic accuracy, with CK-18 leading (DOR 102,654.16, 95% CI 1.6–134,059.8), followed by SHG B-index (DOR 1,149.9, 95% CI 11.35–5,983.64), LFS (DOR 793.48, 95% CI 4.17–5,316.76), LPR (DOR 239.87, 95% CI 3.96–1,371.41), FM (DOR 78.67, 95% CI 4.6–382.09), HSI (DOR 61.12, 95% CI 3.53–306.22), FLI (DOR 52.18, 95% CI 3.43–247.93), 10 metabolites (DOR 50.89, 95% CI 1.88–271.77), and Hepascore (DOR 47.17, 95% CI 10.81–141.89). Among imaging-based NITs, MRE had a DOR of 51.11 (95% CI 18.71–115.19). Details are shown in Table 7.

Table 7 64 NITs Diagnostic for Advanced Fibrosis (≥F3) ANOVA Model Analysis Results

For high-risk NASH (NAS≥4, ≥F2), imaging-based NITs demonstrated varying levels of diagnostic accuracy, with Real-Time Elastography showing the highest performance (DOR 18.1, 95% CI 0.7–96.33), followed by MRE (DOR 11.27, 95% CI 0.64–53.29) and TE (DOR 6.36, 95% CI 1.77–16.52). Among serum-based NITs, CK-18 achieved the most remarkable performance (DOR 39.46, 95% CI 0.48–267.09), followed by ELF (DOR 34.64, 95% CI 0.83–181.08), C-DAG (DOR 29.34, 95% CI 1.59–144.73), FNI (DOR 17.36, 95% CI 2.48–59.67), MACK-3 (DOR 10.63, 95% CI 1.82–38.35), MEFIB (DOR 8.34, 95% CI 1.47–27.15), NIS (DOR 7.9, 95% CI 1.62–23.06), FAST (DOR 4.67, 95% CI 2.27–8.35), APRI (DOR 3.86, 95% CI 1.27–9), FIB4 (DOR 3.78, 95% CI 1.57–7.51), and NFS (DOR 2.93, 95% CI 0.97–6.55). Details are shown in Table 8.

Table 8 14 NITs Diagnostic for High-Risk NASH (NAS≥4, ≥F2) ANOVA Model Analysis Results

Quality of Evidence

Two authors critically appraised the 180 studies using the QUADAS-2 assessment criteria. Details are shown in Figure 7. For applicability concerns, the included studies aligned well with the review questions. Specifically, 100% (180/180) of studies for the reference standard, 97.2% (175/180) for patient selection, and 97.8% (176/180) for the index test had low applicability concerns. However, regarding the risk of bias, 75.0% (135/180) of the studies lacked a clear description (rated unclear or high risk), and only 23.3% (42/180) were adequately reported. The same issue occurred in the index test domain, where 81.1% (146/180) of the studies needed more detailed descriptions of the NITs.

Figure 7 QUADAS-2 quality evaluation results of 180 included studies.

Discussion

To serve as a reference for selecting non-invasive diagnostic tools for identifying different stages of NAFLD in clinical practice, we compiled and analyzed diagnostic performance data from various non-invasive tests found in the existing literature and ranked their accuracy. In the context of diagnosing steatosis, our analysis revealed that for S1 steatosis, H-MRS demonstrated the highest diagnostic efficacy, followed by NLV, PDFF, HSI, VTQ, MRI, UPGAP, and UAP, with TE showing the lowest diagnostic efficacy. Among serum markers, ALT, TyG, FAI, VAI, and CK18-M30 showed higher diagnostic accuracy. For S2 steatosis, HRI exhibited the highest diagnostic accuracy, followed by MRS, PDFF, and TE. However, further evidence is required to validate the diagnosis of S2 steatosis using serological NITs. A meta-analysis by Yokoo et al indicated that MRS had higher accuracy in diagnosing mild hepatic steatosis than other NITs, achieving sensitivity and specificity rates of 77–95% and 81–97%, respectively.27 Nonetheless, the clinical application of MRS is limited by its long scanning time, high cost, and potential for sampling error. In clinical practice, TE is frequently used as an NIT for screening hepatic steatosis and fibrosis. However, our study suggests that TE has low sensitivity for detecting early steatosis (S1), which aligns with previous findings.28,29 This lower sensitivity may be attributed to its strong reliance on operator skill, its inability to measure liver fat content precisely, and its susceptibility to obesity, which can diminish diagnostic accuracy. MRE showed the highest efficacy in distinguishing between NAFL and NASH, followed by various serological markers, with HsCRP and IL-1RA ranking among the top three. In our study, ION ranked 34th with a DOR of 6.02 (95% CI 0.45–27.65). A previous study suggested that ION could effectively differentiate between MASLD and MASH, achieving an AUC of nearly 0.85.
This better performance may stem from that study’s definition of specific features of MASH, such as hepatocellular ballooning and lobular inflammation.30 Additionally, the evidence map (Figure 2) indicates insufficient evidence for diagnosing S2 steatosis using serological NITs. This may be related to the inadequate sensitivity of the serological markers for S2, particularly in the early stages of inflammation before it progresses to overt liver fibrosis. NAFLD is a dynamic disease, and S2 steatosis signifies an increased fat load. Factors initiating inflammation and fibrosis, such as reactive oxygen species and pro-fibrotic agents, may be significantly heightened during this stage. This increase could interfere with serological NITs, making them less accurate in assessing S2 steatosis and more focused on evaluating fibrosis progression. This could help explain the lack of robust evidence, and further mechanistic studies are needed to support this hypothesis.

The largest number of NIT studies addressed advanced fibrosis, with 64 tests identified in the advanced fibrosis subgroup. Among the serological markers, CK18, SHG B-index, and LFS had the highest overall performance. Among imaging-based NITs, HSI, MRE, and TE ranked in the top three. ELF, the first FDA-approved test for assessing advanced fibrosis, ranked 39th in our NMA (DOR 7.84, 95% CI 1.98–20.72). In addition, ELF is included in the AACE and AGA guidelines for identifying “high-risk NASH”. Our results support this approach, as ELF showed strong performance in the high-risk NASH subgroup, ranking 2nd with a DOR of 34.64 (95% CI 0.83 to 181.08), just behind CK18. One analysis showed that FIB-4 was superior to APRI as a predictor of advanced fibrosis, with a validated AUC (FIB-4 vs APRI 0.802 vs 0.73),31 which is consistent with our conclusion (FIB-4 vs APRI 0.84 vs 0.79).

In the subgroup with significant fibrosis, HRI, HSI, and MRE demonstrated high diagnostic accuracy. The SHG B-index, FLI, Ali2021, and LRM exhibited the highest diagnostic efficacy among the serological markers. While one study indicated that the APRI test was the most effective NIT for identifying at least F2 fibrosis (with an AUC of 0.735),30 we found that APRI identified at least F2 fibrosis with an AUC of 0.75 (95% CI 0.71–0.79), and its DOR in the indirect comparison was 5.18 (95% CI 3.30–7.74). The two AUC values are close; however, other serological NITs with higher diagnostic accuracy were identified in this study.

In diagnosing high-risk NASH, the three serological markers that demonstrated the best overall performance were CK-18, ELF, and C-DAG. Among the imaging-based NITs, transient elastography and MRE demonstrated the best performance. In recent years, some studies have emphasized the importance of combining multiple NITs and adopting a stepwise screening approach to identify high-risk NASH, suggesting that relying on the diagnostic efficacy of any single NIT alone may be insufficient to meet clinical needs.32,33 We aimed to characterize the diagnostic performance of each test, providing specific data to optimize screening strategies.

Our study demonstrated a pooled AUC of 0.86 (95% CI 0.83 to 0.89) for CK18-M30 in diagnosing NASH. This is comparable to the AUC of 0.82 (95% CI 0.76 to 0.88) reported in a meta-analysis of previous studies. The CK18 fragment has been recognized as a marker for NASH in several research studies. However, due to the absence of a definitive threshold and the continual emergence of new NITs, CK18 is not currently recommended as a standard diagnostic tool in clinical practice guidelines, and its potential as a diagnostic indicator has yet to be thoroughly validated.

This study extensively examines the diagnostic accuracy of serological and imaging NITs for NAFLD, identifying a wide range of potential non-invasive diagnostic indicators. Combining multiple NITs in a multi-step diagnostic pathway is an active topic in the diagnosis of NAFLD, and clarifying the diagnostic value of individual NITs contributes to the advancement of research in this area. We recognize that the strength of the evidence presented is limited. There are several limitations to this systematic review: (1) Many NITs rely on a single study, which may impact the statistical results and introduce confounding bias. (2) Our search was limited to studies published in Chinese and English databases, potentially leading to information bias due to the exclusion of studies in other languages. (3) Errors in fibrosis staging often occur during clinical diagnoses, which may also contribute to information bias. (4) The included studies exhibited high heterogeneity. Through meta-regression and subgroup analysis, we found that this heterogeneity may be attributable to the design or quality of the studies, variations in the design of each original study (such as differing threshold settings), differences in the study populations, and inconsistent performance indicators across the studies.

Conclusion

The increasing number of non-invasive tests for diagnosing NAFLD makes it difficult to choose among them. To help clinicians make informed decisions, we conducted a network meta-analysis ranking the accuracy of these tests. Despite the promising results we report here, not all NITs demonstrate strong accuracy, and further validation with larger datasets is necessary. Additionally, methods such as threshold calibration, cross-study validation, and clinical decision curve analysis can help clarify the benefits and risks associated with different thresholds in clinical decision-making. This approach aims to establish clear thresholds that will guide clinical decisions and to encourage research on threshold models to determine treatment thresholds, test-treatment thresholds, and testing thresholds. Future studies should focus on key aspects such as strengthening methodological reporting and providing more reliable evidence for accurate diagnosis. It is essential to ensure that the cases included in the studies are either consecutive or randomly selected. The diagnostic threshold for NITs should be established before conducting the study, rather than determined post-analysis or left unclear. Moreover, those interpreting the results should be blinded to the gold standard results so that their readings are not influenced by them.

Funding

This research was supported by the Shenzhen Science and Technology Program (ZDSYS20210623092000002) and the Guangdong Fundamental and Applied Basic Research Program (2022B1515120034).

Disclosure

The authors report no conflicts of interest in this work.

References

1. Boyle M, Tiniakos D, Schattenberg JM, et al. Performance of the PRO-C3 collagen neo-epitope biomarker in non-alcoholic fatty liver disease. JHEP Rep. 2019;1(3):188–198. doi:10.1016/j.jhepr.2019.06.004

2. Zhou J, Zhou F, Wang W, et al. Epidemiological features of NAFLD from 1999 to 2018 in China. Hepatology. 2020;71(5):1851–1864. doi:10.1002/hep.31150

3. Man S, Deng Y, Ma Y, et al. Prevalence of liver steatosis and fibrosis in the general population and various high-risk populations: a nationwide study with 5.7 million adults in China. Gastroenterology. 2023;165(4):1025–1040. doi:10.1053/j.gastro.2023.05.053

4. Wang JL, Jiang SW, Hu AR, et al. Non-invasive diagnosis of non-alcoholic fatty liver disease: current status and future perspective. Heliyon. 2024;10(5):e27325. doi:10.1016/j.heliyon.2024.e27325

5. Sumida Y, Yoneda M, Hyogo H, et al. Validation of the FIB4 index in a Japanese nonalcoholic fatty liver disease population. BMC Gastroenterol. 2012;12:2. doi:10.1186/1471-230X-12-2

6. Cusi K, Isaacs S, Barb D, et al. American Association of Clinical Endocrinology Clinical Practice Guideline for the diagnosis and management of nonalcoholic fatty liver disease in primary care and endocrinology clinical settings: co-sponsored by the American Association for the Study of Liver Diseases (AASLD). Endocr Pract. 2022;28(5):528–562. doi:10.1016/j.eprac.2022.03.010

7. Fouad Y, Esmat G, Elwakil R, et al. The Egyptian clinical practice guidelines for the diagnosis and management of metabolic associated fatty liver disease. Saudi J Gastroenterol. 2022;28(1):3–20. doi:10.4103/sjg.sjg_357_21

8. Kanwal F, Shubrook JH, Adams LA, et al. Clinical care pathway for the risk stratification and management of patients with nonalcoholic fatty liver disease. Gastroenterology. 2021;161(5):1657–1669. doi:10.1053/j.gastro.2021.07.049

9. Kang SH, Lee HW, Yoo JJ, et al. KASL clinical practice guidelines: management of nonalcoholic fatty liver disease. Clin Mol Hepatol. 2021;27(3):363–401. doi:10.3350/cmh.2021.0178

10. Francque S, Lanthier N, Verbeke L, et al. The Belgian Association for Study of the Liver guidance document on the management of adult and paediatric non-alcoholic fatty liver disease. Acta Gastroenterol Belg. 2018;81(1):55–81.

11. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology. 2018;67(1):328–357. doi:10.1002/hep.29367

12. Gruneau L, Kechagias S, Sandström P, Ekstedt M, Henriksson M. Cost-effectiveness analysis of noninvasive tests to identify advanced fibrosis in non-alcoholic fatty liver disease. Hepatol Commun. 2023;7(7). doi:10.1097/HC9.0000000000000191

13. Li G, Zhang X, Lin H, Liang LY, Wong GL, Wong VW. Non-invasive tests of non-alcoholic fatty liver disease. Chin Med J. 2022;135(5):532–546. doi:10.1097/CM9.0000000000002027

14. Tilg H, Adolph TE, Moschen AR. Multiple parallel hits hypothesis in nonalcoholic fatty liver disease: revisited after a decade. Hepatology. 2021;73(2):833–842. doi:10.1002/hep.31518

15. Thiele MS, Péloquin S, Valenti LN, et al. Assessing the applicability of non-invasive diagnostic tests (NITs) in non-alcoholic fatty liver disease: an international qualitative study. J Hepatol. 2022;77:S438–S439. doi:10.1016/S0168-8278(22)01213-2

16. Chow KW, Futela P, Saharan A, Saab S. Comparison of guidelines for the screening, diagnosis, and noninvasive assessment of nonalcoholic fatty liver disease. J Clin Exp Hepatol. 2023;13(5):783–793. doi:10.1016/j.jceh.2023.01.016

17. Hutton B, Salanti G, Caldwell DM, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–784. doi:10.7326/M14-2385

18. McInnes MDF, Moher D, Thombs BD, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–396. doi:10.1001/jama.2017.19163

19. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–536. doi:10.7326/0003-4819-155-8-201110180-00009

20. Nyaga VN, Aerts M, Arbyn M. ANOVA model for network meta-analysis of diagnostic test accuracy data. Stat Methods Med Res. 2018;27(6):1766–1784. doi:10.1177/0962280216669182

21. Wu JL, Pan B, Ge L, Zhang JH, Yuan CZ, Tian JH. ANOVA model for Bayesian network meta-analysis of diagnostic test accuracy. Chin J Evid Based Med. 2017;17(9):1111–1116.

22. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129–1135. doi:10.1016/S0895-4356(03)00177-X

23. Desmet VJ, Gerber M, Hoofnagle JH, Manns M, Scheuer PJ. Classification of chronic hepatitis: diagnosis, grading and staging. Hepatology. 1994;19(6):1513–1520. doi:10.1002/hep.1840190629

24. Bedossa P, Poynard T. An algorithm for the grading of activity in chronic hepatitis C. The METAVIR cooperative study group. Hepatology. 1996;24(2):289–293. doi:10.1002/hep.510240201

25. Ishak K, Baptista A, Bianchi L, et al. Histological grading and staging of chronic hepatitis. J Hepatol. 1995;22(6):696–699. doi:10.1016/0168-8278(95)80226-6

26. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD practice guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023;77(5):1797–1835. doi:10.1097/HEP.0000000000000323

27. Yokoo T, Bydder M, Hamilton G, et al. Nonalcoholic fatty liver disease: diagnostic and fat-grading accuracy of low-flip-angle multiecho gradient-recalled-echo MR imaging at 1.5 T. Radiology. 2009;251(1):67–76. doi:10.1148/radiol.2511080666

28. Zhang X, Wong GL, Wong VW. Application of transient elastography in nonalcoholic fatty liver disease. Clin Mol Hepatol. 2020;26(2):128–141. doi:10.3350/cmh.2019.0001n

29. Gao Q, Ding JP, Wang FY. Advances in quantitative diagnostic imaging of nonalcoholic fatty liver disease. Int J Med Radiol. 2018;41(4):436–439.

30. Kouvari M, Valenzuela-Vallejo L, Guatibonza-Garcia V, et al. Liver biopsy-based validation, confirmation and comparison of the diagnostic performance of established and novel non-invasive steatotic liver disease indexes: results from a large multi-center study. Metabolism. 2023;147:155666. doi:10.1016/j.metabol.2023.155666

31. Shah AG, Lydecker A, Murray K, Tetri BN, Contos MJ, Sanyal AJ. Comparison of noninvasive markers of fibrosis in patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2009;7(10):1104–1112. doi:10.1016/j.cgh.2009.05.033

32. Younossi Z, Alkhouri N, Cusi K, et al. A practical use of noninvasive tests in clinical practice to identify high-risk patients with nonalcoholic steatohepatitis. Aliment Pharmacol Ther. 2023;57(3):304–312. doi:10.1111/apt.17346

33. Younossi ZM, Pham H, Felix S, et al. Identification of high-risk patients with nonalcoholic fatty liver disease using noninvasive tests from primary care and endocrinology real-world practices. Clin Transl Gastroenterol. 2021;12(4):e00340. doi:10.14309/ctg.0000000000000340
