In this international study, ML-based analyses were used to develop the modified TRANSSPHER grade. The score showed a refined ability to predict surgical complexity compared to the state-of-the-art TRANSSPHER grade, as shown by ROC analyses and AIC comparisons. These refinements build on the strengths of the original scale by retaining key features such as tumor size, suprasellar extension, and cavernous sinus infiltration, while incorporating radiological measures of tumor consistency. A better prognostication of surgical complexity may have significant implications. The simplicity and rapid calculation of the m-TRANSSPHER grade make it a valuable tool for preoperative counseling, enabling patients to be informed about the surgical goal and the potential need for closer postoperative monitoring. Additionally, the score may support multidisciplinary pituitary board discussions to better stratify expected outcomes. This could be especially relevant for frail patients, where a careful evaluation of the risk–benefit trade-off is essential to guide management decisions. Furthermore, the score may contribute to standardizing the reporting of surgical complexity in future studies on NFPAs.
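As a reminder of what the AIC comparison mentioned above measures, the following is a minimal sketch, not the authors' code. It assumes each grading score yields a predicted probability of GTR per patient (for example, from a logistic model), and the outcomes, probabilities, and parameter counts are invented for illustration only. The AIC is defined as 2k − 2 ln L, and lower values indicate a better trade-off between fit and complexity.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical data: 1 = GTR achieved, 0 = not achieved.
y = [1, 1, 0, 0]
p_a = [0.9, 0.8, 0.2, 0.1]  # assumed probabilities from a sharper score
p_b = [0.6, 0.6, 0.4, 0.4]  # assumed probabilities from a blunter score

# Assumption: both models have 2 parameters (intercept + score).
aic_a = aic(bernoulli_log_likelihood(y, p_a), n_params=2)
aic_b = aic(bernoulli_log_likelihood(y, p_b), n_params=2)
print(aic_a < aic_b)  # the sharper score has the lower AIC
```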
The ML algorithms employed to develop the m-TRANSSPHER grade showed robust performance. In particular, the RFT classifier predicted GTR in NFPAs with high precision and sensitivity (weighted F1 score of 0.74 and 0.87 in the development and external validation cohorts, respectively). Predictive models generally outperform explanatory or descriptive statistics in accurately quantifying the predictability of measurable phenomena [14, 24]. Therefore, predictive models may provide a more accurate estimate of how predictable a given phenomenon, such as tumor GTR, actually is [14]. The high accuracy of the proposed ML models in classifying GTR in NFPAs yields a new, more reliable [12, 24], and strong “reality check” [14] of the predictive value of the aforementioned variables and of the effectiveness of preoperative tools for assessing surgical complexity in pituitary surgery.
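For readers unfamiliar with the weighted F1 score quoted above, the sketch below shows the usual definition: the per-class F1 scores averaged with weights proportional to each class's support. It is a minimal illustration with invented labels, and it does not reproduce the study's data or pipeline.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its share of cases
    return score

# Hypothetical labels: 1 = GTR, 0 = no GTR.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
print(round(weighted_f1(y_true, y_pred), 2))  # 0.69
```

Because the weights follow class frequency, the metric stays informative when GTR and non-GTR cases are unevenly represented.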
Predictors of surgical complexity in NFPAs
Invasion of the cavernous sinus deeply affects pituitary surgical complexity, as infiltrative pituitary adenomas tend to invade not only the medial wall of the cavernous sinus [16, 25], whose initial tumor infiltration remains surgically amenable in experienced hands [26], but also other regions of the cavernous sinus dural enclosure, the internal carotid artery adventitia, and the cranial nerves [27]. Although discriminating cavernous sinus invasion from compression may be challenging on preoperative MRI [28], the preoperative Knosp classification has been widely validated as a predictor of tumor resection in pituitary surgery [7, 9, 10, 16, 25, 29, 30]. The results of our predictive models are in line with the previous literature, with invasive NFPAs (tumor Knosp grade greater than 2) being strongly associated with reduced chances of GTR.
The reliability of tumor maximum diameter to prognosticate EOR in NFPAs has been extensively described [10, 29, 31]. Nonetheless, the best method to assess tumor size when predicting tumor resection is still debated. Volumetric measures have been claimed to be more reliable than bi-dimensional measurements since they consider all three planes in which cross-sectional length can be measured [32]. On the other hand, tumor maximum diameter in any plane represents a simple and reproducible way to assess tumor size, with the advantage of being less time-consuming to obtain than tumor volume. Thus, the inclusion of this measure fits well with the objective of this study, which is to translate the results of complex ML algorithms into a simple tool to predict surgical complexity. Further, the concurrent inclusion of measures of antero-posterior, cranio-caudal, and transverse tumor invasion should limit the drawbacks of using a bi-dimensional measure of tumor size. In line with this hypothesis, Mooney et al. [10] found no significant differences between the predictive value of tumor diameter and tumor volume when predicting GTR in NFPAs along with measures of para-sellar tumor invasion. We used a cut-off of 40 mm, as it is still widely recognized as the threshold that distinguishes giant pituitary adenomas from their non-giant counterparts [5].
Even though the importance of tumor consistency for pituitary surgery has long been documented [33], predicting it from simple T2w images has long been challenging [34]. Recently, a new parameter of tumor consistency (the T2SIR) has been proposed [9]. Overcoming the limitations of earlier methods, the T2SIR was built with close attention to the standardization and heterogeneity of signal intensity measurements of NFPAs. It thus proved to be a reliable predictor of NFPAs’ collagen content, with high sensitivity and specificity [9]. Further, the T2SIR was strongly associated with tumor extent of resection; firm tumors showed reduced T2SIR values and were less likely to receive GTR [9]. The results reported here replicated those findings. The hierarchical tree showed that almost 40% of invasive NFPAs, which by definition are unlikely to receive GTR, are still completely resectable if they have soft consistency (T2SIR greater than 0.6). On the other hand, non-invasive tumors pose additional challenges to GTR when they present with an intermediate-to-hard consistency (T2SIR less than 0.5). Tumor consistency was the predictive variable most exploited by the employed ML algorithms, confirming the importance of taking tumor texture into account when predicting surgical complexity in NFPA trans-sphenoidal surgery.
The fourth feature composing the m-TRANSSPHER grade was the presence of suprasellar tumor invasion. Again, the importance of this variable may be explained by inspecting the hierarchical tree. A subgroup of tumors that are not invasive (tumor Knosp grade less than 3) struggle to receive GTR when they show an intermediate-to-hard consistency and grow in the suprasellar cistern toward the third ventricle or the frontal and temporal lobes (Hardy-Williams grade C-E). These tumors represent a real challenge for pituitary surgeons.
The inter-carotid distance was not included in the m-TRANSSPHER grade as it proved to be an unreliable predictor of NFPA surgical complexity. For tumors with a maximum diameter greater than 23 mm, increasing values of inter-carotid distance were related to reduced rates of GTR. In contrast, for tumors with a maximum diameter of less than 23 mm, increasing values of inter-carotid distance were related to increased rates of GTR. The first, counterintuitive result, which was in line with previous reports [10], is attributable to the displacement of the internal carotid arteries caused by large NFPAs. In these tumors, a greater inter-carotid distance is a consequence of tumor growth and therefore marks tumors of larger size, which in turn are associated with reduced rates of GTR. Small-to-medium NFPAs usually do not displace the surrounding structures, being confined to the sella turcica. In these cases, a wider surgical corridor was related to greater rates of GTR (see the hierarchical tree in Fig. 2).
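The sign reversal described above is an interaction effect, and it can be made concrete with a stratified comparison. The sketch below is purely illustrative: the records are invented toy values, not study data, and the helper `gtr_rate_by_icd` is a hypothetical name. Only the 23 mm split comes from the text, and assigning a diameter of exactly 23 mm to the smaller stratum is an assumption.

```python
from statistics import median

# Toy records: (max_diameter_mm, inter_carotid_mm, gtr). Invented values only.
records = [
    (15, 16, 0), (18, 17, 1), (20, 22, 1), (21, 23, 1),
    (30, 18, 1), (35, 19, 1), (38, 24, 0), (42, 26, 0),
]

def gtr_rate_by_icd(rows):
    """Return GTR rates for (narrow, wide) inter-carotid distance,
    splitting the rows at the stratum's median distance."""
    cut = median(r[1] for r in rows)
    narrow = [r[2] for r in rows if r[1] <= cut]
    wide = [r[2] for r in rows if r[1] > cut]
    return sum(narrow) / len(narrow), sum(wide) / len(wide)

# Split at the 23 mm diameter threshold reported in the text.
small = gtr_rate_by_icd([r for r in records if r[0] <= 23])
large = gtr_rate_by_icd([r for r in records if r[0] > 23])

print("small tumors (narrow, wide):", small)  # wide corridor -> higher GTR rate
print("large tumors (narrow, wide):", large)  # wide corridor -> lower GTR rate
```

A single pooled slope for inter-carotid distance would average these opposite trends, which is consistent with the variable behaving as an unreliable standalone predictor.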
Finally, none of the employed ML models identified the rate of sellar floor tumor invasion as an important predictor of surgical complexity. This finding suggests that tumor infiltration toward the sphenoidal sinus or even downward is not as challenging as tumor invasion of the cavernous sinus or supra-sellar regions. Indeed, large tumors can still be amenable to GTR when they principally extend to the sphenoidal sinus or downward.
Uncovering the relationship between predictors of surgical complexity in NFPAs
The RFT classifier outperformed the other implemented algorithms, namely the LR and DT models. The implications of this finding are twofold. Firstly, the relationship between predictors of surgical complexity in NFPAs is more complex than what linear models and classical statistics can describe. Soft tumor consistency is important per se to achieve GTR. Nonetheless, it can be even more significant for tumors infiltrating the cavernous sinus and/or for large tumors with supra-sellar extension. Secondly, the relationship between the aforementioned predictors is not hierarchical. The DT model, which uses a hierarchical tree structure, identified the Knosp grade as the most important feature to classify surgical complexity. This result aligns with the previous literature and also holds for the majority of patients included in this study. However, the RFT model showed improved classification performance by adopting a non-hierarchical approach, in which a random subset of predictors is considered at each split. In a clinical scenario, non-infiltrating Knosp grade 2 NFPAs with large size extending to the supra-sellar compartments and/or with hard consistency may offer additional surgical challenges compared to infiltrating Knosp grade 3 NFPAs with soft consistency, reduced size, and/or absent nodular supra-sellar extension.
Limitations and future prospects
A strength of this externally validated international ML-based study is that high predictive power did not come at the cost of interpretability. Rather, it was used to build a powerful yet simple prognostic scale of surgical complexity, and new insights into the relationship between predictors of GTR in NFPAs, and their association with the outcome variable, were described. However, the following limitations must be noted. Although we adhered to the TRIPOD guidelines for external validation, ensuring both geographical and temporal validation [20], the relatively small cohort size may limit the generalizability of our findings, despite the model demonstrating minimal evidence of overfitting. Expanding validation to larger, multicenter cohorts will enhance the model’s applicability and robustness. Another limitation was the use of T2SIR as an indirect measure of tumor consistency. While T2SIR has been validated as a predictor of collagen content in NFPAs, direct assessments of tumor consistency are only feasible intraoperatively. Even if our findings provide further validation of T2SIR as an indicator of tumor consistency, future studies will explore the correlation between T2SIR and intraoperative consistency to strengthen its applicability. Additionally, we limited predictive features to six variables to enhance interpretability and reduce the risk of overfitting, potentially affecting predictive power [12, 35]. The algorithm’s performance on external validation supports this trade-off. Finally, because of the small numbers, we cannot comment strongly on the distribution of surgical complications across the different grading classes of the score. Future investigations of this aspect would be valuable.