Explainable AI in radiology: a white paper of the Italian Society of Medical and Interventional Radiology

In recent years, artificial intelligence (AI) has rapidly entered diagnostic imaging, showing considerable potential both as a catalyst of the workflow and as an aid to the interpretation of bio-images, and becoming a promising engine of decision support systems in radiology [1]. The steady growth of AI in medical imaging is driven not only by the widespread availability of large data sets and by advances in hardware and software, but also by the push for greater efficiency in clinical care and management. By providing quantitative image data through radiomics in combination with AI tools, AI in radiology naturally supports diagnostic, predictive, and prognostic applications [2]. The key AI technologies shaping the future of radiologists include image processing, computer vision, natural language processing, and much more [3]. In addition, growing evidence indicates that AI algorithms support all levels of radiology workflow management for a variety of non-diagnostic applications, such as quality, safety, and operational efficiency [1]. The integration of AI into the imaging workflow has the potential to enhance efficiency, minimize errors, and meet specific goals with minimal human intervention [4]. However, owing to the “black-box” nature of AI models, they are often perceived as less trustworthy by physicians, which has limited their implementation in real-world clinical settings [5]. To address this issue, the field of Explainable Artificial Intelligence (xAI) has been developed, with the goal of improving the interpretability of AI decisions. The focus of xAI is to create techniques and algorithms that increase the transparency of the decisions made by algorithms and predictive models, clarifying their reliability and the impact of each feature on the outcome [6].

This white paper of the Italian Society of Medical and Interventional Radiology (SIRM) is intended to help radiologists, medical practitioners, and scientists understand the emerging field of xAI, to enhance awareness of the black-box problem behind the success of AI, to increase knowledge of the xAI methods that make it possible to turn the black box into a glass box, and to raise awareness of the role and responsibilities of radiologists in the appropriate use of AI technology.

The clinical use of AI and the problem of the black-box

Currently, two primary AI approaches are commonly employed in radiology. The first uses handcrafted, engineered attributes, such as radiomics features, as inputs to machine learning models trained to perform various clinical decision-making tasks [7]. The second, based on deep neural networks or deep learning (DL), has gained significant attention over the last decade [8, 9].

There are three primary types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning algorithms, such as linear and multivariate regression, logistic regression, Naive Bayes, decision trees, k-nearest neighbors, and linear discriminant analysis, the input data are labeled. In contrast to supervised learning, unsupervised learning does not require labeled data; clustering analysis, anomaly detection, hierarchical clustering, and principal component analysis are representative unsupervised learning algorithms. Reinforcement learning is a more advanced approach in which the algorithm learns to solve multi-step problems through interaction and feedback [7]. DL is a relatively new area of study. While machine learning techniques rely on statistical methods to recognize patterns, DL is loosely inspired by the human brain and is best known for its neural network models. A deep neural network typically consists of three types of layers: the input layer, the hidden layers, and the output layer (Fig. 1).
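To make the distinction between supervised and unsupervised learning concrete, the following minimal sketch (in Python with scikit-learn, using entirely synthetic, hypothetical "radiomics-like" features and labels rather than clinical data) fits a supervised classifier on labeled examples and, separately, clusters the same unlabeled feature vectors.

```python
# Minimal sketch contrasting supervised learning (labeled data) with
# unsupervised learning (no labels). Data are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # hypothetical radiomics-like feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical binary labels (e.g., benign/malignant)

# Supervised learning: the model is fitted on features *and* labels.
clf = LogisticRegression().fit(X, y)
print("Supervised prediction for one case:", clf.predict(X[:1]))

# Unsupervised learning: only the features are used; the algorithm groups similar cases.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster assignment for the same case:", clusters[0])
```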

Fig. 1

The architecture of the deep neural network consisting of the input layer, the hidden layer and the output layer

The input layer receives the input data, the hidden layers perform various computations on those data, and the output layer produces the final result. A neural network can have multiple hidden layers, allowing for more complex computations and predictions. One of the advantages of DL algorithms is their ability to learn characteristic attributes from data automatically, with no requirement for human experts to define them beforehand. With sufficient amounts of example data, DL models can identify abnormalities in tissue without the need for human-defined segmentations, which allows for more abstract feature definitions and improves generalizability. The ability of DL to learn complex data representations often makes it robust against undesirable variations, such as inter-reader variability, and further enables it to be applied to a wide range of clinical conditions and settings [7]. Table 1 summarizes the main advantages and disadvantages of machine learning and DL methods.
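As a purely illustrative sketch of the layer structure shown in Fig. 1, the following Python/PyTorch snippet stacks an input-to-hidden mapping, two hidden layers, and an output layer; the layer sizes and the two-class output are assumptions chosen for the example, not taken from any cited model.

```python
# Minimal sketch of the input / hidden / output layer structure of Fig. 1.
# Layer sizes and the two-class output are illustrative assumptions only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 32),  # maps the 64-feature input layer to the first hidden layer
    nn.ReLU(),
    nn.Linear(32, 16),  # second hidden layer
    nn.ReLU(),
    nn.Linear(16, 2),   # output layer, e.g., two classes (normal vs abnormal)
)

x = torch.randn(1, 64)   # one hypothetical input feature vector
logits = model(x)        # forward pass through the stacked layers
print(logits.shape)      # torch.Size([1, 2])
```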

Table 1 Advantages and disadvantages of machine learning and deep learning

DL tools can generate extremely reliable outcomes, yet they possess an intrinsic “opacity”: although not entirely opaque, their behavior can be difficult to comprehend. Even experts at the highest level may struggle to fully understand these so-called “black-box” models, and the reasoning through which the models arrive at their predictions may remain elusive in areas that are critical and relevant to our society, including healthcare information technology and medical imaging [10]. The highly opaque nature, or inexplicability, of AI is the main source of distrust of this new technology among medical professionals and patients [11]. This creates an obstacle to its practical application, which is particularly evident in sensitive fields where automation affects human existence and survival, above all in healthcare. Applying AI to the field of medicine poses significant challenges: medical decision-making typically involves uncertainty, incomplete and noisy data sets, and a high level of complexity [12]. As a result, transparency in AI models is particularly crucial in medical care, precisely because of this inherent ambiguity. While humans may not always be able to explain their own reasoning, understanding how an AI model makes decisions can provide confidence in human–machine interactions [13]. With an increasing focus on incorporating ethical standards into AI technology design and implementation, there is a growing demand for “Trustable AI”, a term that, with slight conceptual modification, may encompass Valid AI, Responsible AI, Privacy-Preserving AI, and Explainable AI (xAI). In this context, xAI aims to clarify cardinal issues about the decision-making process for both humans and machines [10].

What does explainable AI mean?

xAI is an emerging field with several new strategies and multiple ongoing studies that are having a significant impact on the development of AI across many different areas. Van Lent et al. first introduced the concept of xAI by describing their system's ability to explain AI-based predictions [14]. Although the term has been applied inconsistently, it generally refers to a class of systems that can shed light on how an AI system arrives at its decisions [15]. xAI investigates the reasoning behind the decision-making process, outlines the system's strengths and weaknesses, and predicts the future behavior of the model [10].

Thus far, xAI may be considered an umbrella term covering several related concepts [10, 16], including

Interpretability, refers to the degree to which the end user can understand the output of the algorithm.

Explainability, involves clarifying how a decision was reached so that a broader range of users can understand it.

Transparency, refers to the degree to which the inner workings of the model can be understood.

Justifiability, involves providing an in-depth case to support certain conclusions.

Contestability, relates to the ability of users to challenge a particular decision.

In AI, there is often an inverse relationship between the complexity or depth of a system and its interpretability. This inherent tension between predictive accuracy and explainability frequently results in the most accurate methods (such as DL) being the least transparent, while the most interpretable methods (such as decision trees) are less accurate [17]. It is essential to strike a balance between the performance of a model and its interpretability, as the former will markedly improve patient care, while the latter will enhance the adoption of, and trust in, AI in radiological practice [16].

Ethical, legal, and social issues (ELSI) of xAI

The pursuit of transparent and explainable AI in recent years has not only sparked significant research efforts in the field, but has also become a central focus of many ethical and responsible-design proposals [5, 11, 18]. In addition, people often express concerns about privacy and security when it comes to AI technologies [19]. The need for greater clarity and transparency has been recognized by various institutions. The European Commission has produced a white paper aimed at creating a regulatory framework for a digital ecosystem of trust in reliable AI, in which transparency and explainability are identified among the fundamental ethical requirements. In the Ethics Guidelines for Trustworthy AI, drawn up by the High-Level Expert Group on AI of the European Union, the right to “require an adequate explanation of the decision-making process” is stated whenever AI “significantly affects people's lives” [20].

When human intelligence fails with significant consequences, the accepted best practice is to find the root causes, make improvements, and learn from the mistakes. Likewise, when AI fails, it is important to acknowledge it, and there is an increasing demand for an explanation of what went wrong in the AI decision-making algorithm [21]. The practical goal is to establish accountability in both the legal and the social sense. Without a clear assignment of liability, it is unlikely that AI can be widely implemented in real-world situations, and unforeseen legal challenges with significant implications may arise [21]. However, addressing only the ethical or legal concerns surrounding AI may not be sufficient. All Ethical, Legal, and Social Issues (ELSI) of AI deserve equal attention and should certainly be considered ahead of AI and xAI implementation in healthcare, as the aim of an ELSI reflection is to provide decision-makers and stakeholders with a comprehensive understanding of the ethical, legal, and social issues associated with a particular technology or practice [22].

In recent times, explaining the output of AI systems has become a crucial issue, not just technically but also legally and politically. There is a general belief that explainable AI systems should be ethically desirable and possibly even legally necessary, which has driven much research in this area [23]. The question of transparency has been given significant attention in regulatory proposals at the EU level, particularly in the proposed Artificial Intelligence Act (AIA). However, discussions and consultations around regulating AI systems are ongoing, and the obligations for explainability under existing regulations and future policies are still being debated [18].

Solutions to the black box?—explainable AI models

Explainability methods, both in research settings and in legal communities, are being recommended as a practical means to increase the transparency of AI models and to help uncover potential discrimination in them [24].

A few proposals to classify xAI techniques have been put forward so far, based on three fundamental dimensions [25]:

the xAI technique implementation stage (ante-hoc, post-hoc)

whether the xAI technique provides a global explanation of the model or a local explanation of a single prediction

whether the xAI technique is model-specific or model-agnostic

Figure 2 summarizes the classification of xAI techniques in a simplified diagrammatic view.

Fig. 2

The diagrammatic view of the classification of xAI techniques

Broadly speaking, two types of explainable AI models can be distinguished: post-hoc explainability, applied after the model has been trained, and ante-hoc, or so-called inherent, explainability, built in before training [12]. Post-hoc xAI involves the use of external explainers to interpret a trained model's behavior during testing. In contrast, ante-hoc xAI incorporates explainability into the AI model's structure from the outset, prioritizing natural understandability while still striving for optimal accuracy during training. Essentially, ante-hoc approaches consider a model's explainability throughout its development, whereas post-hoc approaches merely explain the model's behavior after it has been trained [12, 26].

The explainability of machine learning models is generally feasible when the models rely on input data that are easily quantifiable and interpretable. Some algorithms, such as decision trees, sparse linear and additive models, or Bayesian classifiers, are designed with a limited number of internal components, which allows inspection of the model's prediction and/or classification operations; these models provide traceability and transparency in their decision-making [25]. However, in modern AI algorithms, models and data are often complex and high-dimensional, making them difficult to explain through a simple relationship between inputs and outputs. DL models, for example, are a category of machine learning algorithms that trade the model's understandability for prediction and/or classification accuracy [25]. DL frameworks are used in applications such as speech and image recognition, natural language processing, and the analysis of complex image and sound data. The explainability techniques for these “black-box” models are therefore post-hoc: they first approximate the DL black-box model with simpler interpretable models and, in doing so, make it possible to explore and explain the black box [12]. These techniques are referred to as xAI, with the main aim of turning “black-box” models into more transparent and interpretable, “glass-box” models [25]. The scope of an explanation can be either global or local: global explanations aim to make the whole inferential process transparent and intelligible, while locally explainable methods aim to explain individual feature attributions [26].
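As a hedged illustration of such an inherently transparent ("glass-box") model, the sketch below fits a shallow decision tree to synthetic tabular data and prints its learned decision rules, which can be inspected directly; the feature names and data are hypothetical.

```python
# Minimal sketch of a transparent ("glass-box") model: a shallow decision tree
# whose learned rules can be printed and inspected. Data are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))        # hypothetical quantitative features
y = (X[:, 0] > 0.2).astype(int)      # hypothetical binary outcome

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["feat_a", "feat_b", "feat_c"]))
```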

The most common form of post-hoc explainability in medical imaging settings is the heat map or saliency map, which highlights the contribution of each image region to the decision-making process [27]. Despite their still immature state, these maps are not the only instruments available to xAI users. In the medical imaging field, other approaches have already been successfully adopted, including methods for feature visualization and prototypical comparisons. More general post-hoc explanation methods suitable for complex medical imaging data include local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP). LIME attempts to explain decisions at the level of individual samples by perturbing the input, while SHAP generates explanations by measuring the contribution of each feature to a specific prediction. LIME and SHAP are generic and applicable to various types of healthcare data, not only medical imaging; they are commonly used to provide explanations for complex models in the healthcare domain [27, 28].
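A basic gradient-based saliency map of the kind mentioned above can be sketched by back-propagating the predicted class score to the input pixels; the example below uses an untrained toy convolutional network and a random "image", so it only illustrates the mechanics, not a validated clinical model.

```python
# Minimal sketch of a post-hoc saliency (heat) map: the gradient of the predicted
# class score with respect to the input pixels marks the most influential regions.
# The tiny, untrained CNN and the random "image" are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

image = torch.randn(1, 1, 64, 64, requires_grad=True)  # hypothetical single-channel image
scores = model(image)
top_class = scores.argmax().item()
scores[0, top_class].backward()                         # back-propagate the top class score

saliency = image.grad.abs().squeeze()                   # 64 x 64 per-pixel importance map
print(saliency.shape)
```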

Post-hoc xAI methods can be model-specific or model-agnostic (Fig. 2). Model-specific methods reshape DL models to incorporate interpretability into the structure and learning mechanisms of the model itself, whereas model-agnostic methods operate only on the inputs and outputs of the black-box model to address explainability and derive explanations. Ante-hoc methods, by contrast, focus on making the model itself transparent, which is why they are intrinsically model-specific. Model-agnostic methods can ignore the internal elements of a model and can therefore be applied to any learning approach, while model-specific methods are limited to a particular subgroup of models [26]. Figure 3 shows a possible workflow of different AI models applied to a specific input (a solid renal mass), including a black-box model, post-hoc xAI, and ante-hoc xAI.
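Because model-agnostic methods need only the inputs and outputs of a fitted model, they can be sketched without touching the model internals; the example below applies permutation importance to an arbitrary classifier, with synthetic data and feature names standing in for real ones.

```python
# Minimal sketch of a model-agnostic explanation: permutation importance treats the
# fitted model as a black box and only measures how its performance degrades when
# each input feature is shuffled. Data and labels are synthetic assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))      # hypothetical feature matrix
y = (X[:, 2] > 0).astype(int)      # hypothetical labels driven mainly by feature 2

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: mean drop in accuracy when shuffled = {score:.3f}")
```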

Fig. 3

A possible workflow of different Artificial Intelligence (AI) models applied to a specific input (a solid renal mass), including a black-box model, post-hoc Explainable AI (xAI), and ante-hoc xAI. The output (renal tumor) is obtained in all three models, but with no explanation in the black-box model and with different approaches to explainability in post-hoc and ante-hoc xAI
