Deep Learning and Imaging for the Orthopaedic Surgeon: How Machines “Read” Radiographs

In 2012, a research group led by Geoffrey Hinton achieved a dramatic milestone in the ability of a computer to automatically identify objects in images1. Their work was a testament to the power and capability of machine learning—the idea of using algorithms to identify patterns in data, to build mathematical models based on those patterns, and to use those models to make a determination or prediction about something in the world2-4. This is a departure from the usual way of doing things. Traditionally, the way to enable a computer to do anything was to enter explicit instructions, line-by-line in the form of human-authored computer code. Machine learning is different. It does not require step-by-step guidance from a human; rather, humans need only supply data and a learning system, and then the computer learns patterns on its own5. Hinton’s group specifically demonstrated the capacity of deep learning, a branch of machine learning in which mathematical models learn to make predictions directly from unprocessed data (such as images)3,6. Excitement over their advancement ignited a research renaissance in deep learning and the broader field of machine learning. Since then, the results have entered our daily lives in numerous forms, from digital assistants we talk to, to self-driving cars, to automated drug discovery7-17.

The impact of deep learning is expected to be just as profound in clinical medicine as it already has been in daily life. This is particularly true for medical imaging. Deep learning has demonstrated remarkable progress in the analysis of medical imaging across a range of modalities including radiographs, computed tomographic (CT) scans, and magnetic resonance imaging (MRI) scans18-35. A growing body of evidence supports the clinical utility of deep learning in musculoskeletal radiography (Table I), including studies in which deep learning achieves an expert or near-expert level of performance for the identification and localization of fractures on radiographs (Table II)31,36-43. Until recently, these deep learning algorithms had been confined to research papers, narrow tasks, and specific regions of human anatomy, but the technology is advancing rapidly. Deep learning is now in the early stages of entering the clinical setting through validation and proof-of-concept studies44,45. Only time will tell, but we believe that one thing is certain: The success of deep learning in the analysis of medical imaging has been propelling the field forward so rapidly that now is the time for surgeons to pause and understand how this technology works at a conceptual level, before the technology ends up in front of us and our patients. This article is intended to provide surgeons with this basic level of understanding of how current deep learning methods work. We do this with a foundational example—explaining how a deep learning method called convolutional neural networks (CNNs) enables a computer to “read” radiographs to detect the presence of a fracture.

TABLE I - A Nonexhaustive Review of Use Cases and Representative Studies Demonstrating the Current State of Deep Learning Across Musculoskeletal Radiography

Each entry lists the use case, the representative study and its level of evidence, and a description of the study.*

Diagnosis (including classification, staging, and severity of disease)

Fracture identification, classification, and localization. See Table II.

Implant loosening. Shah et al.33 (2020), Level of Evidence II. Incremental inputs improve the automated detection of implant loosening using machine-learning algorithms: The study used a CNN to predict implant loosening from radiographs in first-time revision total hip and knee arthroplasty patients with 88% accuracy by using a combination of radiographs in conjunction with patient characteristics.

Bone tumor classification. von Schacky et al.34 (2021), Level of Evidence II. Multitask deep learning for segmentation and classification of primary bone tumors on radiographs: This work trained a CNN to simultaneously localize and classify primary bone tumors on radiographs. The model was trained on 934 radiographs, and benign or malignant bone tumors were diagnosed using the patient histopathologic findings as the reference standard. For the classification of bone tumors as malignant or benign, the model achieved 80% accuracy, 63% sensitivity, and 88% specificity. This classification accuracy was higher than that of 2 radiology residents (71% and 65%; p = 0.002 and p < 0.001, respectively) and comparable with that of 2 musculoskeletal fellowship-trained radiologists (84% and 83%; p = 0.13 and p = 0.25, respectively).

Knee osteoarthritis severity. Norman et al.21 (2019), Level of Evidence II. Applying densely connected convolutional neural networks for staging osteoarthritis (OA) severity from radiographs: The study used a CNN to automatically stage osteoarthritis severity according to Kellgren-Lawrence grade from radiographs of the knee in an adult population. The model was trained on 4,490 bilateral PA fixed-flexion knee radiographs from adults in the U.S. The model achieved sensitivity rates for no, mild, moderate, and severe OA of 84%, 70%, 69%, and 86%, respectively; the corresponding specificity rates were 86%, 84%, 97%, and 99%.

Osteoporosis. Yamamoto et al.35 (2020), Level of Evidence II. Deep learning for osteoporosis classification using hip radiographs and patient clinical covariates: The authors used deep learning to diagnose osteoporosis from radiographs of the hip (T-score < −2.5) using a data set of 1,131 images from patients who underwent both skeletal bone mineral density measurement and hip radiography at a single hospital in Japan from 2014 to 2019. The CNN applied to hip radiographs alone exhibited high accuracy (approx. 84%), sensitivity (approx. 81%), and specificity (approx. 88%), and the performance improved further with the addition of clinical covariates from patient records (approx. 89%, 89%, and 88%, respectively).

Pediatric bone age. Halabi et al.27 (2019), Level of Evidence II. The RSNA pediatric bone age machine learning challenge: The challenge asked competitors to create a model using machine learning that could accurately determine skeletal age in a data set of pediatric hand radiographs (n = 14,236). The best models were all CNNs, and the top-performing models were, on average, accurate to within 4.2 to 4.5 months of the reference standard, bone age.

Cause of shoulder pain. Grauhan et al.26 (2022), Level of Evidence II. Deep learning for accurately recognizing common causes of shoulder pain: The authors trained a CNN to automatically detect the most common causes of shoulder pain on radiographs, including proximal humeral fractures, joint dislocations, periarticular calcification, osteoarthritis, osteosynthesis, and a joint endoprosthesis. The model was trained on 2,700 shoulder radiographs from multiple institutions. Model performance was variable across the 6 diagnoses: sensitivity and specificity were 75% and 86% for fractures, 95% and 65% for joint dislocation, and 90% and 86% for osteoarthrosis, respectively.

Implant identification

Arthroplasty implant identification. Karnuta et al.29 (2021), Level of Evidence II. Artificial intelligence to identify arthroplasty implants from radiographs of the hip: The authors trained, validated, and tested a CNN to classify total hip arthroplasty femoral implants as 1 of 18 different models from 1,972 retrospectively collected radiographs of the hip. The CNN discriminated 18 implant models with an AUC of 0.99, accuracy of 99%, sensitivity of 94%, and specificity of 99% in the external-testing data set.

Prognosis and risk stratification

Risk of dislocation in total hip arthroplasty. Rouzrokh et al.32 (2021), Level of Evidence II. Deep learning artificial intelligence model for assessment of hip dislocation risk following primary hip arthroplasty from postoperative radiographs: The study presented a CNN trained to predict the risk of future hip dislocation from postoperative radiographs and succeeded in achieving high sensitivity (89%) and high negative predictive value (99%) in the external-testing data set. The study also demonstrated the use of saliency maps to highlight the radiographic features that the model found to be most predictive of prosthesis dislocation.

Bone mineral density. Hsieh et al.28 (2021), Level of Evidence II. Automated bone mineral density prediction and fracture risk assessment using radiographs via deep learning: This study examined how successfully a CNN can measure bone mineral density and evaluate fracture risk from a radiograph. Results were compared against dual x-ray absorptiometry (DXA) measurements and fracture risk assessment tool (FRAX) scores. Both hips (5,633 training radiographs) and the lumbar spine (7,307 training radiographs) were examined. The AUC and accuracy were 0.89 and 92% for detecting hip osteoporosis and 0.96 and 90% for high hip fracture risk. When applied to the lumbar spine radiographs, the scores were 0.89 and 86% for spine osteoporosis, and 0.83 and 95% for high 10-year fracture risk. This capability would allow for evaluating fracture risk using radiographs already made for other indications to identify at-risk patients, without the additional cost, time, and radiation of DXA.

Measurements

Leg lengths. Zheng et al.25 (2020), Level of Evidence II. Deep learning measurement of leg-length discrepancy in children based on radiographs: This study created a method for automating the measurement of leg-length discrepancies in pediatric patients using deep learning. The authors trained and tested a CNN using 179 radiographs and found that deep learning-derived measurements were strongly correlated with those that were manually derived by pediatric radiologists (r = 0.92 and mean absolute error of 0.51 cm for full leg-length discrepancy, p < 0.001).

Cobb angle. Horng et al.18 (2019), Level of Evidence II. Cobb angle measurement of the spine from radiographs using convolutional neural networks: The authors created an automatic system for measuring spine curvature via the Cobb angle by applying a CNN to 595 AP spinal radiographs. The deep learning-derived measurements did not demonstrate any significant differences compared with the manual measurements made by clinical doctors (p < 0.98).

Scientific discovery

Insights into pain disparities in underserved populations. Pierson et al.22 (2021), Level of Evidence II. An algorithmic approach to reducing unexplained variation in pain disparities in underserved populations: Underserved populations experience higher levels of pain. Using osteoarthritis as an example, this study examined whether this pain can be detected in radiographs. Traditional methods to objectively measure osteoarthritic severity (i.e., Kellgren-Lawrence grade) account for only 9% of this disparity. By training a CNN to predict pain levels from 25,049 radiographs, algorithmic predictions accounted for 43% of disparities. This implies that much of the osteoarthritic pain felt by underserved patients stems from visible factors within the knee that are not captured in standard radiographic measures of severity such as the Kellgren-Lawrence grade.

*PA = posteroanterior, RSNA = Radiological Society of North America, AUC = area under the receiver operating characteristic curve, and AP = anteroposterior.


TABLE II - Select Studies That Demonstrate the Potential of Deep Learning for the Automated Detection of Bone Fractures in Radiographs Across Different Regions of Human Anatomy*

Each entry lists the study and its level of evidence, the objectives and prediction classes, the study cohort and image sources, the reference standard†, the training data set size†, and the performance of the model† (AUC, sensitivity, specificity, and accuracy), grouped by anatomical region.

Hip

Krogue et al.31 (2020), Level of Evidence II.
Objectives: fracture detection; fracture classification; fracture localization.
Prediction classes: fracture or no fracture; multiclass (no fracture, displaced femoral neck fracture, nondisplaced femoral neck fracture, intertrochanteric fracture, and preexisting implant); localization of fracture by bounding box.
Study cohort and image sources: retrospective cohort; all hip and pelvic radiographic studies obtained in the ED of a single institution with the words “intertrochanteric” or “femoral neck” occurring near “fracture” in the radiology report, from 1998 to 2017.
Reference standard: reviewed by 2 orthopaedic residents; in cases of uncertainty, CT and MRI scans and postoperative imaging were reviewed.
Training data set size: 1,815.
Performance: AUC, 0.98; sensitivity, 93%; specificity, 94%; accuracy, 94%.

Cheng et al.37 (2019), Level of Evidence II.
Objective: fracture detection.
Prediction classes: fracture or no fracture.
Study cohort and image sources: retrospective cohort; pelvic radiographs from trauma patients seen at a single institution from 2008 to 2017.
Reference standard: diagnosis derived from the trauma registry and reviewed by a trauma surgeon; other imaging modalities and the clinical course were reviewed in equivocal cases.
Training data set size: 3,605.
Performance: AUC, 0.98; sensitivity, 98%; specificity, 84%; accuracy, 91%.

Urakawa et al.42 (2019), Level of Evidence II.
Objective: fracture detection (intertrochanteric).
Prediction classes: fracture or no fracture.
Study cohort and image sources: retrospective cohort; all hip radiographs from patients with intertrochanteric hip fractures treated with compression hip screws from 2006 to 2017.
Reference standard: radiology reports were reviewed by a single board-certified orthopaedic surgeon.
Training data set size: 2,678.
Performance: AUC, 0.98; sensitivity, 94%; specificity, 97%; accuracy, 96%.

Shoulder

Chung et al.38 (2018), Level of Evidence II.
Objectives: fracture detection; fracture classification.
Prediction classes: Neer classification plus no fracture.
Study cohort and image sources: retrospective cohort; all shoulder AP radiographs from 7 hospitals.
Reference standard: 2 experienced shoulder orthopaedic specialists and 1 musculoskeletal radiologist; when the independent reports did not agree, CT scans and other imaging were reviewed.
Training data set size: 1,891.
Performance: AUC, 1.00; sensitivity, 99%; specificity, 97%; accuracy, 96%.

Wrist

Thian et al.43 (2019), Level of Evidence II.
Objectives: fracture detection; fracture localization.
Prediction classes: fracture or no fracture; estimates of fracture localization.
Study cohort and image sources: retrospective cohort; AP and lateral wrist radiographs from a single institution between 2015 and 2017.
Reference standard: 3 experienced radiologists; questionable images were labeled by consensus.
Training data set size: 14,614.
Performance: AUC, 0.90; sensitivity, 98%; specificity, 73%; accuracy, NS.

Kim and MacKinnon40 (2018), Level of Evidence II.
Objective: fracture detection.
Prediction classes: fracture or no fracture.
Study cohort and image sources: retrospective cohort; lateral wrist radiographs from a single institution between 2015 and 2016.
Reference standard: report reviewed by a radiology registrar competent in the reporting of radiographs and with 3 years of radiology experience.
Training data set size: 1,111.
Performance: AUC, 0.95; sensitivity, 90%; specificity, 88%; accuracy, NS.

Spine

Chen et al.36 (2021), Level of Evidence II.
Objective: fracture detection.
Prediction classes: fracture or no fracture.
Study cohort and image sources: retrospective cohort; plain abdominal frontal radiographs obtained from a single institution from 2015 to 2018.
Reference standard: initially labeled based on the diagnosis in the registry; the final diagnosis was determined via the agreement of a radiologist and a spine surgeon using any supportive images available.
Training data set size: 1,045.
Performance: AUC, 0.72; sensitivity, 74%; specificity, 73%; accuracy, 74%.

Ankle

Kitamura et al.41 (2019), Level of Evidence II.
Objective: fracture detection.
Prediction classes: fracture or no fracture.
Study cohort and image sources: retrospective cohort; studies of ankle fractures identified by parsing radiology reports.
Reference standard: reviewed by a board-certified radiologist and a fourth-year radiology resident.
Training data set size: 1,441.
Performance: AUC, NS; sensitivity, 80%; specificity, 83%; accuracy, 81%.

All anatomical regions

Jones et al.39 (2020), Level of Evidence II.
Objectives: fracture detection; fracture localization.
Prediction classes: fracture or no fracture; estimates of fracture localization.
Study cohort and image sources: retrospective cohort; radiographs of 16 anatomical regions from 15 hospitals and outpatient care centers in the U.S.
Reference standard: data set manually annotated by 18 orthopaedic surgeons and 11 radiologists.
Training data set size: 715,343.
Performance: AUC, 0.97; sensitivity, 95%; specificity, 81%; accuracy, NS.

*AUC = area under the receiver operating characteristic curve, ED = emergency department, CT = computed tomographic, MRI = magnetic resonance imaging, AP = anteroposterior, and NS = not specified

†The reference standard is the source of “ground truth,” in this case, the source of truth for whether a radiograph demonstrates (or does not demonstrate) a fracture. The size of the training data set is the number of example radiographs used to train the model. The performance measures used for deep learning models are conceptually the same as those used to evaluate diagnostic screening tests used in medicine and include measures such as the AUC, sensitivity, specificity, and accuracy. In general, these are measures that compare a model’s predictions (e.g., fracture/no fracture) to ground truth using a testing data set.


How a CNN Model Works

Overview: Layers of Mathematical Functions

Deep learning is not magic; it is mathematics. Computers “think” in numbers, so for a computer to “see” a radiograph, the information in the image must be put into a form that computers can process, perform calculations on, and analyze. Thus, computers “see” a digitized image as numbers, and make sense of these images-made-numeric using mathematical functions. Stated simply, CNNs are mathematical functions that are uniquely suited for the analysis of images. As in all mathematical functions, CNNs have inputs (in this case, the numeric representation of the image) and a final output (a numeric prediction of what it “sees” in the image)3. CNNs are very complex, but one does not need to understand the mathematical details to get a sense of how these functions work.

The original inspiration for the mathematical structure of all deep learning systems (including CNNs) came from the network of neurons in the brain (Fig. 1)46-48. In place of a network of interconnected biologic neurons, deep learning systems substitute a network of interconnected mathematical functions referred to as an artificial neural network (ANN). While biologic neurons exchange electrical signals, the artificial neurons in an ANN exchange the results of their mathematical functions. One should think of the ANN in its entirety as a complex mathematical function made up of a large number of more basic mathematical functions that “connect” and “send signals” between each other in ways that are reminiscent of the synapses and action potentials of biologic neurons.

Fig. 1:

A biologic neuron versus an artificial neuron. The original inspiration for the mathematical structure of all deep-learning systems (including CNNs) came from the network of neurons in the visual cortex of mammals. In place of a vast network of interconnected biologic neurons, deep learning systems substitute a vast network of interconnected mathematical functions. While biologic neurons exchange electrical signals, the artificial neurons of these networks exchange the results of their functions. That is, an artificial neural network (ANN) can be thought of as a complex mathematical function made up of a large number of more basic mathematical functions that “connect” and “send signals” between each other in ways that are reminiscent of the synapses and action potentials of biologic neurons. ANNs are typically organized in layers of interconnected neurons. The mathematical function in each neuron takes weighted inputs (that is, excitatory or inhibitory signals at a biologic neuron’s dendrites) and sums them to produce an output or “activation” (something akin to a biologic neuron’s action potential). The output is then passed on to the neuron(s) in the next layer of the network, where it is received as a weighted input. Thus, the “deep” in deep learning refers to the mathematical depth—the number of layers of mathematical functions that make up the more complex mathematical function that is the neural network in its totality. It is this layering of mathematical functions that enables ANNs (like CNNs) to capture complex, nonlinear relationships, as is required in the identification of a fracture within a radiograph.

ANNs are organized in layers of interconnected neurons. A layer is a general term for a group of neurons that work together at a specific depth within the ANN. The mathematical function in each neuron takes weighted inputs (think excitatory or inhibitory signals at a biologic neuron’s dendrites) and sums them to produce an output or “activation” (something akin to a biologic neuron’s action potential). The output is then passed on to the neuron(s) in the next layer of the network, where it is received as a weighted input.
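To make this arithmetic concrete, the following is a minimal Python sketch of a single artificial neuron. The sigmoid activation and the specific input values, weights, and bias are illustrative assumptions, not the values any particular network would use.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range 0 to 1
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of incoming "signals" (akin to input at a biologic neuron's dendrites)
    z = np.dot(inputs, weights) + bias
    # Activation function (something akin to the action potential)
    return sigmoid(z)

# Three arbitrary input signals arriving from the previous layer
inputs = np.array([0.2, 0.8, 0.5])
# Weights play the role of excitatory (+) or inhibitory (-) connections
weights = np.array([0.9, -0.4, 0.3])

activation = artificial_neuron(inputs, weights, bias=0.1)
print(activation)  # a single number, passed on as input to neurons in the next layer
```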

Different layers in an ANN perform distinct roles. These layers are categorized as input, hidden, and output layers3. Every ANN starts with an input layer and ends with an output layer. Not surprisingly, the input layer is responsible for receiving the inputs (in our example, a radiograph). The output layer is responsible for providing us an answer (e.g., how confident the ANN is that a radiograph demonstrates a fracture). An ANN also typically has a variable number of hidden layers—hidden because they are stacks of mathematical functions that reside in the middle, between input and output layers, shielded from view (Fig. 2).

Fig. 2:

An ANN is a network of artificial neurons, depicted here as circles, often arranged in layers. The network starts with the input layer: a column of hundreds of thousands of neurons (mathematical functions) that are each fed a small part of the raw input (for example, a radiograph). The output of each neuron is then sent to hundreds or thousands of neurons in the first hidden layer. Each mathematical function in this hidden layer takes the hundreds or thousands of inputs it receives and sends its own mathematical output to the next hidden layer. This process repeats itself throughout each layer of the ANN: each layer takes its input, performs calculations via its neurons, and then transmits its output on to the subsequent layer. The final output layer typically only has a small number of neurons. In the case of 1 neuron, the final output is a number showing how confident the ANN is that something is true (e.g., the input radiograph shows a fracture). In the case of several output neurons (as depicted here), each output number would show how confident the ANN is about several predictions (e.g., the input radiograph shows a femoral neck fracture versus intertrochanteric fracture versus subtrochanteric hip fracture, etc.).

The hidden layers are where most of the “magic” happens. That is because the addition of hidden layers of mathematical functions adds depth. The “deep” in deep learning refers to the mathematical depth—the number of layers of mathematical functions that make up the more complex mathematical function that is the neural network in its totality. It is this layering of mathematical functions that enables ANNs to capture complex, nonlinear relationships. There may be dozens of these so-called hidden layers within a deep ANN, but not all layers are the same3. Different types of hidden layers use different mathematical functions, and some layers are better suited for some tasks than others. The types of layers commonly used for “reading” a radiograph include convolutional layers, pooling layers, and fully connected layers, all of which will be discussed3.

The final output layer is responsible for providing an answer (e.g., does the radiograph demonstrate a fracture?). The output layer typically has only a small number of neurons. In the case with 1 output neuron, the final output is a number showing how confident the ANN is that something is true (e.g., the input radiograph shows a fracture). In the case with several output neurons, each output number would show how confident the ANN is about several predictions (e.g., the input radiograph shows a femoral neck versus an intertrochanteric hip fracture, etc.).
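As a hedged sketch of how such output neurons are commonly read, the snippet below assumes the typical conventions of a sigmoid for a single confidence score and a softmax across several output neurons; the raw output values and class names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Single output neuron: one confidence that a fracture is present
fracture_score = sigmoid(3.2)  # about 0.96 -> highly confident
print(fracture_score)

# Several output neurons: one confidence per prediction class (hypothetical values)
raw_outputs = np.array([2.1, 0.3, -1.0, -0.5])
classes = ["femoral neck", "intertrochanteric", "subtrochanteric", "no fracture"]
for c, p in zip(classes, softmax(raw_outputs)):
    print(f"{c}: {p:.2f}")
```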

CNNs are a form of ANN that takes this biologic inspiration further, with chains of interconnected artificial neurons responsible for detecting different shapes or visual features within an image47. As in our visual cortex, the earliest neurons in this chain detect relatively simple visual features, while the downstream neurons use this output to detect more complex shapes referred to as high-level features. Low-level features are shapes like lines and edges. Mid-level features might be something akin to a cortical border or the edge of the joint line in a radiograph. High-level features might look to the human eye like a femoral diaphysis or a femoral head in its entirety (Fig. 3). In a CNN, like the visual cortex, it is the stacking of layers of neurons (i.e., mathematical functions) that enables vision. The layers of mathematical functions define details of images, little bits at a time, starting simply and then building to eventually identify entire objects46.

Fig. 3:

As in the visual cortex of mammals, the earliest neurons in a CNN detect relatively simple features while the downstream neurons use this output to detect more complex shapes referred to as high-level features. Low-level features are shapes like lines and edges. Mid-level features might be something akin to a cortical border, the edge of the joint line, etc. High-level features might look to a human like a femoral diaphysis or a femoral head in its entirety. It is critical to understand that humans do not tell the computer what visual features to look for—a jagged edge, a disruption of the cortex—rather, the computer selects the features on its own, choosing those that are most predictive for the task at hand—in this case, the identification of a fracture.

A More Detailed Look: Computing Pixels

Clearly, there are no rods and cones in a CNN. Instead, a computer uses pixels—the small, illuminated dots that make up images on computer displays. Yet, computers do not “see” illuminated pixels as a human does. Instead, computers represent pixels numerically, assigning a number to each pixel that represents its brightness49-51. White pixels have a value of 1, black pixels have a value of 0, and gray is somewhere in between (Fig. 4). For grayscale images such as radiographs, an image is just a 2-dimensional matrix of numbers. As in a Microsoft Excel spreadsheet, this matrix is merely a table of numbers stored in columns and rows (width times height), with each cell of the table storing the brightness of a pixel in a particular location49,51.
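A minimal sketch of this numeric representation is shown below, assuming 8-bit pixel brightness values rescaled to the 0-to-1 convention described above; the 3 × 3 patch is illustrative and not taken from Figure 4.

```python
import numpy as np

# A tiny 3 x 3 patch of a grayscale image, stored as 8-bit brightness values
# (0 = black, 255 = white), as it might come off a digital radiograph
patch_8bit = np.array([[ 12,  40, 230],
                       [ 15,  65, 241],
                       [ 10,  52, 236]], dtype=np.uint8)

# Rescale to the 0-to-1 convention described in the text
patch = patch_8bit.astype(np.float32) / 255.0

print(patch.shape)  # (rows, columns), i.e., height x width
print(patch)        # each cell holds the brightness of one pixel
```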

Fig. 4:

Digital images are composed of pixels, and computers represent pixels numerically, by assigning a number to each pixel that encodes brightness. For illustrative purposes, a small square of the image along the edge of the iliac crest has been enlarged to show the brightness of 9 pixels and their numeric representation in a matrix.

Using this format, a CNN must mathematically detect shapes within an image’s matrix of numbers. The first layer of the CNN starts simply, with neurons containing mathematical functions to detect elementary shapes such as lines or edges (so-called low-level features). To do this, a neuron uses a “drawing” of the shape being detected. This “drawing” is called a filter51. The filter’s “drawing” is itself a matrix that looks like a small table of numbers, with each cell of the table characterizing the pixel brightness of the shape. Figure 5 shows a 3 × 3 filter (3 pixels in width times 3 pixels in height) for detecting a right-sided edge. The CNN starts by aligning this filter with the first 3 × 3 pixels in the image and measures how closely those 9 pixels in the image match with the filter. The filter is then moved over 1 pixel to the right, and the degree of similarity is again measured. This process is repeated in a scanning fashion across the entire image, left to right, top to bottom. Mathematically, this is convolving the image with the filter, and it is the reason that a CNN is called a convolutional neural network51. The output of this neuron is called a feature map—a map of how strongly right-sided edges were detected at each point in the original image, with white pixels where the pattern exists and black ones where it does not. Each neuron in the first layer of the CNN has a unique filter and creates its own feature map for the shape it was designed to detect (horizontal, diagonal, or curved lines, etc.). Together, all feature maps are passed to the next layer of the network3,49,51.
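The Python sketch below illustrates this scanning operation with a hand-written right-edge filter and a toy 5 × 5 image. In a real CNN the filter values are learned rather than specified by a human, and the particular numbers here are assumptions chosen only for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the filter over the image and record how strongly it matches
    at each location (a 'valid' convolution, with no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# A toy 5 x 5 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0., 0., 0., 1., 1.],
                  [0., 0., 0., 1., 1.],
                  [0., 0., 0., 1., 1.],
                  [0., 0., 0., 1., 1.],
                  [0., 0., 0., 1., 1.]])

# A hand-written 3 x 3 filter that responds to a dark-to-bright (right-sided) edge
right_edge_filter = np.array([[-1., 0., 1.],
                              [-1., 0., 1.],
                              [-1., 0., 1.]])

feature_map = convolve2d(image, right_edge_filter)
print(feature_map)  # large values where the edge sits, zeros elsewhere
```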

Fig. 5:

To detect a shape (i.e., a visual feature) within an image, a “drawing” of the shape is passed over the image. This drawing is called a filter. The figure shows a 3 × 3 filter used for detection of an edge (right-sided) along with its corresponding matrix values. This filter is passed over the entire image in a scanning fashion, left to right, top to bottom. The value for each pixel is multiplied by the value of the corresponding cell within the 3 × 3 filter, and the result is summed to produce a single value. Mathematically, this is called convolving the image with a filter, and it is repeated over the whole image. The output of this mathematical operation is a new matrix of numerical values called a feature map, presented here as an image. This feature map shows how strongly right-sided edges were detected at each point in the original image, with white pixels where the pattern exists and black pixels where it does not.

Individually, feature maps of lines and edges do not capture very complex shapes and patterns. So, the neurons in the next layer detect more complex shapes by looking at several feature maps from the prior layer. For example, consider a slightly more complex visual feature, such as a corner. For a neural filter in the next layer of the CNN to detect a corner, it must sense the end of a vertical edge meeting the end of a horizontal edge. Only in places where both the vertical and horizontal edge maps from the prior layer record the end of an edge will the corner filter signal the presence of a corner at that location. In reality, each filter can use all of the feature maps from a prior layer of neurons to measure the presence of the compound shape it was designed to detect3.
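One crude way to picture this combination is sketched below: a stand-in “corner” unit that responds only where hypothetical vertical-edge and horizontal-edge feature maps are both strong. A real CNN instead learns a weighted filter over all of the prior layer’s feature maps; the values here are purely illustrative.

```python
import numpy as np

# Hypothetical feature maps from the prior layer (same spatial size), each holding
# how strongly its edge type was detected at every location in the image
vertical_edge_map = np.array([[0.0, 0.9, 0.1],
                              [0.0, 0.8, 0.0],
                              [0.0, 0.0, 0.0]])
horizontal_edge_map = np.array([[0.0, 0.7, 0.9],
                                [0.0, 0.0, 0.1],
                                [0.0, 0.0, 0.0]])

# A crude stand-in for a "corner" unit: respond only where BOTH edge maps are strong.
# A real CNN learns a weighted filter over all prior maps instead of taking a minimum.
corner_map = np.minimum(vertical_edge_map, horizontal_edge_map)
print(corner_map)  # high only where a vertical and a horizontal edge meet
```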

As described, the system cannot detect any features bigger than 3 × 3 pixels. While larger filters may be used (e.g., 9 × 9 pixels), they tend to become overly specific. To avoid this problem, special layers called pooling layers are included. Pooling layers summarize the visual features that exist in each area of an image52. A pooling layer pools together the most relevant information and abstracts away the less helpful details, creating a lower-resolution version of the feature maps that keeps a record of the important and predictive features while omitting the fine details that may not be useful for the task.

For each input feature map, these layers output smaller summarized feature maps. In a max-pooling layer, for example, a 100 × 100 input feature map would be divided into 5 × 5 areas and the largest value in each would be returned53. The end result is a 20 × 20 output that records the presence of important visual features while reducing the input from areas that lack features (e.g., the radiograph’s black background).
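A minimal sketch of max pooling is shown below, using a small 4 × 4 feature map and 2 × 2 pooling windows (rather than the 100 × 100 map and 5 × 5 areas described above) so that the result can be printed and inspected.

```python
import numpy as np

def max_pool(feature_map, size):
    """Divide the feature map into non-overlapping size x size areas and
    keep only the largest value in each area."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]          # drop any remainder
    windows = trimmed.reshape(h // size, size, w // size, size)  # group into windows
    return windows.max(axis=(1, 3))

# A 4 x 4 feature map, pooled with 2 x 2 windows -> a 2 x 2 summary
fmap = np.array([[0.1, 0.9, 0.0, 0.0],
                 [0.2, 0.3, 0.0, 0.1],
                 [0.0, 0.0, 0.8, 0.7],
                 [0.0, 0.1, 0.6, 0.5]])
print(max_pool(fmap, 2))
# [[0.9 0.1]
#  [0.1 0.8]]
```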

After several layers in a CNN, the feature maps contain measures of how strongly each higher-level feature was detected. At this stage, the final layers are dedicated to answering whatever question the model was trained for (e.g., is a fracture present?). This decision is made by hundreds of neurons connected in what are called the fully connected layers of the CNN. These fully connected neurons are quite different from their predecessors. Each neuron considers all of the feature measurements from the prior layer3. Each neuron is designed to look for a specific combination of higher-level features and outputs a confidence measure indicating how certain it is that the combination is present. With multiple fully connected layers, the model can make complex decisions as to whether the detected features exist in the right combination, amount, spatial layout, and locations. To indicate fracture presence, the final layer would be a single neuron, whose output would be a single number—that is, its certainty of whether a fracture exists on a scale of 0 to 1. In the case of fracture classification, the number of neurons in the final layer would be equal to the number of different types of fractures the system can identify (plus 1 for no fracture). Again, the output of each neuron would be a single number, the model’s confidence that the given image contains a fracture of a given type (e.g., femoral neck, intertrochanteric, subtrochanteric hip fractures, etc., or no fracture at all).

Figure 6 presents a simplified example of a complete CNN. It has a total of 2 convolutional layers and 2 max-pooling layers before the fully connected layers. Real-world CNNs would have many more convolutional and max-pooling layers to detect enough visual features. In this example, with a score of 0.96, the network is highly confident that a fracture is present.

Fig. 6:

A simplified example of a complete CNN. This CNN has a total of 2 convolutional layers and 2 max-pooling layers before the fully connected layers. Real-world networks have many more convolutional and max-pooling layers to detect enough visual features to make accurate predictions. In the fully connected layers at the end of the network, a different type of neuron reads the final feature maps and “decides,” on the basis of the presence and combination of different visual features, whether it believes a fracture is present (expressed on a scale of 0 to 1). In this example, with a score of 0.96, the network is highly confident that a fracture is present.
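For readers curious how such an architecture is expressed in practice, the sketch below approximates the network described for Figure 6 using the PyTorch library: 2 convolutional layers, 2 max-pooling layers, and fully connected layers ending in a single sigmoid output neuron. The channel counts, layer sizes, and 224 × 224 input size are illustrative assumptions rather than the figure’s exact values, and an untrained model like this one produces an essentially random confidence score.

```python
import torch
import torch.nn as nn

class TinyFractureCNN(nn.Module):
    """Illustrative CNN loosely mirroring Fig. 6: 2 convolutional layers, 2 max-pooling
    layers, then fully connected layers ending in a single confidence score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level features (edges)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # summarize and downsample
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # more complex visual features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 56 * 56, 64),  # fully connected layer combines the features
            nn.ReLU(),
            nn.Linear(64, 1),             # single output neuron
            nn.Sigmoid(),                 # confidence on a 0-to-1 scale
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One grayscale "radiograph" of 224 x 224 pixels (random numbers as a stand-in)
model = TinyFractureCNN()
x = torch.rand(1, 1, 224, 224)
print(model(x).item())  # a 0-to-1 confidence; a trained model near 0.96 would mean "fracture"
```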

How a CNN Is Created

Overview: Teaching Through Testing

Traditionally, software is created by humans writing precise instructions for executing a task (e.g., if “this,” then “do that,” etc.). In the case of human vision, however, we do not really know how to express these instructions. For example, imagine trying to write a computer program that can enable computers to see fractures in radiographs (e.g., if “cortical disruption here” then “femoral neck fracture,” or if “jagged edge there” then “intertrochanteric fracture”). Approaching the problem in this manner would make for an endless task of describing every possible fracture pattern ever exhibited (or possible) within the human skeleton. This provides a seeming impasse when we attempt to create a computer that can “see” fractures in radiographs: We
