Efficient structured reporting in radiology using an intelligent dialogue system based on speech recognition and natural language processing

The novel reporting tool was created by combining preexisting and newly developed components. The user interface was developed as a Microsoft® .NET application. For speech recognition, indicda® speech (DFC-Systems GmbH, Munich, Germany) was used. The task-oriented dialogue system was built on top of Empolis Knowledge Express® (Empolis Information Management, Kaiserslautern, Germany).

The workflow using the reporting tool is similar to a standard reporting workflow in clinical routine. The radiologist uses a speech recognition handheld for FTR. Two monitors are needed to display the user interface and the imaging study in a PACS viewer. The tool comprises a frontend represented by the user interface, and a backend represented by technical features—including speech recognition, the NLP components, and the task-oriented dialogue system (Fig. 1).

Fig. 1

Graphic representation of the reporting workflow using the reporting tool. Reporting is done as free text using speech recognition. The reporting tool comprises a user interface (frontend), and a dialogue system that enables NLP-based conversion of free text into a structured report and communication with the radiologist (backend). The combination of these components enables efficient and high-quality reporting

The tool was designed to be applicable for the reporting of any radiological examination. Abdominal CT in suspected urolithiasis was selected as the first use case. Therefore, we applied a standard urolithiasis SR template that is used in clinical routine at our institution. Urolithiasis was chosen because it is a common examination in radiologists’ daily work and is adequately complex to test and evaluate the tool.

Frontend

The user interface comprises two parts (Fig. 2). The first part is a template window on the left side, which shows the to-be-completed SR template for the use case, including an overview of possible response options at the bottom. The SR template considers individual organ systems separately, e.g., right kidney, right ureter, left kidney, left ureter, and urinary bladder. In the upper left margin, there is a tab for each organ, and the radiologist must report on each organ consecutively. The second part comprises a reporting window and a chat window on the right side. Radiologists can activate their speech recognition handheld and dictate a free-text report, which appears in the reporting window at the bottom. A red border around the reporting window indicates that speech recognition is active. Upon finishing the report, the SR template on the left side is filled in automatically. If findings are missing, the dialogue system sends the radiologist messages that appear in the chat window in the upper right part of the interface (Fig. 3). Upon completion of an organ system, the interface automatically switches to the next. If the radiologist reports "no pathologies" for an organ system, its section is completed automatically and the interface switches to the next. A yellow/green bar at the bottom of the interface displays the reporting progress.

Fig. 2

Screenshot of the reporting tool's graphical user interface (translated from German to English). On the left side, the to-be-completed SR template is shown, including content suggestions at the bottom. On the right side, the reporting window (bottom) and the dialogue window (top) are shown. The radiologist has started reporting in free-text form using speech recognition

Fig. 3

Translated screenshot of the reporting tool’s graphical interface, further along in the reporting process. The previously dictated free text regarding nephrolithiasis of the right kidney (shown in Fig. 2) has been transferred to the SR template on the left side (green background). Since no statements were made regarding the parenchyma and perirenal space, the dialogue system advises the radiologist to discuss these findings (dialogue window). The radiologist can start a new turn by dictating free text about the parenchyma and perirenal space (reporting window)

Backend

The backend portion of the reporting tool comprises a task-oriented dialogue system with several technical components, including speech recognition, NLP, dialogue management, a knowledgebase, natural language generation (NLG), and visual presentation (Fig. 4) [22]. It was designed to help radiologists document their findings as efficiently as possible; therefore, it fills in the template on behalf of the radiologist. It was designed as a stateless (web) service, meaning that the same input always produces the same output, independent of any previous inputs. Its input is the user's communication; as output, it returns the completed template and additional advice for the user. The system has to handle various challenges of spoken language understanding, e.g., synonyms, abbreviations, negations, speculations, and uncertain information.
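To make the stateless contract concrete, the following is a minimal Python sketch; the names (`TurnResult`, `process_turn`) and data shapes are hypothetical and do not reflect the tool's actual .NET/Empolis implementation.

```python
from dataclasses import dataclass

# Hypothetical data shapes -- illustrative only, not the tool's actual API.
@dataclass(frozen=True)
class TurnResult:
    template: dict   # SR template fields completed so far
    advice: tuple    # natural-language advice messages for the radiologist

def process_turn(template: dict, utterance: str) -> TurnResult:
    """One stateless dialogue turn: the complete template state travels with
    every request, so the same (template, utterance) input always yields the
    same output, independent of any previous calls."""
    updated = dict(template)  # no hidden state is kept between calls
    if "no pathologies" in utterance.lower():
        updated["status"] = "normal"  # organ system is completed automatically
        return TurnResult(updated, ())
    return TurnResult(updated, ("Please describe your findings in more detail.",))

# Identical inputs yield identical outputs:
print(process_turn({}, "No pathologies.") == process_turn({}, "No pathologies."))
```

Because the full template state travels with each request, such a service can be scaled and tested without any session management.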

Fig. 4

Graphic representation of the reporting tool’s backend. The dialogue system comprises speech recognition, natural language processing (NLP), dialogue management with a knowledgebase, natural language generation (NLG), and visual presentation. The arrows represent the typical data flow. Speech recognition and visual presentation were built on top of indicda® speech. NLP, dialogue management, knowledgebase, and NLG were built on top of Empolis Knowledge Express®

The system's components interact with each other (Fig. 4). Speech input is transformed into unstructured text by speech recognition, and the NLP component translates the text into a structured form (segment, concept, and negation detection). For negation detection, a neural network probabilistically determines the negation status and selects the most probable status (affirmed, negated, or speculated). For segment and concept detection, the NLP component applies rule-based matching approaches, and RadLex concepts are assigned to the structured content. The dialogue management then takes over, recognizing and returning the user's intents based on the delivered structured content. Detected intents are used to complete the fields of the SR template (template intents) and to advise the user of possible considerations (advice intents). For this process, the dialogue management uses a knowledgebase containing the necessary expert knowledge on intents. Data are further transferred to the NLG component, which transforms intents from machine-generated codes into a human-understandable form. A visual presentation component communicates the results to the user by automatically completing the SR template and through the chat window on the user interface. After the visual presentation, a dialogue turn is complete; a new turn can be initiated by the user.
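As a rough illustration of the NLP step, the sketch below mimics the described behavior with a rule-based concept matcher and a probabilistic negation classifier that selects the most probable of the three statuses; the toy scores stand in for the neural network, and the concept dictionary (including RID205 for kidney) is assumed rather than taken from the actual system.

```python
# Illustrative sketch of the NLP step; the scores mock the neural network's
# probabilities, and the concept dictionary is assumed for illustration.
CONCEPTS = {"calculus": "RID4994", "calculi": "RID4994", "kidney": "RID205"}

def detect_concepts(text: str) -> list:
    """Rule-based matching: map surface terms to RadLex concept IDs."""
    return [rid for term, rid in CONCEPTS.items() if term in text.lower()]

def detect_negation(text: str) -> str:
    """Select the most probable of three statuses. In the real system a
    neural network produces these probabilities; here they are mocked."""
    scores = {"affirmed": 0.2, "negated": 0.1, "speculated": 0.1}
    lowered = text.lower()
    if "no " in lowered or "without" in lowered:
        scores["negated"] = 0.8
    elif "possible" in lowered or "suspected" in lowered:
        scores["speculated"] = 0.7
    else:
        scores["affirmed"] = 0.8
    return max(scores, key=scores.get)

print(detect_concepts("Two calculi in the right kidney"))  # ['RID4994', 'RID205']
print(detect_negation("No calculus in the left kidney"))   # 'negated'
```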

Figure 5 shows an example of the reporting workflow. The user has reported two calculi in the right kidney. Speech is translated to text (speech recognition), and text is translated into structured content of intents and slots (NLP and dialogue management). The detected intent is the completion of the template. Slots are the values of fields for the SR template. Structured content is used to complete the template, and to give the user natural language advice in the graphical user interface (NLG and visual presentation). The template is completed in RadLex terms (RID28453 abnormal and RID4994 calculus). Additional advice intents are returned to the user. In this case, the user is advised to discuss a potential obstructive uropathy (RID34394 obstructive uropathy). Every intent has a confidence. For template intents, confidence equals the template’s completeness, which can be visualized on the progress bar of the user interface. For advice intents, the system uses the confidence to filter out unlikely intents. Besides standard RadLex concepts, for some codes the actual text is used to complete the template—for example, in this case, the number of calculi is entered as “two.” In such cases, the text may be preprocessed by the visual presentation component to fit the template.
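The structured content for this turn might look roughly as follows; the field names and confidence values are illustrative, while the RadLex codes are those mentioned above.

```python
# Illustrative structured content for "two calculi in the right kidney".
# Field names and confidence values are assumed; RadLex codes are from the text.
turn_output = {
    "template_intent": {
        "organ": "right kidney",
        "slots": {
            "status": "RID28453",   # abnormal
            "finding": "RID4994",   # calculus
            "number": "two",        # actual text reused to fill the field
        },
        "confidence": 0.40,         # equals the template's completeness
    },
    "advice_intents": [
        {"concept": "RID34394",     # obstructive uropathy
         "question": "Is there evidence of obstructive uropathy?",
         "confidence": 0.85},       # low-confidence advice is filtered out
    ],
}
```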

Fig. 5

Functionality of the dialogue system. Text is translated into structured content of intents and slots. The template is completed using RadLex terms. Advice intents are returned to the user. A confidence is computed for each intent

For proper functionality of the system with the use case, special configurations were needed for individual components. Since RadLex does not contain all the concepts needed for the urolithiasis template, five concepts had to be added; these were labeled as RadLex ID extensions (RIDE). Synonyms were added for all concepts.
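A hypothetical registry entry might look as follows; the five actual RIDE concepts and their synonyms are not listed in the text, so every entry below is invented for illustration.

```python
# Hypothetical concept registry; the actual RIDE concepts and synonyms are
# not named in the text, so the extension entry is invented.
CONCEPT_REGISTRY = {
    "RID4994": {"label": "calculus", "synonyms": ["stone", "concrement"]},
    "RIDE0001": {"label": "example extension", "synonyms": ["example synonym"]},
}

def resolve(term: str):
    """Resolve a surface term to a concept ID via its label or a synonym."""
    term = term.lower()
    for rid, concept in CONCEPT_REGISTRY.items():
        if term == concept["label"] or term in concept["synonyms"]:
            return rid
    return None

print(resolve("stone"))  # 'RID4994'
```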

A set of 85 rules in the knowledgebase determines the behavior of the dialogue management in intent detection. These rules define which SR template fields are required in a specific case: for example, if a calculus is documented, the user is asked for additional information about its features (size, density, morphology, and location). Further rules define relations between values, enabling implicit completion of SR template fields (e.g., if a calculus is documented, the kidney is automatically documented as abnormal). The rules are described as decision tables in an Excel sheet, which are automatically transformed into the format used in the knowledgebase [23]. The knowledgebase uses an ontology language (OWL/RDF) and a reasoner (the ELK reasoner).
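The following is a minimal Python analogue of the two rule types described above, assuming a simple condition/action table; the actual knowledgebase expresses these rules in OWL/RDF and evaluates them with the ELK reasoner, so this sketch is only an analogy.

```python
# Toy decision-table analogue of the two rule types described above; the real
# knowledgebase uses OWL/RDF and the ELK reasoner.
RULES = [
    # If a calculus is documented, ask for its features (advice intents).
    {"if": {"finding": "calculus"},
     "require_fields": ["size", "density", "morphology", "location"]},
    # Relation rule: a documented calculus implies an abnormal kidney.
    {"if": {"finding": "calculus"},
     "set": {"kidney_status": "abnormal"}},
]

def apply_rules(template: dict):
    """Return the implicitly completed template and the fields still needed."""
    template = dict(template)
    missing = []
    for rule in RULES:
        if all(template.get(k) == v for k, v in rule["if"].items()):
            template.update(rule.get("set", {}))
            missing += [f for f in rule.get("require_fields", [])
                        if f not in template]
    return template, missing

tmpl, ask = apply_rules({"finding": "calculus"})
print(tmpl)  # {'finding': 'calculus', 'kidney_status': 'abnormal'}
print(ask)   # ['size', 'density', 'morphology', 'location']
```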

The NLG component uses a set of 57 mappings between advice intents and natural language questions for the chat window. It also uses mappings between the SR template and its natural language representation with 65 variables.
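Such mappings could be as simple as a lookup table, as in the sketch below; only the obstructive uropathy advice corresponds to an example from the text, and all wording is illustrative.

```python
# Illustrative mapping from advice intents to chat-window messages. Only
# RID34394 (obstructive uropathy) appears in the text; the second entry
# and all wording are hypothetical.
ADVICE_QUESTIONS = {
    "RID34394": "Please consider discussing a potential obstructive uropathy.",
    "RIDE0001": "Please describe the parenchyma and the perirenal space.",
}

def render_advice(intent_id: str) -> str:
    """Transform a machine-generated advice intent into a human-readable
    message for the chat window (the NLG step)."""
    return ADVICE_QUESTIONS.get(intent_id, "Please review the remaining fields.")

print(render_advice("RID34394"))
```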
