This section presents the results of applying the instantiated reference architecture, introduced in the “Methods” section, to a use case demonstrator in homecare. Moreover, it presents a performance evaluation of the different building blocks on this demonstrator, and a usability evaluation of the installer tools.
Use case demonstrator
A demonstrator was built to showcase the feasibility of using the semantic healthcare platform based on the instantiated reference architecture to optimize continuous homecare provisioning [62]. The demonstrator implements a specific use case scenario in continuous homecare, focusing on personalized smart monitoring and the construction and cross-organizational coordination of patients’ treatment plans. This use case scenario has been designed in collaboration with different domain experts and companies involved in continuous homecare and hospital care. This ensures that the scenario is representative of a large set of real-world situations and problems within continuous homecare specifically and continuous healthcare in general, which allows the performance and usability evaluation results in this paper to be generalized towards the continuous homecare domain.
This section covers the use case description, the demonstrator architecture, the scenario description and a web application of the demonstrator. The technical implementation details of the Proof-of-Concept (PoC) implementation of the use case demonstrator are described in an additional file [see Additional file 1].
Use case description
The demonstrator tells the story of Rosa, a 74-year-old woman who lives in a service flat in Ghent, Belgium. To follow up on Rosa, her service flat is equipped with several environmental sensors measuring properties such as room temperature and humidity. Door sensors measure whether a door is currently open or closed. Moreover, Rosa has a wearable that continuously measures her steps, body temperature and heart rate. It also contains a PAS. Through multiple Bluetooth Low Energy (BLE) beacon sensors and a BLE tag integrated into her wearable, Rosa’s presence in the different rooms of the service flat can be detected.
Rosa’s medical profile contains the diagnosis of early stage dementia. Multiple people are part of her caregiver network. Nurse Suzy visits Rosa every afternoon to assist with daily care. Dr. Wilson is Rosa’s GP. Rosa is also a known patient in a nearby hospital. Moreover, two informal caregivers of Rosa are registered: her daughter Holly, who works nearby and pays Rosa a daily visit around noon, and her neighbor Roger.
Demonstrator architecture
To monitor Rosa’s condition in real-time, the reference architecture in Fig. 2 is instantiated to the specific demonstrator architecture depicted in Fig. 5. The data processing pipeline consists of the RMLStreamer, C-SPARQL and Streaming MASSIF components. C-SPARQL was chosen as the RSP engine because it is one of the most well-known existing RSP engines [22, 23]. Moreover, AMADEUS is deployed as the semantic workflow engine. UI components are omitted from the demonstrator architecture.
The distributed architecture contains local and central components. RMLStreamer and C-SPARQL are local components that are deployed in the patient’s service flat, for example on an existing low-end local gateway device. They operate only for the patient living in that particular service flat. Streaming MASSIF, DIVIDE and AMADEUS run centrally on a back-end server in a server environment of either a nursing home or hospital. They perform their different tasks for all patients registered in the system.
In the smart monitoring pipeline, RMLStreamer maps the sensor observations from JSON syntax to RDF data. C-SPARQL filters the relevant RDF observations according to Rosa’s profile through queries derived by DIVIDE. In this use case, these queries are determined by the diseases Rosa is diagnosed with. Streaming MASSIF performs further abstraction and temporal reasoning to infer the severity and urgency of the events filtered by C-SPARQL. It implements a service that can detect alarming situations and send notifications about them to the most appropriate person in the patient’s caregiver network. To select this person, it takes into account the inferred event parameters and profile information such as already planned visits of caregivers.
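To make this pipeline more concrete, the following sketch shows what a single mapped sensor observation could look like on the RDF output stream of the RMLStreamer. It is a minimal illustration assuming a SOSA-style observation model and a hypothetical ex: namespace and identifiers; the actual ontology and mapping rules used in the demonstrator may differ.

    @prefix sosa: <http://www.w3.org/ns/sosa/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:   <http://example.org/homecare/> .   # hypothetical namespace

    # A single body temperature observation from Rosa's wearable,
    # as it could appear after the JSON-to-RDF mapping step.
    ex:obs-20191105-1132 a sosa:Observation ;
        sosa:madeBySensor         ex:wearable-rosa-temperature ;
        sosa:observedProperty     ex:bodyTemperature ;
        sosa:hasFeatureOfInterest ex:patient-rosa ;
        sosa:hasSimpleResult      "38.3"^^xsd:float ;
        sosa:resultTime           "2019-11-05T11:32:00Z"^^xsd:dateTime .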
AMADEUS is employed to compose semantic workflows representing possible treatment plans for a diagnosis in Rosa’s medical profile, and to provide composed quality parameters that help the human doctor select the most suitable one. Quality constraints can be defined for the proposed plans on cost, probability of success, relapse risk, patient comfort, and so on. The inputs of AMADEUS include the patient’s profile and medical domain knowledge about the options in the treatment of different diseases, which are defined by their input, output, functionality and quality parameters. AMADEUS’ automatic conflict detection between existing and new treatment plans can help a doctor avoid introducing conflicts they are unaware of.
Fig. 5 Architecture of the use case demonstrator
Scenario description
To demonstrate how the building blocks of the demonstrator architecture work together in the presented use case, a specific scenario with multiple steps is designed.
Step 0 – Initial state
In its initial state, the smart monitoring pipeline is not yet activated. This means that no specific queries are evaluated on C-SPARQL. Instead, naive monitoring takes place where all sensor observations are forwarded to the central server.
Step 1 – Activating the smart monitoring pipeline
When the smart monitoring pipeline is activated, DIVIDE derives two personalized queries from the generic query patterns to be evaluated on C-SPARQL for Rosa.
The first query filters observations indicating that Rosa has been in her bathroom for longer than 30 minutes without performing any movement. This query is derived because such a situation might indicate that an accident has happened, e.g., Rosa has fallen. Because Rosa has dementia, there is a higher chance that she might forget to use her PAS in that case. This query monitors the bathroom’s BLE sensor and the wearable’s step detector.
The second query filters sensor observations which imply that Rosa has left her service flat. To detect this, it monitors the BLE sensor in the hallway and the main door’s sensor. Because Rosa has dementia, such events should be detected and notified to a caregiver, since being outside alone could lead to disorientation.
Step 2 – Colon cancer diagnosis
At a certain moment in time, Rosa’s medical profile is updated with the diagnosis of colon cancer by a medical specialist at the hospital, who examined Rosa after she complained to the nurse about pain in the stomach and intestines. This update automatically triggers DIVIDE to reevaluate the deployed C-SPARQL queries, without requiring any additional user intervention. As a result, one additional query is derived. It detects when Rosa’s body temperature exceeds 38\(^\circ\)C (38 degrees Celsius), i.e., when Rosa has a fever, by monitoring the sensor in Rosa’s wearable. This new query is derived because the medical domain knowledge states that complications or additional infections form a contraindication for several cancer treatments such as chemoradiotherapy, which means that continuing these treatments would be too dangerous [63]. Since fever might indicate an underlying infection, the medical domain knowledge therefore defines that cancer patients should be monitored for fever.
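A minimal sketch of what such a derived fever-filtering query could look like in C-SPARQL syntax is shown below. It reuses the SOSA-style observation model, hypothetical namespace and stream URI from the earlier sketch, and the 5-second window with 3-second step used in the performance evaluation; the actual queries derived by DIVIDE from the generic query patterns may use a different structure and vocabulary.

    REGISTER STREAM feverRosa AS
    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX ex:   <http://example.org/homecare/>
    CONSTRUCT {
      ?obs a ex:FeverObservation ;
           sosa:hasFeatureOfInterest ex:patient-rosa ;
           sosa:hasSimpleResult ?temp .
    }
    FROM STREAM <http://example.org/streams/rosa-wearable> [RANGE 5s STEP 3s]
    WHERE {
      ?obs a sosa:Observation ;
           sosa:observedProperty ex:bodyTemperature ;
           sosa:hasFeatureOfInterest ex:patient-rosa ;
           sosa:hasSimpleResult ?temp .
      # only forward observations above the 38 degrees Celsius fever threshold
      FILTER (?temp > "38.0"^^xsd:float)
    }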
Step 3 – Constructing a treatment plan for colon cancer
To construct a treatment plan for Rosa’s colon cancer, AMADEUS is triggered by the hospital doctor. First, given Rosa’s profile, the defined treatments and their quality parameters, it composes two possible workflows: a plan consisting of neoadjuvant chemoradiotherapy followed by surgery, and a plan consisting of surgery only. The quality parameters for the presented plans include duration, cost, comfort, survival rate and relapse risk. Since the first plan has the highest survival rate and lowest relapse risk, it is selected by the doctor. This selection triggers AMADEUS to calculate a detailed workflow by adding timestamps to the different steps. In this case, the chemoradiotherapy step is split into four sessions in the hospital, with 30 days between consecutive sessions. Every session can only be performed if there is no contraindication. After confirmation of the plan, it is added to Rosa’s current treatment plan.
Step 4 – Influenza infection yielding fever notifications
Five days before her next chemoradiotherapy session, Rosa gets infected with the influenza virus, causing her body temperature to start rising. Any body temperature observation exceeding the fever threshold of 38\(^\circ\)C is filtered by the deployed C-SPARQL query and sent to Streaming MASSIF.
The abstraction layer of Streaming MASSIF is configured to abstract the incoming sensor events according to several rules. A body temperature observation between 38.0\(^\circ\)C and 38.5\(^\circ\)C is a low fever event, one between 38.5\(^\circ\)C and 39.0\(^\circ\)C a medium fever event, and one above 39.0\(^\circ\)C a high fever event. Its temporal reasoning layer defines a rising fever event as a sequence of low, medium and high fever events within an hour.
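These abstraction rules can be summarized as the following mapping of a body temperature value \(T\) (in \(^\circ\)C) to a fever event level, where the assignment of the exact boundary values 38.5 and 39.0 to a level is an assumption, as the textual description leaves it open:

\[
\mathrm{level}(T) =
\begin{cases}
\text{low fever} & \text{if } 38.0 \le T < 38.5\\
\text{medium fever} & \text{if } 38.5 \le T < 39.0\\
\text{high fever} & \text{if } T \ge 39.0
\end{cases}
\]

A rising fever event is then detected at time \(t\) if there exist observations at times \(t_1 < t_2 < t_3 \le t\) with \(t_3 - t_1 \le 1\,\mathrm{h}\) that are abstracted to a low, medium and high fever event, respectively.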
Two queries are defined for the notification service constructed on top of Streaming MASSIF’s temporal reasoning layer. When a low fever event is detected, and a person in the patient’s caregiver network has already planned a visit to the patient on the current day, the first query notifies this person to check up on the patient during this visit. In that case, no other (medical) caregiver should be called. The second query notifies a medical caregiver from the patient’s caregiver network as quickly as possible when a rising fever event is detected.
In the use case scenario, Rosa’s body temperature exceeds 38\(^\circ\)C in the morning of the given day. This event is filtered by C-SPARQL and abstracted by Streaming MASSIF as a low fever event. The daily visit of Rosa’s daughter Holly around noon is still planned for the current day, causing Streaming MASSIF to generate a notification to Holly, indicating that she should check up on Rosa’s low fever during her planned visit. However, within an hour after the first low fever event, Rosa’s body temperature rises further to above 39\(^\circ\)C. Streaming MASSIF therefore detects both a medium fever event and a high fever event in its abstraction layer, and thus a corresponding rising fever event in its temporal reasoning layer. Hence, the Streaming MASSIF service generates a notification to Rosa’s nurse Suzy to visit her with high priority.
Step 5 – Constructing a treatment plan for influenza
After Suzy’s examination, Rosa’s GP Dr. Wilson is called and diagnoses her with the influenza virus, which is added to her medical profile. To construct a treatment plan for it, Dr. Wilson can use AMADEUS. It proposes three possible plans: taking the oseltamivir medicine for ten days, taking the zanamivir medicine for eight days, or waiting for 16 days until the influenza resolves naturally. The durations of the plans correspond to the expected time after which the influenza should be cured. Given Rosa’s situation, Dr. Wilson decides to choose the first plan, which has the highest value for the comfort quality parameter. After selecting the plan, AMADEUS constructs the detailed workflow of taking the medication every day for a period of ten days.
Step 6 – Treatment plan conflict
AMADEUS performs a verification step to ensure that the newly added treatment plan, confirmed by Dr. Wilson, does not yield any conflicts with Rosa’s current treatment plan. In this scenario, AMADEUS detects a conflict: Rosa’s next chemoradiotherapy session in the colon cancer treatment plan is scheduled in five days. However, since the influenza treatment plan still takes ten days, the influenza virus will not be cured by then, which forms a contraindication conflict. AMADEUS leaves resolving detected conflicts to its end users. In this case, Dr. Wilson can do so by postponing the next chemoradiotherapy session until the influenza virus is fully cured.
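A simplified formalization of this check, under the assumption that the contraindication conflict reduces to a temporal overlap test, is the following: a step \(s\) of an existing plan, scheduled at time \(t_s\), conflicts with a newly confirmed plan \(P\) treating condition \(c\) if

\[
c \in \mathrm{contraindications}(s) \;\wedge\; t_s \in [\,t_{\mathrm{start}}(P),\, t_{\mathrm{end}}(P)\,].
\]

In the scenario, \(c\) is the influenza infection, \(t_s\) lies 5 days ahead, and the influenza plan spans 10 days, so \(t_s\) falls inside the plan’s interval and a conflict is reported.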
Demonstrator web application
To visually demonstrate the described use case scenario, a web application is designed [62] on top of a PoC implementation of the use case demonstrator [see Additional file 1]. It illustrates how medical care providers could follow up on patients in homecare through the smart monitoring pipeline, in addition to the designed GUIs presented in the “Building blocks” section. More specifically, it contains multiple UI buttons to simulate the different steps of the scenario in the “Scenario description” section, and shows a visualization throughout the simulation of Rosa’s profile, the location of Rosa and the people in her caregiver network, and the real-time observations generated by the sensors and processed in the monitoring pipeline. Furthermore, it contains a UI to trigger AMADEUS and visualize its output. Figure 6 shows multiple screenshots of the web application corresponding to the different scenario steps. Moreover, a video of the demonstrator is available online at https://vimeo.com/380716692.
Fig. 6 Screenshots of the web application built on top of the use case demonstrator’s PoC implementation. They correspond to different steps in the scenario: a Step 0 – Initial state; b Step 3 – Constructing a treatment plan for colon cancer; c Step 4 – Influenza infection yielding fever notifications; d Step 6 – Treatment plan conflict
Performance evaluation
This section evaluates the performance of the different building blocks in the architecture of the use case demonstrator presented in the “Use case demonstrator” section [62]. The main purpose is to assess the individual building blocks of the semantic healthcare platform on a single, fixed use case. For in-depth evaluations of the involved building blocks, we refer to the corresponding publications.
The evaluation is split into three parts. The first part evaluates the data stream processing pipeline with RMLStreamer, C-SPARQL and Streaming MASSIF. The second part evaluates the DIVIDE query derivation. The third part evaluates AMADEUS.
For all evaluations, the local components in the demonstrator architecture in Fig. 5 are running on an Intel NUC, model D54250WYKH, which has a 1300 MHz dual-core Intel Core i5-4250U CPU (turbo frequency 2600 MHz) and 8 GB DDR3-1600 RAM. The central components are deployed on a virtual Ubuntu 18.04 server with an Intel Xeon E5620 2.40 GHz CPU and 12 GB DDR3-1066 RAM.
All evaluation results are aggregated in Table 1. For every evaluated component, the following subsections detail the evaluation cases, their rationales, the measured metrics, and how the measures were obtained to calculate the reported statistics.
Performance evaluation of the data stream processing pipeline
The evaluation of the data stream processing pipeline is performed separately for the three components. This approach is chosen because C-SPARQL performs continuous time-based processing of windows on the data streams, while RMLStreamer and Streaming MASSIF perform event-based processing. Analyzing the components individually means that inherent networking delays are omitted.
RMLStreamer
For the RMLStreamer evaluation, the processing time is measured, which is defined as the difference between the time at which a JSON observation is sent on the TCP socket input stream of RMLStreamer, and the time at which the RDF observation arrives at the client consuming the TCP socket output stream. This client and the sensor simulator are both running on the same device as the RMLStreamer.
In Table 1, the RMLStreamer performance measures are reported for three different rates of incoming observations on the RMLStreamer: 1 observation per second, 7 observations per second and 14 observations per second. This maximum of 14 is chosen because the demonstrator contains 14 sensors. The reported numbers are aggregated over all observations generated during a simulation of 2 minutes.
Table 1 Results of the performance evaluation of the building blocks in the use case demonstrator’s architecture

C-SPARQL
For the C-SPARQL evaluation, the execution time is measured of the query that filters Rosa’s body temperature after she is diagnosed with colon cancer. This is the only query that is important for the scenario of the demonstrator, since the other two deployed queries never filter any event during the scenario.
Table 1 reports the evaluation results for the three rates of incoming RDF observations. For C-SPARQL, this rate defines the number of observations in the data window on which the queries are evaluated. For every rate, exactly one body temperature observation higher than 38\(^\circ\)C is generated per second. Hence, this resembles the period in the scenario when Rosa has a fever. Thus, the reported measures are for query executions that each yield exactly one result, being the most recent high body temperature observation. The query is evaluated every 3 seconds on a 5-second window. The reported numbers are aggregated over all query executions during a 2-minute simulation.
Note that the evaluation results report measures about the query execution times, and not the processing times of an observation. This is because the C-SPARQL query evaluation is not event-based but a continuous, periodic process. The total processing time per observation consists of the waiting time before the window trigger and query evaluation, and the query execution time. The waiting time is upper bounded by the time period between consecutive query evaluations, which is 3 seconds in the demonstrator. Since the actual waiting time is inherent to the system, depends on the mutual initialization of components, and is not dependent on the query bodies and data models, it is not included in the reported results.
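In other words, for a single observation the total processing time decomposes as

\[
T_{\mathrm{total}} = T_{\mathrm{wait}} + T_{\mathrm{exec}}, \qquad 0 \le T_{\mathrm{wait}} \le \mathrm{STEP} = 3\,\mathrm{s},
\]

and only the query execution time \(T_{\mathrm{exec}}\) is reported in Table 1.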
Streaming MASSIF
For the evaluation of Streaming MASSIF, the processing time of an incoming event is measured. This is defined as the difference between the event’s arrival time and the time at which the notification (to either Rosa’s daughter or nurse) leaves the system. The reported numbers in Table 1 are aggregated over all processed events in a simulation of 3 minutes, during which Rosa’s body temperature is gradually increased from 38.3\(^\circ\)C up to 39.1\(^\circ\)C. The period between incoming events in Streaming MASSIF is equal to the output period of the evaluated C-SPARQL query, which is 3 seconds.
Performance evaluation of DIVIDE
The evaluation of DIVIDE measures the processing time of the query derivation on the context associated with Rosa, which includes both the dementia and colon cancer diagnoses. DIVIDE performs the semantic reasoning during the query derivation in three parallel threads, where each thread is responsible for deriving the RSP queries from one of the generic query templates. The output consists of the three queries described in the demonstrator’s scenario. The processing time is measured from start to completion of the parallel reasoning processes. Networking overhead for registering the context to DIVIDE, which triggers the query derivation, and for registering the resulting queries on C-SPARQL, is not included in the results reported in Table 1. These are aggregated over 30 runs, excluding 3 warm-up and 2 cool-down runs.
Performance evaluation of AMADEUS
For the evaluation of AMADEUS, the processing times are measured of a request to the AMADEUS Web API for the three most important cases in the demonstrator’s scenario: (1) requesting possible treatment plans for colon cancer, (2) requesting possible treatment plans for influenza, and (3) adding the chosen influenza treatment plan to the existing treatment plan for colon cancer, including the conflict detection. The processing time corresponds to the response time of the AMADEUS Web API, which mainly represents the duration of the started EYE reasoner process. The results in Table 1 are measured over 30 runs, excluding 3 warm-up and 2 cool-down runs.
Usability evaluation
This section discusses the usability evaluations of the installer tools in the semantic healthcare platform: the RMLEditor and Streaming MASSIF with its GUI. The evaluations make use of the System Usability Scale (SUS), a well-known, rapid method for gathering usability ratings for a technology through a questionnaire [64]. It is known for being concise, applicable across various technologies, and effective in scenarios with limited sample sizes [65]. The SUS measures the user satisfaction of a technology, but not its effectiveness or efficiency, which is an important consideration when interpreting SUS-scores. The usability of DIVIDE and AMADEUS has not been evaluated.
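For reference, all SUS-scores reported in the following subsections follow the standard SUS computation: with \(s_i \in \{1,\dots,5\}\) the Likert response to the \(i\)-th of the ten questionnaire items,

\[
\mathrm{SUS} = 2.5 \left( \sum_{i\,\mathrm{odd}} (s_i - 1) \;+\; \sum_{i\,\mathrm{even}} (5 - s_i) \right),
\]

yielding a score between 0 and 100.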
Usability evaluation of RMLEditor and MapVOWL
In a previous publication, Heyvaert et al. evaluated the usability of the RMLEditor and its visual mapping rule notation MapVOWL [57]. To this end, they first evaluated whether the MapVOWL graph-based representation of RML mapping rules yields a higher human processing accuracy and preference than the classic textual representation of RML rules. Based on the evaluation, the authors concluded that users with knowledge of RML exhibit no difference in accuracy when processing the two representations, but do prefer MapVOWL to visualize and edit rules. As a second step, the authors compared the usability of the graph-based RMLEditor to that of the RMLx Visual Editor, a form-based editor to show and edit RML mapping rules [66, 67]. This evaluation showed no difference in the accuracy of creating mapping rules between both editors, but revealed a higher user satisfaction for the RMLEditor, with a SUS-score of 82.75 compared to 42 for the RMLx Visual Editor. This is mainly attributed to the usage of MapVOWL. For details about the set-up and results of both evaluations, we refer to the original publication of Heyvaert et al. [57].
Usability evaluation of Streaming MASSIF
To evaluate how usable Streaming MASSIF is for installers who use and configure semantic services, it is compared with Kafka Streams. This is a stream processing library provided by Apache Kafka that enables developers to build scalable and fault-tolerant real-time applications and microservices [68]. It allows Kafka consumers to process continuous streams of data records from Kafka topics, supporting tasks such as data transformation, aggregation, and event-driven computations.
Evaluation set-up
The evaluation was configured through an online questionnaire. Potential participants were approached from imec and Ghent University. Those agreeing to participate were provided with a short introduction to Streaming MASSIF and Kafka Streams to read through. After the introduction, the actual evaluation was performed together with one of the researchers, consisting of four consecutive parts.
Part 1. In this part of the evaluation, the questionnaire queried the participant’s socio-demographics. Moreover, questions were asked about their experience and familiarity with Linked Data and Semantic Web technologies, stream processing in general, Streaming MASSIF, and Kafka.
Part 2. The second part of the evaluation presented a short text about the fictional evaluation use case, which was the use case of the demonstrator presented in the “Use case demonstrator” section. It described the task that should be performed by the participant, which was about step 4 of the described scenario, i.e., the configuration of a service filtering the alarming situation of a rising fever event for patient Rosa with the influenza diagnosis. More specifically, this text reads as follows: “In the patient room of the future, rooms are equipped with many sensors that capture their environment. These sensors allow to monitor both patients and the status of the room. Let’s consider the presence of a light, sound, and body temperature sensor present in the room. To process this data in a meaningful way, this data needs to be combined with background knowledge regarding the hospital and the patient that is being treated in the room. In this use case, we want to monitor to see if our patients are not being exposed to any alarming situation. For example, let’s assume our patient has influenza. The background knowledge about our patient will then describe that our patient with influenza should be monitored for body temperature values. Therefore we should monitor the body temperature sensors in the room, in order to detect alarming situations. In this case, an alarming situation occurs when the body temperature of our patient is rising too quickly in a limited time span.”
After reading this text and following a short tutorial for both Streaming MASSIF and Kafka Streams, the participants were requested to complete this task with both tools. To vary which tool was used first for the task, the participants were randomly assigned to two groups of equal size. The correctness of performing the task with both tools was assessed by the present researcher.
Part 3. The third part of the evaluation consisted of a multiple-choice questionnaire assessing how well the participants understood Streaming MASSIF. The questions asked and their corresponding correct answer(s) were the following:
1. How does one filter data in Streaming MASSIF? Correct answers: by defining a filter, by abstracting the data, by defining a Complex Event Processing pattern
2. How does one enrich events from the data stream in Streaming MASSIF? Correct answers: by defining a CONSTRUCT query in a filter, by abstracting the data through reasoning
3. How does one abstract the data in Streaming MASSIF? Correct answer: by defining ontological patterns
4. How does one detect temporal dependencies in Streaming MASSIF? Correct answer: by defining a temporal pattern
Part 4. The final part was a post-assessment questioning the usability of both tools. First, the difficulty of performing the use case task with the tools was rated on a 7-point Likert scale from extremely difficult to extremely easy. Similarly, confidence in successful completion was rated on a 7-point Likert scale from strongly agree to strongly disagree. Second, the SUS-score was obtained for both tools. Third, the overall user-friendliness of the tools was rated on a 7-point Likert scale from awful to excellent.
Evaluation results
Part 1. Eight participants were recruited. They were all male researchers and students from imec and Ghent University between 22 and 28 years old, all but one holding a master’s degree. One participant considered himself a novice on Linked Data and Semantic Web technologies, five were developing knowledge in this domain, one was proficient, and one was an expert. Concerning stream processing in general, four participants were developing knowledge and four were novices. Six participants had already heard of both Streaming MASSIF and Kafka, of whom two had already used the latter. The other two participants had never heard of either.
Part 2. All participants successfully completed the task on the presented use case with both evaluated tools.
Part 3. The questions about Streaming MASSIF were answered by all participants. For question 1, one participant selected all three correct answers, two participants selected two of them, and the other five selected one correct answer. For question 2, three participants selected both correct answers; the other five participants selected one correct answer. For question 3, five participants selected the single correct answer, while three participants also selected an additional wrong answer. For question 4, all participants selected the correct answer, although two of them also selected an additional wrong answer.
Part 4. Seven participants rated performing the use case task as more difficult with Kafka Streams than with Streaming MASSIF. Five participants rated completing the task with Streaming MASSIF as moderately or slightly easy, compared to only two for Kafka Streams. Moreover, regarding confidence in successful completion of the task, two participants reported a higher confidence when using Streaming MASSIF, while the others reported equal confidence. This confidence was rated positively (slightly agree, agree or strongly agree) by seven participants for Streaming MASSIF. In addition, the average SUS-score obtained for Streaming MASSIF was 72.81, compared to 53.75 for Kafka Streams. Inspecting the individual SUS-scores, six participants scored Streaming MASSIF higher, while the other two participants gave both tools equal scores. Finally, five participants rated the overall user-friendliness of Streaming MASSIF higher than that of Kafka Streams, while the other three participants rated them equally well. All participants rated the overall user-friendliness of Streaming MASSIF as good or excellent, compared to a rating of good or OK for Kafka Streams by six participants.