Forecasting Hospital Room and Ward Occupancy Using Static and Dynamic Information Concurrently: Retrospective Single-Center Cohort Study


Introduction

Background

The global health care market continues to grow, but the burden of health care costs on governments and individuals is reaching its limits. Consequently, there is increasing interest in the efficient use of limited resources in health care systems, and hospitals must develop approaches to maximize medical effectiveness within budgetary constraints [,]. One approach to this is optimizing the use of medical resources. Medical resources can be broadly divided into 3 categories: human resources, physical capital, and consumables. The appropriate and optimized use of these resources is critical for improving health care quality and providing care to a larger number of patients [,].

Among the 3 medical resources, hospital beds are a form of physical capital that hospitals provide to patients. Beds are allocated for various purposes, such as rest, hospitalization, and postsurgical recovery, and they directly influence patients’ satisfaction with the hospital. However, owing to limited space, hospitals often have a restricted number of beds. Moreover, the number and functionality of beds are often fixed by budgetary or environmental constraints, making changes difficult. Nonetheless, if hospital administrators can evaluate bed occupancy rates (BORs) across different time periods, they can predict the need for health care professionals and resources. On the basis of this information, hospitals can plan resources efficiently, reduce operational costs, and achieve economic objectives []. In addition, excessive BORs can negatively affect the health of staff members and increase the risk of exposure to infection. Hence, emphasizing only the maintenance of a high BOR may not necessarily lead to favorable outcomes for the hospital [,]. For these reasons, BOR prediction plays a vital role in hospitals and is widely recognized as necessary for resource optimization in the competitive medical field.

In the medical field, optimizing resources is crucial in the face of limited bed capacity and intense competition. Therefore, bed planning is a vital consideration aimed at minimizing hospital costs []. To achieve this, hospitals need to plan staffing and vacations weeks or months in advance []. The use of machine learning (ML) technology for BOR prediction is necessary to address fluctuations in patient numbers due to seasonal variations or infectious diseases, ensuring continuous hospital operations. In the Netherlands, hospitals have already implemented ML-based BOR prediction [], and Johns Hopkins Hospital uses various metrics to effectively manage bed capacity for optimization. Predicting BORs based on quantitative data contributes to validating the clinical quality and cost-effectiveness of treatments. This, in turn, enhances overall accountability throughout the wards and contributes to improving hospital efficiency [].

Prior Work

Hospital BOR prediction has recently been investigated using various approaches. Studies predicting bed demand using mathematical statistics or regression models based on given data [-] have given way to modeling approaches using time-series analysis, which observes data recorded over time to predict future values.

A previous study took an innovative approach, using time-series analysis alongside the commonly used regression analysis for bed demand prediction, and demonstrated that time-series prediction of bed occupancy yielded higher performance than a simple trend fitting approach []. Another study used the autoregressive integrated moving average (ARIMA) model for univariate data and a time-series model for multivariate data to predict BORs []. With the advancement of deep learning (DL) models that possess strong long-term memory capabilities, such as recurrent neural network (RNN) and long short-term memory (LSTM), there has been an increase in studies applying these models to time-series data for prediction purposes. For instance, in the study by Kutafina et al [], hospital BORs were predicted based on dates and public holiday data from government agencies and schools, without involving the personal information of patients. The study used a nonlinear autoregressive exogenous model to predict a short-term period of 60 days, aiming to contribute to the planning of hospital staff. The model demonstrated good performance, with an average mean absolute percentage error of 6.24%. In emergency situations, such as the recent global COVID-19 pandemic, the sudden influx of infected patients can disrupt the hospitalization plans for patients with pre-existing conditions []. Studies have been conducted using DL architectures to design models for predicting the BOR of patients with COVID-19 on a country-by-country basis. Some studies incorporated additional inputs, such as vaccination rate and median age, to train the models []. Studies have also been conducted to focus on the short-term prediction of BORs during the COVID-19 period [,]. Prior studies are summarized in Table 1.

Although previous research has contributed to BOR prediction and operational planning at the hospital level, more detailed and systematic predictions are necessary for practical application in real-world operations. To address this issue, studies have developed their own computer simulation hospital systems to not only predict bed occupancy but also execute scheduling for admissions and surgeries to enhance resource utilization [-]. Nevertheless, existing studies have the limitation of focusing solely on the overall BOR of the hospital. As an advancement to these studies, we aim to propose a strategy for predicting the BOR at the level of each ward and room using various variables in a time-series manner. To our knowledge, this is the first study to apply DL to predict ward- and room-specific occupancy rates using time-series analysis.

Table 1. Summary of prior studies.

Mackay and Lee [], 2007
Data set: Deidentified data on the date and time of patient admission and discharge between 1998 and 2000
Method: Comparison of 2 compartment models through cross-validation
Prediction target: Entire hospital bed occupancy (annual average)

Littig and Isken [], 2007
Data set: Historical and real-time data from a data warehouse and hospital information systems (emergency department, financial, surgical scheduling, and inpatient tracking systems)
Method: Computerized model of MLR^a and LR^b
Prediction target: Entire hospital short-term occupancy (24 h or 72 h) based on LOS^c

Kumar and Mo [], 2010
Data set: Bed management data between June 1, 2006, and June 1, 2007: (1) admission and length-of-stay data in each class; (2) admission data from the same week of the previous year; and (3) relationships between identified variables to aid bed managers
Method: (1) Poisson bed occupancy model; (2) simulation model; and (3) regression model
Prediction target: (1) Estimation of bed occupancy and optimal bed requirements in each class; (2) bed occupancy levels for every class for the following week; and (3) weekly average number of occupied beds

Seematter-Bagnoud et al [], 2015
Data set: Inpatient stay data in 2010 (acute somatic care inpatients and outpatients)
Method: 3 models of hypothesis-based statistical forecasting of future trends
Prediction target: (1) Number of hospital stays; (2) hospital inpatient days; and (3) beds for medical stay

Farmer and Emami [], 1990
Data set: Inpatient stay data for general surgery in the age group of 15-44 years between 1969 and 1982
Method: (1) Forecasting from a structural model and (2) the time-series (Box-Jenkins) method
Prediction target: Entire hospital short-term daily bed requirements

Kim et al [], 2014
Data set: Data warehouse between January 2009 and June 2012
Method: (1) The ARIMA^d model for univariate data and (2) the time-series model for multivariate data
Prediction target: Entire hospital bed occupancy (1 day and 1 week)

Kutafina et al [], 2019
Data set: Inpatient stay data between October 14, 2002, and December 31, 2015 (patient identifier, time of admission, discharge, and name of the clinic the patient was admitted to; no personal information on the patients or staff was provided)
Method: NARX^e model, a type of RNN^f
Prediction target: Entire hospital mid-term bed occupancy (60 days, bed pool in units of 30 beds)

Bouhamed et al [], 2022
Data set: COVID-19 hospital occupancy data in 15 countries between December 2021 and early January 2022
Method: LSTM^g, GRU^h, and SRNN^i models, incorporating the vaccination percentage and median age of the population to improve performance
Prediction target: Entire hospital bed occupancy

Bekker et al [], 2021
Data set: Historical data publicly available until mid-October 2020
Method: (1) Linear programming to predict admissions and (2) fitting the remaining LOS^c and using results from queuing theory to predict occupancy
Prediction target: (1) Patient admission and (2) entire hospital short-term bed occupancy

Farcomeni et al [], 2021
Data set: Patients admitted to the intensive care unit between January and June 2020
Method: (1) Generalized linear mixed regression model and (2) area-specific nonstationary integer autoregressive methodology
Prediction target: Entire hospital short-term intensive care bed occupancy

^a MLR: multinomial logistic regression.

^b LR: linear regression.

^c LOS: length of stay.

^d ARIMA: autoregressive integrated moving average.

^e NARX: nonlinear autoregressive exogenous.

^f RNN: recurrent neural network.

^g LSTM: long short-term memory.

^h GRU: gated recurrent unit.

^i SRNN: simple recurrent neural network.

Goal of This Study

The aim of this study was to predict the BORs of hospital wards and rooms using time-series data from individual beds. Although overall bed occupancy prediction is useful for macro-level resource management in hospitals, resource allocation based on the prediction of occupancy rates for each ward and room is required for specific hospital scheduling and practicality. Through this approach, we aim to contribute to the efficient operational cost optimization of the hospital and ensure the availability of resources required for patient care.

We have developed time-series prediction models based on deep neural networks (DNNs), among which 1 model combines data representing room-specific features (static data) with dynamic data to enhance the prediction performance for room bed occupancy rates (RBORs). Based on bidirectional long short-term memory (Bi-LSTM), this RBOR prediction model achieved a mean absolute error (MAE) of 0.049, a mean square error (MSE) of 0.042, a root mean square error (RMSE) of 0.007, and an R2 score of 0.291, the highest performance among all RBOR models.

We developed 6 types of BOR prediction models, of which 2 types were used for predicting ward bed occupancy rates (WBORs), and the other 4 types focused on predicting RBORs. These models use LSTM and Bi-LSTM architectures with strong long-term memory capabilities as their basic structure. We created 6 models for each architecture, resulting in a total of 12 models. The WBOR models were used for predicting weekly and monthly occupancy rates, serving long-term hospital administrative planning purposes. Conversely, the RBOR models were designed for immediate and rapid occupancy planning and were trained with 3- and 7-day intervals. Each RBOR model was enhanced by combining static data, which represent room-specific features, to generate more sophisticated prediction models.

Figure 1 shows the potential application of our model as a form of web software in a hospital setting. Through an online dashboard, it can provide timely information regarding bed availability, enabling intelligent management of patient movements related to admission and discharge. It facilitates shared responsibilities within the hospital and simplifies future resource planning [].

In the Introduction section, we explored the importance of this research, investigated relevant previous studies, and provided a general overview of the direction of our research. In the Methods section, we describe the data set and DNN algorithms used and explain the model architecture. In the Results section, we present the performance and outcomes of this study. Finally, in the Discussion section, we discuss the contributions, limitations, and potential avenues for improvement of the research.

Figure 1. Virtual dashboard of the status and forecast of the ward bed occupancy rate (WBOR) and room bed occupancy rate (RBOR). The first screen presents the overall bed occupancy rate of the hospital, along with the number of beds in use and available. Moreover, a predictive graph displays the anticipated WBOR for selected dates. The second screen presents the WBOR for individual beds, indicating their statuses, such as “in use,” “reserved,” “empty,” and “cleaning.” Detailed information about each room is also displayed.
Methods

Overview

We intended to predict the BORs of individual hospital wards and rooms based on individual bed–level data recorded hourly and aggregated daily. For this purpose, we developed 12 time-series models. As the base models, we applied LSTM and Bi-LSTM, which are suitable for sequence data. These models address the limitation of long-term memory loss in traditional RNNs and were chosen because of their suitability for training bed data represented as sequence data.

Based on the model architecture, there were 2 WBOR prediction model types, which were trained at 7- and 30-day intervals to predict the occupancy rate for the next day. Moreover, there were 2 RBOR prediction model types, similar to the ward models, which were trained at 3- and 7-day intervals. Furthermore, as another approach, each RBOR prediction model was augmented with static data, and 2 DL algorithms were proposed for the final comparison of their performances in predicting RBORs.

Ethical Considerations

The study was approved by the Asan Medical Center (AMC) Institutional Review Board (IRB 2021-0321) and was conducted in accordance with the 2008 Declaration of Helsinki.

Materials

Study Setting

This was a retrospective single-center cohort study. Data were collected from AMC, with information on the occupancy status of each bed recorded at hourly intervals between May 27, 2020, and November 21, 2022. The data set comprised a total of 54,632,684 records. This study used ethically preapproved data. Deidentified data used in the study were extracted from ABLE, the AMC clinical research data warehouse.

A total of 57 wards, encompassing specialized wards; 1411 rooms, including private and shared rooms; and 4990 beds were included in this study. Wards and rooms with specific characteristics, such as intensive care unit, newborn room, and nuclear medicine treatment room, were excluded from the analysis as their occupancy prediction using simple and general variables did not align with the direction of this study.

Supporting Data

Supporting data for public holidays were added in our data set. We considered that holidays have both a recurring pattern with specific dates each year and a distinctive characteristic of being nonworking days, which could affect occupancy rates. Based on Korean public holidays, which include Chuseok, Hangeul Proclamation Day, Children’s Day, National Liberation Day, Memorial Day, Buddha’s Birthday, Independence Movement Day, and Constitution Day, there were 27 days that corresponded to public holidays during the period covered by the data set. We denoted these dates with a value of “1” if they were public holidays and “0” if they were not, based on the reference date.
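The holiday flag described above can be sketched as follows (a minimal illustration; the column names and the 3 example dates are illustrative, not the full 27-day holiday set used in the study):

```python
import pandas as pd

# Illustrative subset of Korean public holidays (not the study's full list)
holidays = pd.to_datetime([
    "2021-05-05",  # Children's Day
    "2021-08-15",  # National Liberation Day
    "2021-10-09",  # Hangeul Proclamation Day
])

# Reference dates to flag: 1 if a public holiday, 0 otherwise
dates = pd.DataFrame({"ref_date": pd.date_range("2021-08-13", "2021-08-16")})
dates["holiday"] = dates["ref_date"].isin(holidays).astype(int)
```

Here, August 15 is flagged as 1 and the surrounding days as 0, matching the binary encoding used for the model input.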

Preprocessing and Description of Variables

Among the variables representing individual beds, the reference date, ward and room information, patient occupancy status, bed cleanliness status, and detailed room information were available. Based on the recorded date of bed status, we derived additional variables, such as the reference year, reference month, reference week (week of the year), reference day, and reference day of the week.

Room data were derived from the input information representing the cleanliness status of beds. This variable had 2 possible states, namely, “admittable” and “discharge.” If neither of these states was indicated, it implied that a patient was currently hospitalized in the bed. As the status of hospitalized patients was indicated by missing values, we replaced them with the number “1” to indicate the presence of a patient in the bed and “0” otherwise. The sum of all “1” values represented the current number of hospitalized patients. The count of beds in each room indicated the capacity of each room. The target variable BOR was calculated by dividing the number of patients in the room by the room capacity, resulting in a room-specific patient occupancy rate variable. The ward data were subjected to a similar process as that of the room data, with the difference being that we generated ward-specific variables, such as ward capacity and WBOR, using the same approach. The static room data consisted of 14 variables, including the title of the room and the detailed information specific to each room.
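The derivation above can be sketched with pandas (a minimal reconstruction under the stated rule that a missing cleanliness status means the bed is occupied; the column names and sample records are assumptions, not the hospital's actual schema):

```python
import pandas as pd

# Bed-level records: status is "admittable", "discharge", or missing (occupied)
beds = pd.DataFrame({
    "room": ["R1", "R1", "R1", "R2", "R2"],
    "status": [None, "admittable", None, None, "discharge"],
})

# Missing status -> 1 (patient in bed); otherwise -> 0
beds["occupied"] = beds["status"].isna().astype(int)

rooms = beds.groupby("room").agg(
    room_bed_capacity=("occupied", "sum"),   # patients currently in the room
    room_capacity=("occupied", "size"),      # total beds in the room
)
rooms["room_occupancy_rate"] = rooms["room_bed_capacity"] / rooms["room_capacity"]
```

The same groupby, keyed on the ward instead of the room, yields the ward capacity and WBOR variables.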

For the variables in the ward and room data, we disregarded the units of the features and converted them into numerical values for easy comparison, after which we performed normalization. Regarding the variables representing detailed room information, we converted them to numerical values where “yes” was represented as “1” and “no” was represented as “0.”

The final set of variables used in this study was categorized into date, ward, room, and detailed room information. Table 2 provides the detailed descriptions of the variables used in our training, including all the administrative data related to beds that are readily available in the hospital.

The explanation of the classification for generating the data sets for training each model is provided in Table 3. The static features of the detailed room information were combined with the room data set, which has sequence characteristics, to generate a separate data set termed Room+Static.

Table 2. Description of variables by category.

Date
- Year (3 categories): Reference year for bed status
- Month (12 categories): Reference month for bed status
- Week (53 categories): Reference week for bed status
- Day (31 categories): Reference day for bed status
- Weekday (7 categories): Reference day of the week for bed status
- Holiday (2 categories): Holiday status

Ward
- Ward abbreviation (57 categories): Abbreviations for entire ward names
- Ward capacity (numeric): Number of available ward beds
- Ward bed capacity (numeric): Number of patients currently admitted to the ward
- Ward occupancy rate (numeric): Ward bed capacity divided by ward capacity

Room
- Room abbreviation (1411 categories): Abbreviations for entire room names
- Room capacity (numeric): Number of available room beds
- Room bed capacity (numeric): Number of patients currently admitted to the room
- Room occupancy rate (numeric): Room bed capacity divided by room capacity

Room static feature
- Room code (34 categories): Room grade code
- Nuclear (2 categories, N^a/Y^b): Nuclear medicine room availability
- Sterile (2 categories, N/Y): Sterile room availability
- Isolation (2 categories, N/Y): Isolation room availability
- EEG^c testing (2 categories, N/Y): EEG testing room availability
- Observation (2 categories, N/Y): Observation room availability
- Kidney (2 categories, N/Y): Kidney transplant room availability
- Liver (2 categories, N/Y): Liver transplant room availability
- Sub-ICU^d (2 categories, N/Y): Sub-ICU room availability
- Special (2 categories, N/Y): Special room availability
- Small single (2 categories, N/Y): Small single room availability
- Short-term (2 categories, N/Y): Short-term room availability
- Psy-double (2 categories, N/Y): Psychiatry department double room availability
- Psy-open (2 categories, N/Y): Psychiatry department open room availability

^a N: No.

^b Y: Yes.

^c EEG: electroencephalogram.

^d ICU: intensive care unit.

Table 3. Data set classification and included variables.

- Ward data set: Ward abbreviation, year, month, week, day, weekday, holiday, ward capacity, ward bed capacity, and ward occupancy rate
- Room data set: Room abbreviation, year, month, week, day, weekday, holiday, room capacity, room bed capacity, and room occupancy rate
- Static data set: 14 static variables related to detailed room information
- Room+Static data set: Room abbreviation, year, month, week, day, weekday, holiday, room capacity, room bed capacity, 14 static variables related to detailed room information, and room occupancy rate

Separation

Each data set was split into training, validation, and test sets for training and evaluation of the model. The training set consisted of 32,153 rows (67.8%), with data from May 27, 2020, to December 2021. The validation set, used for parameter tuning, included 7085 rows (15.0%), with data from January to June 2022. Finally, the test set comprised 8208 rows (17.2%), with data from July 2022 to November 21, 2022.
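The chronological split above can be sketched as follows (the cutoff dates follow the text; the DataFrame here is synthetic, so the row counts differ from the study's):

```python
import pandas as pd

# Synthetic daily records spanning the study period
df = pd.DataFrame({"ref_date": pd.date_range("2020-05-27", "2022-11-21")})

# Chronological split: no shuffling, so the temporal order is preserved
train = df[df["ref_date"] < "2022-01-01"]                                   # May 27, 2020 - Dec 2021
valid = df[(df["ref_date"] >= "2022-01-01") & (df["ref_date"] < "2022-07-01")]  # Jan - Jun 2022
test  = df[df["ref_date"] >= "2022-07-01"]                                  # Jul - Nov 21, 2022
```

Splitting by date rather than at random keeps every validation and test observation strictly later than the training data, which is essential for honest time-series evaluation.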

DL Algorithms

We used several DL algorithms in this study. In the following subsections, we explain each algorithm used in our research.

LSTM Network

RNN [] is a simple algorithm that passes information from previous steps to the current step, allowing it to process sequential data iteratively. However, it encounters difficulties in handling long-term dependencies, such as those found in time-series data, owing to the vanishing gradient problem. LSTM [] was developed to address this issue. LSTM excels in handling sequence data and is commonly used in natural language processing, machine translation, and time-series data analysis. LSTM consists of an input gate, an output gate, and a forget gate. The “cell state” is carefully controlled by each gate to determine whether the memory should be retained or forgotten for the next time step.

Bi-LSTM Network

Although RNN and LSTM possess the ability to remember previous data, they have a limitation in that their results are primarily based on immediate past patterns because the input is processed in a sequential order. This limitation can be overcome through a network architecture known as Bi-LSTM []. Bi-LSTM allows end-to-end learning, minimizing the loss on the output and simultaneously training all parameters. It also has the advantage of performing well even with long data sequences. Because of its suitability for models that require knowledge of dependencies from both the past and future, such as LSTM-based time-series prediction, we additionally selected Bi-LSTM as the base model.

Attention Mechanism

Attention mechanism [,] refers to the process of incorporating the encoder’s outputs into the decoder at each time step of predicting the output sequence. Rather than considering the entire input sequence, it focuses more on the relevant components that are related to the predicted output, allowing the model to focus on important areas. This mechanism helps minimize information loss in data sets with long sequences, enabling better learning and improving the model’s performance. It has been widely used in areas such as text translation and speech recognition. Nevertheless, as it is still based on RNN models, it has the drawbacks of slower speed and not being completely free from information loss issues.

Combining Static and Dynamic Features

Data can exhibit different characteristics even at the same time. For instance, in data collected at 1-hour intervals for each hospital bed, we can distinguish between “dynamic data,” which include features that change over time, such as the bed condition, date, and patient occupancy, and “static data,” which consist of information that remains constant, such as the ward and room number.

DL allows us to use all the available information for prediction. Therefore, for predicting the RBOR, we investigated an approach that combines dynamic and static data using an LSTM-based method []. This approach demonstrated better performance than LSTM alone []. Our approach involves adding a layer that incorporates static data as an input to the existing room occupancy prediction model.

Model Architecture

Base Model

Our objective was to predict the intermediate-term occupancy rates of wards and rooms within the hospital to contribute to hospital operation planning. Bi-LSTM was chosen as the base model owing to its improved predictive performance compared with the traditional LSTM model. However, to quantitatively compare these models, we conducted a comparison of the results for each model (6 for each, with a total of 12 models).

A typical LSTM model processes data sequentially, considering only information from the past up to the current time step. Bi-LSTM, by contrast, processes data in both forward and backward directions simultaneously, allowing it to leverage both past and future information for predictions. This bidirectionality helps the model effectively learn temporal dependencies and intricate patterns. Despite these advantages, Bi-LSTM doubles the number of model parameters, increasing the computational cost of training and prediction. While a more complex model can better fit the training data, it also carries an increased risk of overfitting, especially with small data sets. Nevertheless, we chose Bi-LSTM for tasks such as predicting hospital BORs from time-series data because of its ability to harness bidirectional information: by processing input data from both directions, it can effectively incorporate future context into current predictions, which is beneficial for handling complex patterns in long time-series data [].

Moreover, we have enhanced the performance of our models by adding an attention layer to Bi-LSTM. The attention layer assigns higher weights to features that exert a significant impact on the prediction, allowing the model to focus on relevant information and gather necessary input features. This helps improve the accuracy of the prediction. Furthermore, the attention layer reduces the amount of information processed, resulting in improved computational efficiency. Ultimately, this contributes toward enhancing the overall performance of the model.

The window length of the input sequence was set to 3 different intervals, namely, 3, 7, and 30 days. The WBOR model was trained on sequences with a window length of 7 and 30 days, whereas the RBOR model was trained on sequences with a window length of 3 and 7 days. The first layer of our model consisted of Bi-LSTM, followed by the leaky rectified linear unit (LeakyReLU) activation function. LeakyReLU is similar to ReLU but retains a small gradient for negative input values, which helps the model converge faster. After this process was applied once again, the AttentionWithContext layer was applied, which focuses on important components of the input sequence data and transforms the outputs obtained from the previous layer. After the activation function was applied again, a dense layer with 1 neuron was added to generate the final output, and the sigmoid function was used to limit the output values between 0 and 1. Finally, our model was compiled using the MSE loss function, the Adam optimizer, and the MAE metric. The parameters for each layer were selected based on experience accumulated through research. Figure 2 visually represents the above-described structure.
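A minimal Keras sketch of this structure follows (not the authors' exact code: the custom AttentionWithContext layer is approximated here with Keras' built-in self-attention followed by pooling, and the layer sizes are illustrative):

```python
from tensorflow.keras import layers, models

def build_base_model(window_len=7, n_features=10):
    """Stacked Bi-LSTM with LeakyReLU, attention, and a sigmoid output."""
    inp = layers.Input(shape=(window_len, n_features))
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inp)
    x = layers.LeakyReLU()(x)
    x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
    x = layers.LeakyReLU()(x)
    x = layers.Attention()([x, x])          # attend over the time steps
    x = layers.GlobalAveragePooling1D()(x)  # collapse the time axis
    x = layers.LeakyReLU()(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # occupancy rate in [0, 1]
    model = models.Model(inp, out)
    model.compile(loss="mse", optimizer="adam", metrics=["mae"])
    return model
```

For the 7-day WBOR model, for example, `window_len=7` and `n_features` equals the number of ward variables in Table 2.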

Figure 2. Base bidirectional long short-term memory (Bi-LSTM) model architecture. LeakyReLU: leaky rectified linear unit; LSTM: long short-term memory.

Combining Dynamic and Static Data Using the DL Model

The accumulated bed data, which were collected on a time basis, were divided into dynamic and static data of the rooms, which were then inputted separately. To improve the performance of the BOR prediction model, we designed different DL architectures for the characteristics of these 2 types of data.

We first used a base model built on LSTM or Bi-LSTM to learn the time-series data and used dense layers to process the fixed-size static inputs. To prevent overfitting, we applied dropout to randomly deactivate neurons in the 2 dense layers.

Finally, the hidden states of the 2 networks were combined, and the combined result was passed to a single layer to effectively integrate the dynamic and static data. This allowed us to use the information from both sources for BOR prediction. This architecture is illustrated in Figure 3.
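The two-branch design can be sketched as a two-input Keras model (a minimal reconstruction; layer sizes and dropout rates are illustrative assumptions, not the study's tuned values):

```python
from tensorflow.keras import layers, models

def build_combined_model(window_len=3, n_dynamic=10, n_static=14):
    """Bi-LSTM branch for dynamic data plus a dense branch for static data."""
    # Dynamic (time-series) branch
    dyn_in = layers.Input(shape=(window_len, n_dynamic), name="dynamic")
    d = layers.Bidirectional(layers.LSTM(32))(dyn_in)
    # Static branch: dense layers with dropout, as described above
    stat_in = layers.Input(shape=(n_static,), name="static")
    s = layers.Dense(32, activation="relu")(stat_in)
    s = layers.Dropout(0.2)(s)
    s = layers.Dense(16, activation="relu")(s)
    s = layers.Dropout(0.2)(s)
    # Combine the hidden states of both branches into a single output layer
    merged = layers.Concatenate()([d, s])
    out = layers.Dense(1, activation="sigmoid")(merged)
    model = models.Model([dyn_in, stat_in], out)
    model.compile(loss="mse", optimizer="adam", metrics=["mae"])
    return model
```

Here `n_static=14` matches the 14 static room variables in Table 2, while the dynamic branch receives the windowed room sequence data.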

Figure 3. Bidirectional long short-term memory (Bi-LSTM) model architecture combining static and dynamic variables. LeakyReLU: leaky rectified linear unit; LSTM: long short-term memory.

Hyperparameter Tuning

One of the fundamental ways to enhance the performance of artificial intelligence (AI) models is hyperparameter tuning. Hyperparameters are parameters passed to the model to modify or adjust the learning process. While hyperparameter tuning may rely on the experience of researchers, there are also tools that search for hyperparameters automatically, accounting for the diversity of model structures.

Various methods for search optimization have been proposed [,], but we implemented our models using the Keras library. By leveraging Keras Tuner, we automatically searched for the optimal combinations of units and learning rates for each model, contributing to the improvement of their performance.
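The study used Keras Tuner for this search; as a library-agnostic illustration of the underlying idea, a random search over units and learning rates can be sketched in plain Python (the `evaluate` function here is a hypothetical stand-in for training a model and returning its validation loss):

```python
import random

def evaluate(units, learning_rate):
    # Hypothetical objective standing in for "train model, return val loss";
    # it pretends the best setting is 64 units at a learning rate of 1e-3.
    return abs(units - 64) / 64 + abs(learning_rate - 1e-3)

def random_search(n_trials=20, seed=0):
    """Try random (units, learning_rate) combinations; keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        units = rng.choice([16, 32, 64, 128])
        lr = rng.choice([1e-2, 1e-3, 1e-4])
        loss = evaluate(units, lr)
        if best is None or loss < best[0]:
            best = (loss, units, lr)
    return best  # (loss, units, learning_rate)
```

Keras Tuner automates exactly this loop (plus smarter strategies such as Hyperband and Bayesian optimization) over a declared search space.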

Time Series Cross-Validation

Time-series data exhibit temporal dependencies between data points, making it crucial to consider these characteristics when validating a model. The commonly used K-fold cross-validation is effective for evaluating models on general data sets [], helping prevent overfitting and improve generalizability by dividing the data into multiple subsets [,]. However, for time-series data, shuffling the data randomly is not appropriate owing to the inherent sequential dependency of the observations.

Time series cross-validation is a method that preserves this temporal dependence while dividing the data []. It involves splitting the entire hospital bed data set into 5 periods, conducting training and validation for each period, and repeating this process as the periods shift. This approach is particularly effective when observations in the dynamic data set, such as hospital bed data recorded at 1-hour intervals, play a crucial role in predicting future values based on past observations.

Shuffling data randomly using K-fold may disrupt the temporal continuity, leading to inadequate reflection of past and future observations. Therefore, time series cross-validation sequentially partitions the data, ensuring the temporal flow is maintained, and proves to be more effective in evaluating the model’s performance. This method enables the model to make more accurate predictions of future occupancy based on past trends.
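The 5-period expanding-window scheme described above corresponds to scikit-learn's `TimeSeriesSplit` (a minimal sketch; the 30-day array is a stand-in for the daily occupancy sequences):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(30).reshape(-1, 1)  # stand-in for 30 days of occupancy features

# 5 folds: each trains on an expanding window of past observations and
# validates on the block that immediately follows it
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    # Validation indices always come strictly after the training indices
    assert train_idx.max() < val_idx.min()
```

Unlike shuffled K-fold, every fold respects the arrow of time, so the model is always evaluated on data later than anything it was trained on.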

Evaluation

We selected several metrics to evaluate the performance of time-series data predictions. Among them, MAE represents the absolute difference between the model’s predicted values and the actual BOR. We also considered MSE, which is sensitive to outliers. Moreover, to express errors on the same scale as the data while still penalizing large errors, we opted for RMSE. We also used the R2 score to measure how well the predictions explain the variation in the actual values.

MAE is a commonly used metric to evaluate the performance of time-series prediction models. It is intuitive and easy to calculate, making it widely used in practice. Because MAE uses absolute values, it is less sensitive to outliers in the occupancy rate values for specific dates. MAE is calculated using the following formula, where y_i is the actual occupancy rate, ŷ_i is the predicted value, and n is the number of observations:

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|

MSE is a metric that evaluates the magnitude of errors by squaring the differences between the predicted and actual values and then taking the average. It is calculated using the following formula:
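The formula image is not reproduced in this extract; the standard definition, using the same notation as for MAE, is:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```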

RMSE addresses the limitation of MSE that errors scale quadratically, providing a more intuitive sense of the error magnitude on the same scale as the occupancy rate. It still penalizes large errors but is less extreme than MSE in the presence of outliers. RMSE is calculated using the following formula:
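The formula image is not reproduced in this extract; the standard definition is simply the square root of MSE:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```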

The R2 score is used to measure the explanatory power of the prediction model, and it is calculated using the following formula:
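The formula image is not reproduced in this extract; the standard definition, written with the SSR and SST terms as the text defines them (SSR comparing predictions with actual values, SST comparing actual values with their mean $\bar{y}$), is:

```latex
R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}}
    = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```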

Here, SSR represents the sum of squared differences between the predicted and actual values, and SST represents the sum of squared differences between the actual values and the mean of the actual values. Figure 4 shows the prediction method and overall flow of this study.
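A minimal sketch of the four metrics as defined above, implemented with NumPy; the occupancy values and the `evaluate` helper are illustrative, not part of the study’s code:

```python
# Compute MAE, MSE, RMSE, and the R2 score for occupancy predictions,
# following the definitions given in the text (SSR: predicted vs actual,
# SST: actual vs mean of actual).
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    ssr = np.sum(err ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": 1 - ssr / sst}

# Illustrative occupancy rates (fractions of capacity)
actual = np.array([0.80, 0.75, 0.90, 0.85, 0.70])
predicted = np.array([0.78, 0.74, 0.88, 0.90, 0.72])
metrics = evaluate(actual, predicted)
```

By construction, RMSE is always the square root of MSE, and an R2 score of 1 would indicate perfect prediction.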

Figure 4. Overall flow in this study. Bi-LSTM: bidirectional long short-term memory; LSTM: long short-term memory; MAE: mean absolute error; MSE: mean square error; RMSE: root mean square error.
Results

We used 2 DL models, LSTM and Bi-LSTM, and compared the performance of 12 prediction models in total: each architecture was trained on 6 configurations, denoted ward 7 days (W7D), ward 30 days (W30D), room 3 days (R3D), room 7 days (R7D), room static 3 days (RS3D), and room static 7 days (RS7D). Using Keras Tuner, we tuned the hyperparameters of the models and subsequently validated them through 5-fold time series cross-validation.

Comparing the prediction performances of the models for WBOR and RBOR showed that the models were more accurate at predicting WBOR, with MAE values of 0.06 to 0.07. The Bi-LSTM-based W7D model, which used 7 days of ward data to predict the next day’s ward occupancy, achieved an MAE of 0.067, MSE of 0.009, and RMSE of 0.094, showing high accuracy. Its R2 score of 0.544 was approximately 0.240 higher than that of the W30D model (0.304), indicating that the variables in that model explained occupancy reasonably well.

We next compared the performances of the 8 RBOR prediction models; among them, the Bi-LSTM-based RS7D model, trained on a 7-day time step with integrated static and dynamic data, showed the best performance, achieving an MAE of 0.129, MSE of 0.050, RMSE of 0.227, and R2 score of 0.260. In particular, its R2 score exceeded that of the R3D model by 0.014. These data are summarized in Table 4. For WBOR prediction, the model with the shorter training unit (W7D) performed better, whereas for RBOR prediction, the model with the longer 7-day training unit, which incorporated detailed room-specific information, performed slightly better than the 3-day model. Overall, the models with added room-specific information demonstrated superior performance.

We visualized the predicted and actual occupancy for the Bi-LSTM models and examined occupancy trends from July 2022 onward on our test data set. First, we selected a specific ward in W7D to demonstrate the change in WBOR over 2 months. The right panel of shows the WBOR change over 5 months from July 2022 in W30D. The blue line represents the actual occupancy, and the red line represents the occupancy predicted by the model. This provides an at-a-glance view of the overall predicted occupancy level for each month and allows hospital staff to follow trends to obtain a rough understanding of the WBOR.

shows the occupancy rate values for a randomly selected room, displaying the predicted and actual values for the 4 RBOR prediction models, with 2 graphs for each model. The left graph shows the occupancy rate change over 5 months from July to November 2022, and the right graph shows the occupancy rate for July and August, providing a detailed view of the RBOR. Examining the trends of the predicted and actual values for this room over this period shows that all 4 models follow a trend similar to the actual occupancy rate.

Table 4. Performances of the occupancy prediction models.

Model and fold | MAE (LSTM / Bi-LSTM) | MSE (LSTM / Bi-LSTM) | RMSE (LSTM / Bi-LSTM) | R2 score (LSTM / Bi-LSTM)

Ward: W30D
Fold 1 | 0.081 / 0.097 | 0.014 / 0.015 | 0.117 / 0.121 | 0.040 / −0.081
Fold 2 | 0.074 / 0.064 | 0.011 / 0.007 | 0.107 / 0.085 | 0.106 / 0.430
Fold 3 | 0.118 / 0.109 | 0.031 / 0.025 | 0.175 / 0.161 | −0.130 / 0.086
Fold 4 | 0.150 / 0.087 | 0.033 / 0.013 | 0.182 / 0.113 | −0.572 / 0.399
Fold 5 | 0.087 / 0.061 | 0.019 / 0.008 | 0.139 / 0.089 | 0.212 / 0.678
Mean   | 0.102 / 0.084 | 0.021 / 0.014 | 0.144 / 0.114 | −0.068 / 0.304

Ward: W7D
Fold 1 | 0.071 / 0.063 | 0.011 / 0.007 | 0.103 / 0.086 | 0.263 / 0.479
Fold 2 | 0.067 / 0.054 | 0.009 / 0.005 | 0.094 / 0.071 | 0.302 / 0.606
Fold 3 | 0.119 / 0.091 | 0.033 / 0.016 | 0.183 / 0.126 | −0.241 / 0.408
Fold 4 | 0.116 / 0.068 | 0.021 / 0.009 | 0.145 / 0.098 | −0.009 / 0.537
Fold 5 | 0.083 / 0.060 | 0.015 / 0.007 | 0.123 / 0.087 | 0.380 / 0.690
Mean   | 0.091 / 0.067 | 0.018 / 0.009 | 0.130 / 0.094 | 0.139 / 0.544

Room: R7D
Fold 1 | 0.120 / 0.111 | 0.057 / 0.045 | 0.238 / 0.212 | 0.026 / 0.226
Fold 2 | 0.127 / 0.108 | 0.057 / 0.047 | 0.238 / 0.216 | 0.054 / 0.222
Fold 3 | 0.190 / 0.148 | 0.167 / 0.072 | 0.327 / 0.269 | 0.018 / 0.336
Fold 4 | 0.209 / 0.162 | 0.068 / 0.055 | 0.261 / 0.234 | −0.089 / 0.125
Fold 5 | 0.158 / 0.124 | 0.069 / 0.048 | 0.263 / 0.220 | 0.102 / 0.370
Mean   | 0.161 / 0.131 | 0.071 / 0.053 | 0.265 / 0.230 | 0.022 / 0.256

Room: R3D
Fold 1 | 0.134 / 0.115 | 0.058 / 0.045 | 0.242 / 0.212 | 0.001 / 0.229
Fold 2 | 0.130 / 0.097 | 0.060 / 0.048 | 0.245 / 0.220 | 0.006 / 0.195
Fold 3 | 0.178 / 0.147 | 0.118 / 0.080 | 0.344 / 0.283 | −0.084 / 0.266
Fold 4 | 0.210 / 0.204 | 0.078 / 0.075 | 0.280 / 0.275 | −0.247 / −0.201

MAE: mean absolute error; MSE: mean square error; RMSE: root mean square error; LSTM: long short-term memory; Bi-LSTM: bidirectional long short-term memory; W30D: ward 30 days; W7D: ward 7 days; R7D: room 7 days; R3D: room 3 days.
