The Dutch education system broadly encompasses primary education for all children from age 4 to approximately 12. Afterward, they move on to secondary education, where they pursue pre-vocational tracks (age 12–16) or pre-university tracks (age 12–17/18). Following this stage, they may proceed to vocational education (MBO; age 16+), higher professional education (HBO; age 17+), or university (age 18+). The current study focuses on students in vocational education (age 16+).
Data and design
The data for this research came from the Dutch foundation ‘Testjeleefstijl’ (translation: ‘Test your lifestyle’), which collects data from students in vocational education on several lifestyle topics (e.g. physical and mental health, alcohol, drugs, sexuality) that concern young people between the ages of 12 and 25. Testjeleefstijl enables vocational education schools to have their students complete digital questionnaires on a broad range of topics. Schools can access the results via a dashboard and can enable or disable individual topics, creating a customized combination of information for their own school. Due to privacy regulations, scores from different topics cannot be linked to each other. As a result, we could only use data on one topic (depression and suicidality) for this study. The data had a repeated cross-sectional design: in each school year (from 2013 to 2023), a different group of students filled in the questionnaire.
Participants and recruitment
The participants were students in vocational education who completed a web-based standardized questionnaire between 2013 and 2023. Recruitment was carried out by schools with a Testjeleefstijl subscription. Interested schools can register on the website and decide which questionnaires to administer to their students. Schools can also choose whether students complete the questionnaires at school or at home. Students were asked to provide informed consent while filling out the questionnaires on the website. Testjeleefstijl processes the answers by theme and stores all data anonymously, so the data cannot be traced back to individual persons or personal student accounts, and teachers do not know the answers individual students provided. Completing the Testjeleefstijl questions on the topic ‘suicidality and depression’ took an estimated five minutes. Students did not receive any financial or other compensation for completing the questionnaires. The study was approved by the medical ethical committee of the Amsterdam University Medical Center (file number: 2023.0568).
Measures
The current study included the following repeated cross-sectional data from the past 10 school years (2013–2023): suicide attempts, suicidal ideation, (risk of) anxiety and depression, gender, age and school year.
Suicidal ideation and attempts
Suicidal ideation (SI) was assessed with the question: ‘Have you ever seriously thought about ending your life in the last 12 months?’ with the answer options ‘Never/occasionally/sometimes/often’. For the analyses, the answers to the SI question were dichotomized into ‘yes’ (occasionally/sometimes/often) and ‘no’ (never). Suicide attempts (SA) were assessed with the question: ‘Have you tried to end your life in the last 12 months?’ with the answer options ‘Yes/No/No answer’. ‘No answer’ responses were coded as missing values.
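The recoding of the two outcome items can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names and the English answer strings are assumptions made for the example, and unrecognised answers are treated as missing.

```python
from typing import Optional

# Assumed English renderings of the answer options that count as 'yes'.
SI_YES_ANSWERS = {"occasionally", "sometimes", "often"}

def dichotomize_si(answer: str) -> Optional[int]:
    """Suicidal ideation: 0 for 'never', 1 for any other listed option."""
    cleaned = answer.strip().lower()
    if cleaned == "never":
        return 0
    if cleaned in SI_YES_ANSWERS:
        return 1
    return None  # unrecognised answer: treated as missing (assumption)

def code_sa(answer: str) -> Optional[int]:
    """Suicide attempt: 1 for 'yes', 0 for 'no', missing for 'no answer'."""
    cleaned = answer.strip().lower()
    if cleaned == "yes":
        return 1
    if cleaned == "no":
        return 0
    return None  # 'no answer' is coded as a missing value

if __name__ == "__main__":
    print(dichotomize_si("Sometimes"), code_sa("No answer"))
```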
Risk of anxiety and depression
Anxiety and depression were assessed with the Dutch version of the Kessler Psychological Distress Scale (K10), a questionnaire with 10 questions about anxiety and depression [25]. The items ask participants to indicate how often (1 = never to 5 = always) they have felt, for example, very tired, nervous, hopeless, restless, or worthless in the past 4 weeks. Based on the answers, a total score is calculated that indicates a low (scores of 10–15), moderate (scores of 16–29), or high (scores of 30–50) risk of anxiety or depression [22].
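As an illustration of the scoring, the sketch below sums the ten item scores and assigns a risk band. It assumes the moderate band is 16 to 29, the range between the stated low band (10–15) and high band (30–50); the function name `k10_risk` is hypothetical.

```python
def k10_risk(item_scores):
    """Return (total, band) for ten K10 item scores, each from 1 to 5."""
    if len(item_scores) != 10:
        raise ValueError("the K10 has exactly 10 items")
    if any(score not in range(1, 6) for score in item_scores):
        raise ValueError("each item must be scored from 1 to 5")
    total = sum(item_scores)  # possible range: 10 to 50
    if total <= 15:
        band = "low"
    elif total <= 29:
        band = "moderate"  # assumed band: 16 to 29
    else:
        band = "high"
    return total, band

if __name__ == "__main__":
    print(k10_risk([2, 3, 1, 2, 4, 2, 1, 3, 2, 2]))
```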
Demographic variables
Gender was assessed with a multiple-choice item with the options male, female, and other. The option ‘other’ was only added in 2020–2021. As a result, the number of students in this gender group was too small (0.2%) to include in the analyses, and these students were excluded. Age was assessed by asking students to indicate their age. From 2019 to 2022, a drop-down menu was used with the following options: 15, 16, 17, 18, 19, 20, 21, 22, 23 and older. Age values below 15 or above 23 that were recorded before the drop-down menu was introduced were coded as “15” or “23 and older”, respectively. For the analyses, age was categorized into three age groups: 15–17; 18–20; 21–23 and older.
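The age recoding and grouping can be sketched as follows. The function names and the group labels (written with ASCII hyphens) are assumptions for this example.

```python
def recode_age(age: int) -> str:
    """Map a free-text age onto the drop-down options (15 ... 22, '23 and older')."""
    if age <= 15:
        return "15"
    if age >= 23:
        return "23 and older"
    return str(age)

def age_group(age: int) -> str:
    """Assign one of the three analysis groups after capping at 15 and 23."""
    capped = min(max(age, 15), 23)
    if capped <= 17:
        return "15-17"
    if capped <= 20:
        return "18-20"
    return "21-23 and older"

if __name__ == "__main__":
    for raw_age in (14, 17, 19, 30):
        print(raw_age, recode_age(raw_age), age_group(raw_age))
```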
School year was derived from the timestamp of each dataset, with the following values: 2013–2014; 2014–2015; 2015–2016; 2016–2017; 2017–2018; 2018–2019; 2019–2020; 2020–2021; 2021–2022; 2022–2023. For the analyses, school year was dichotomized into school years before corona (pre-corona: 2013/2014 to 2018/2019) and school years during or after corona (corona: 2019/2020 to 2022/2023).
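A possible way to derive and dichotomize the school year is sketched below. The text does not state where one school year ends and the next begins, so the 1 August cut-off is an assumption, as are the function names and the hyphenated labels.

```python
from datetime import date

# Assumption: a school year runs from 1 August to 31 July.
SCHOOL_YEAR_START_MONTH = 8

def school_year(timestamp: date) -> str:
    """Return a label such as '2019-2020' for a questionnaire date."""
    if timestamp.month >= SCHOOL_YEAR_START_MONTH:
        start = timestamp.year
    else:
        start = timestamp.year - 1
    return f"{start}-{start + 1}"

def corona_period(school_year_label: str) -> str:
    """'pre-corona' up to and including 2018-2019, 'corona' from 2019-2020."""
    start = int(school_year_label.split("-")[0])
    return "pre-corona" if start <= 2018 else "corona"

if __name__ == "__main__":
    label = school_year(date(2020, 3, 15))
    print(label, corona_period(label))
```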
Statistical analysis
Only students with complete data on the variables of the main analyses were included in the analyses. Descriptive analyses were used to calculate the prevalence of suicidal ideation (SI) and suicide attempts (SA) over time and the distribution of the subgroups of gender, age, school year and risk of anxiety and depression. Univariate logistic regression analyses were performed to test the predictive value of each risk factor separately, namely gender (female/male), age (15–17/18–20/21–23), risk of anxiety and depression (low/moderate/high) and school year (pre-corona/corona), with SI (yes/no) and SA (yes/no) as the outcomes.
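For a binary predictor such as gender, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2x2 table, and its Wald confidence interval can be computed from the four cell counts. The sketch below illustrates this with invented counts; it is not the authors' analysis code, and predictors with three levels would require fitting an actual regression model.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: group 1 with outcome,  b: group 1 without outcome
    c: group 0 with outcome,  d: group 0 without outcome
    All four counts must be greater than zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    estimate, lower, upper = odds_ratio_ci(30, 70, 10, 90)
    print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```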
To identify groups at high risk of suicidal ideation and suicide attempts, the machine learning model developed by Berkelmans et al. [26] was used in Python. This model allows for the exploration of complex interactions between socio-demographic risk factors while maintaining interpretability [26]. The interaction features of the SI and SA models consisted of combinations of the risk factors listed above.
Machine-learning model
A heuristic algorithm was devised to obtain interaction features that either increase or reduce the risk of suicide attempts or suicidal ideation. The interaction features were prioritized based on statistical significance and model improvement. The algorithm consisted of three phases (for each model).
Preparation
The data were split into two datasets: the training set (50%) and the validation set (50%). The training set was used to find interactions of interest. This set was further divided into a primary training set (80% of the training set) and a control set (20% of the training set). The validation set was used to estimate the final model.
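The split can be sketched as follows. The text does not state how records were assigned to the sets, so random shuffling with a fixed seed is an assumption, and `split_data` is a hypothetical name.

```python
import random

def split_data(rows, seed=0):
    """Split rows into primary training (40%), control (10%) and validation (50%)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    training, validation = shuffled[:half], shuffled[half:]

    cut = len(training) * 4 // 5  # 80% of the training set
    primary_training, control = training[:cut], training[cut:]
    return primary_training, control, validation

if __name__ == "__main__":
    primary, control, validation = split_data(range(1000))
    print(len(primary), len(control), len(validation))
```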
Phase 1: finding interactions of interest
We started with a base logistic regression model that included only the basic predictors. Interactions of interest were then added to the model iteratively. An empty list (L) was initialized to track all interactions that had been considered. In the subsequent step, all possible combinations of the variables (gender, age, school year, and risk of anxiety and depression) were tested and recorded in this list.
Iterative step
We scored potential interactions of interest (those not yet in the list L) on the primary training set. If no interaction scored higher than a minimum threshold (T), we moved on to phase 2. Otherwise, the interaction with the highest score was added to the model and to the list L. We then evaluated whether this model improved performance on the control set compared with the previous model, which did not contain this interaction. If the model did not improve, we returned to the model without this interaction and checked whether there were interactions between predictors of the model that were not yet in the list L. If there were, the iterative step started again; if not, we moved on to phase 2.
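The control flow of this iterative step can be sketched as a greedy forward search. This is a simplified illustration, not the implementation of Berkelmans et al. [26]: the callables `score` (computed on the primary training set) and `improves` (checked on the control set) are hypothetical placeholders, only pairwise interactions are considered, and the search is assumed to continue after an interaction has been accepted.

```python
from itertools import combinations

def select_interactions(predictors, score, improves, threshold):
    """Greedy search for interaction terms.

    score(model_terms, candidate) -> float   (hypothetical, primary training set)
    improves(new_terms, old_terms) -> bool   (hypothetical, control set)
    """
    model = list(predictors)  # base model: main effects only
    considered = []           # the list L of interactions already tried
    candidates = list(combinations(predictors, 2))

    while True:
        remaining = [c for c in candidates if c not in considered]
        if not remaining:
            break  # nothing left to try: move on to phase 2
        best = max(remaining, key=lambda c: score(model, c))
        if score(model, best) <= threshold:
            break  # no candidate scores above T: move on to phase 2
        considered.append(best)
        trial = model + [best]
        if improves(trial, model):
            model = trial  # keep the interaction
        # otherwise fall back to the previous model and try the next candidate
    return model, considered

if __name__ == "__main__":
    # Toy scores, for illustration only.
    toy = {("gender", "age"): 5.0, ("gender", "k10"): 3.0, ("age", "k10"): 0.5}
    result = select_interactions(
        ["gender", "age", "k10"],
        score=lambda model, cand: toy[cand],
        improves=lambda new, old: new[-1] != ("gender", "k10"),
        threshold=1.0,
    )
    print(result)
```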
Phase 2: estimating final model
To obtain unbiased estimates that are unaffected by multiple testing, the final model was estimated on a separate dataset: the validation set. This allows the parameters and confidence intervals to be interpreted in the usual manner.