Trends in Language Use During the COVID-19 Pandemic and Relationship Between Language Use and Mental Health: Text Analysis Based on Free Responses From a Longitudinal Study


Introduction

Background

Times of crisis lead to increased psychological distress and mental health symptoms in the general population []. The literature from previous epidemics and the emerging literature on the COVID-19 pandemic [,] provide an understanding of its mental health impacts. Increased psychological distress and mental illness are associated with a longer duration of quarantine [,], increased exposure to the virus or status as a health care worker [,,], fear of infection of self or others [,], financial stress [,], preexisting mental illness [,,], and social isolation [,]. It is critical to understand how this pandemic has affected mental health, document those effects, and prepare for future crises.

Language is one option for assessing mental health. Language and, more broadly, qualitative data can provide context for quantitative data and even point to new directions of research or uncover patterns that may not be found quantitatively. Language has been shown to predict characteristics such as personality [] and psychological constructs []. Research on mental health and language use has applied machine learning to examine how language features correlate with or predict mental illness []. Other, non–content-based metrics related to language have also been associated with mental illness, such as word count [] and post counts on social media [].

Study Aims

In this study, we examined data from a web-based mental health survey on COVID-19 stressors during the pandemic. The survey ended with an open-text free-response prompt, “Is there anything else you would like to tell us that might be important that we did not ask about?” (see for the study overview). Free-response questions have been shown to add context to and validate existing quantitative measures []. We used a variety of text analysis methods on these free responses to investigate the characteristics of our sample population, the content of responses, how responses changed over time, and how language use reflected the participants’ mental state.

To characterize the participants, we examined whether the demographics of the participants who responded to the free-response question differed from those of participants who chose not to respond. On the basis of prior literature on responses to free-text comments in surveys, we predicted that, relative to those who did not respond, respondents would be more likely to be women, be older, have more years of education, and have a preexisting health condition [-]. In addition, we were interested in whether prior mental illness affected responses and language features. On the basis of prior work, we hypothesized that individuals with a history of mental health conditions would be more likely to respond and would provide longer responses than those without mental health histories [,,]. We also hypothesized that mental health history would be associated with more negative sentiment [,], greater use of negative emotional words [-], and more first-person singular pronouns (FPSPs) [,-]. The rationale behind these selections is that increased FPSP use is associated with increased self-focus, whereas increased negative valence and negative emotional words are associated with negatively biased thinking patterns [,]. These patterns of thought are associated with several mental disorders, including depression [].

As for the content of responses, we first asked how the sentiment of responses varied over the course of the pandemic across all participants. On the basis of the literature from previous epidemics showing that distress increased with increased quarantine duration [,], we expected the sentiment to become more negative as social distancing and lockdown procedures remained in place. In addition, we expected emotional states to shape responses such that response likelihood and valence would be associated with fluctuations in self-reported loneliness, distress, and the presence of symptoms related to mental illness. Finally, we used various methods to categorize the responses.


Methods

Recruitment and Study Overview

A web-based, longitudinal study (NCT04339790) assessing the mental health impact of the COVID-19 pandemic was launched by investigators at the National Institute of Mental Health Intramural Research Program in early April 2020 (). A convenience sample of adults aged ≥18 years was recruited via listservs, social media, word of mouth, flyers, and ClinicalTrials.gov (for more details, see the study by Chung et al []). After consenting on the web, participants completed self-report surveys upon enrollment and were then asked to respond to follow-up surveys every 2 weeks for 6 months. All survey data and responses were anonymized and associated with a unique ID.

Figure 1. Schematic of study timeline. (A) Study and analysis timeline. Enrollment in the 6-month study proceeded from April 4, 2020, through November 1, 2020, and the final data point was collected on May 7, 2021. Manual coding analysis was conducted in 5 batches during data collection, whereas additional analyses (eg, latent Dirichlet allocation [LDA]) were conducted using the entire sample after data collection was complete. The batch numbers are listed with the number and dates of the responses they contained. (B) Participant free-response rate; 68% of participants provided at least one free response during the 6-month study, with 93% of these respondents providing multiple responses. TF-IDF: term frequency–inverse document frequency.

Ethics Approval

This study was approved by the Institutional Review Board of the National Institutes of Health (NIH; 20 M-N085).

Questionnaires and Demographic Measures

At baseline, the participants completed various questionnaires assessing demographics, clinical history, and mental health symptoms (see the study by Chung et al [] for a full list of study questionnaires). They were then invited to complete biweekly (ie, every 2 weeks) multiple-choice questionnaires for a 6-month period, including The Psychosocial Impact of COVID-19 Survey [], which consisted of 45 multiple-choice questions assessing various attitudes, behaviors, and impacts surrounding the COVID-19 pandemic, as well as a single free-response question (“Is there anything else you would like to tell us that might be important that we did not ask about?”). We analyzed responses to the free-response item and tested for associations with baseline demographics and clinical history questionnaires (see for details of the classification of demographics and mental or physical health history [-]) as well as biweekly measures of loneliness, as measured by the University of California, Los Angeles 3-Item Loneliness Scale [], and psychological distress, as measured by the Kessler-5 []. Participants could complete a maximum of 13 survey responses, one at each study time point. Of the 2497 participants who provided free responses at any time point, 0.6% (n=15; range 2-6) provided duplicate responses across the study weeks. These individuals and their responses were included in the analyses because their responses reflected stable concerns, such as employment, clinical conditions, physical health, and living situations.

Language Analyses

Sentiment Analysis and Analysis of Language Features

Sentiment analysis algorithms process text and automatically calculate the emotionality or sentiment of that text. They may simply report whether the overall text is positive or negative or use a continuous scale that quantifies both valence and intensity. To determine the optimal algorithm for the free-response data, the responses were tokenized into sentences, preprocessed, and input into 8 commonly used sentiment analysis applications: Stanza [], VADER [], LIWC2015 [], SentiStrength [], TextBlob [], NLPTown model [], Pysentimiento [], and TweetEval []. We also used singular value decomposition (SVD) and a majority vote measure, which combined the outputs of the 8 applications into continuous and categorical aggregate scores, respectively. Of the 10 possible options for sentiment analysis (the 8 individual tools and aggregation of their predictions using either SVD or a majority vote), TweetEval performed the best. It obtained a precision of 0.76, recall of 0.75, F1-score of 0.75, and accuracy of 0.80 and was therefore selected to measure sentiment. TweetEval is a roBERTa-based model [] trained on approximately 60 million tweets. TweetEval represents the sentiment of the text on a scale of −1 to 1, with −1 being the most negative and 1 being the most positive.
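
One way to implement the SVD and majority vote aggregation is sketched below in R. The input objects (a sentence-by-tool matrix of continuous polarity scores and a corresponding matrix of categorical labels) and the use of the first SVD component as the continuous aggregate are assumptions; the paper does not detail the aggregation procedure.

```r
# scores: one row per sentence, one column per tool's continuous polarity score
# labels: one row per sentence, one column per tool's categorical polarity label
# (both layouts are assumed for illustration)
scores_std <- scale(scores)                  # put the 8 tools on a common scale
sv <- svd(scores_std)
svd_aggregate <- sv$u[, 1] * sv$d[1]         # first SVD component as a continuous aggregate

# Majority vote across tools for each sentence's categorical polarity
majority_vote <- apply(labels, 1, function(x) names(which.max(table(x))))
```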

We performed a formal evaluation using 130 randomly drawn sentences, each manually assigned a polarity category by a separate observer. This allowed us to compute precision, recall, and F1-score to compare the 8 algorithms’ polarity scores, the SVD score, and the majority vote of the 8 polarity scores. The TweetEval score outperformed the other options and was therefore selected for further analyses.
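
As an illustration, these evaluation metrics can be computed from a confusion matrix of the manual labels against each tool's predictions. The following is a minimal R sketch; the object names and the macro-averaging across polarity classes are assumptions, as the paper does not state the averaging scheme.

```r
# manual_labels and predicted_labels: factors with the same polarity levels (names assumed)
conf <- table(truth = manual_labels, pred = predicted_labels)

precision <- diag(conf) / colSums(conf)   # per-class precision
recall    <- diag(conf) / rowSums(conf)   # per-class recall
f1        <- 2 * precision * recall / (precision + recall)
accuracy  <- sum(diag(conf)) / sum(conf)

# Macro-averaged summary (averaging scheme assumed)
c(precision = mean(precision), recall = mean(recall), f1 = mean(f1), accuracy = accuracy)
```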

The TweetEval values were aggregated by response so that each response had a score that was the mean of the sentence-level TweetEval values. Then, those scores were aggregated by date so that each date had a mean TweetEval value. The 7-day rolling averages of TweetEval and the number of responses were computed using the zoo package and plotted by date [].
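
A minimal R sketch of this aggregation, assuming a data frame `responses` with one row per response and columns `date` and `tweeteval_score` (the response-level means of the sentence scores; column names are illustrative):

```r
library(dplyr)
library(zoo)

# Mean TweetEval score and response count per date
daily <- responses %>%
  group_by(date) %>%
  summarise(mean_sentiment = mean(tweeteval_score), n_responses = n(), .groups = "drop")

# 7-day rolling averages of sentiment and response count
daily <- daily %>%
  mutate(sentiment_7d = zoo::rollmean(mean_sentiment, k = 7, fill = NA),
         count_7d     = zoo::rollmean(n_responses, k = 7, fill = NA))
```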

In addition to sentiment, we focused on 3 additional language features of interest: word count (“WC” in LIWC2015 software), percentage of negative emotional words (“negemo”), and percentage of FPSPs (“i category”), which were calculated using the output from LIWC2015 [].

Manual Content Analysis

We used manual content analysis to evaluate the responses in addition to automated algorithms. Two clinicians and 4 other members of the research team (SS, JYC, LYA, Molly Cosgrove, RW, and MR-M) created initial manual content analysis categories. One clinician and 5 other members of the research team (SS, LYA, Molly Cosgrove, RW, MR-M, and JSS) annotated 4 small practice batches and met after each to discuss ambiguities and refine categories and definitions. In total, 36 categories and definitions were agreed upon and sorted into 6 overarching themes: mental health, physical health, social factors, career and finances, society (including government, community, or both), and other. A complete list of the categories is presented in . Free responses were divided among the 4 coders (SS, MR-M, RW, and JSS), and each response was reviewed and scored by 2 randomly selected coders. Each coder labeled the responses based on their content as belonging to ≥1 manual categories. The responses were annotated in 5 batches. For the fifth batch, with a date range from October 11, 2020, to May 5, 2021, LYA annotated instead of MR-M and the category “Vaccines” was added based on a consensus of the coders after noting changes in the themes of responses. Responses such as “No,” “NA,” and “Nothing to report” were not categorized by any coder and were classified as nonresponses that were removed from subsequent analyses. Clinicians (SS and JYC) reviewed responses marked as clinically significant to evaluate severity.

To assess agreement between coders, interrater reliability (IRR) was calculated using the irrCAC package to obtain both the Fleiss κ and the Gwet AC1 statistic []. The 2 methods were chosen to complement each other: the κ statistic is very commonly used for IRR, whereas the Gwet AC1 statistic overcomes some of the κ statistic’s weaknesses with low-variability data [,]. For both measures, we evaluated agreement using the 1991 Altman interpretation of the κ statistic, in which <0.2 is poor, 0.2 to 0.4 is fair, 0.4 to 0.6 is moderate, 0.6 to 0.8 is good, and 0.8 to 1.0 is very good agreement [].
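
A minimal R sketch of this computation with the irrCAC package, assuming the ratings are organized with one row per coded response and one column per coder (the exact data layout used in the study is not specified):

```r
library(irrCAC)

# ratings: one row per rated response, one column per coder (layout assumed)
fleiss_result <- fleiss.kappa.raw(ratings)
gwet_result   <- gwet.ac1.raw(ratings)

fleiss_result$est   # coefficient estimate with its confidence interval
gwet_result$est
```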

Automated Topic Analysis

To supplement automated coding and manual scoring in predetermined categories, we used exploratory analyses to identify the topics that emerged in the responses over time. We focused on terms unique to each month using term frequency–inverse document frequency (TF-IDF), a technique that identifies words that appear frequently in one document (here, all responses for a given month) but rarely in the others (ie, all other months). Words were lemmatized using the textstem package [], and TF-IDF was calculated using the tidytext package []. This analysis was performed independently from our manual content analysis to address topics that might have been omitted from it, as the manual categories were selected early in the pandemic and that analysis proceeded in real time relative to data collection. The lemmas "coronavirus," "covid19," and "covid" were all classified as the lemma "covid," and the lemmas "vaccination," "vaccinate," and "vaccine" were all classified as "vaccine."
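
A minimal R sketch of this step with the tidytext and textstem packages, assuming a data frame `responses` with columns `month` and `text` (names are illustrative):

```r
library(dplyr)
library(tidytext)
library(textstem)

monthly_tfidf <- responses %>%
  unnest_tokens(word, text) %>%                      # tokenize responses into words
  mutate(word = lemmatize_words(word)) %>%           # lemmatize with textstem
  mutate(word = recode(word,
                       coronavirus = "covid", covid19 = "covid",
                       vaccination = "vaccine", vaccinate = "vaccine")) %>%
  count(month, word) %>%
  bind_tf_idf(word, month, n) %>%                    # TF-IDF with months as "documents"
  arrange(month, desc(tf_idf))
```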

Finally, we used natural language processing methods, such as topic modeling and multiword expression extraction, to explore people’s thoughts and concerns during the COVID-19 pandemic. Topic modeling automatically identifies clusters of words and themes from text data sets. One of the most popular methods is latent Dirichlet allocation (LDA), which seeks to classify text documents as a mixture of distinct topics [] and has been widely used in automatic content analysis. The advantages of topic modeling are its high scalability and ability to infer topics or themes without being biased by users. However, a limitation is the potential lack of interpretability. This can occur because perplexity—a measure to evaluate the quality of topic modeling outputs and select model parameters—may be inversely correlated with human interpretability [,].

Therefore, the incorporation of additional evaluations or measures to validate the comprehensibility of topic-modeling outputs, such as human judgment [,], is necessary. In this study, we added human judgment in 3 steps to overcome the lack of interpretability. First, we used different values for the number of topics, calculated the performance for each iteration using perplexity, and compared the results with the number of themes identified earlier by the manual annotators. Second, we calculated the agreement between the human and LDA topic assignments for 100 sentences. We did this by randomly selecting 100 sentences and having 1 author sort them into the topics created by LDA. Then, the human- and LDA-selected topics were compared. Of the 100 sentences, 35 had complete agreement and 23 had weak agreement (the topic selections differed, but the author thought the LDA selection was reasonable or a closely related topic). There was no agreement for the remaining 42 sentences. Close inspection of the sentences with disagreement revealed that in 14 cases, LDA selected topics based on keywords that did not reflect the meaning of the sentence (eg, “DBT therapy has been big positive” was marked as topic 13 [relating to test results] by LDA, likely because of the word “positive,” whereas the human coder rated it as topic 21 [relating to mental illness and medication]). Third, we evaluated the most representative keywords per topic (single-word terms extracted by LDA) and added multiword terms to help represent topics better. The output of topic modeling methods generally consists only of groupings of single-word terms; however, in natural language, single-word terms are often part of multiword expressions and therefore do not provide complete context on their own. An alternative that improves the identification of relevant topics is to incorporate multiword terms, which are expressions composed of ≥2 words with a grammatical structure and a specific meaning. We therefore used LIDF-value [], an information retrieval measure that extracts multiword terms based on linguistic patterns built from lexical categories (eg, nouns and adjectives). In summary, to automatically assess the content of the participants’ responses, our approach consisted of 4 basic steps: (1) preprocessing, (2) topic modeling with LDA, (3) multiword term extraction with LIDF-value, and (4) word cloud creation. Further details regarding the automated topic analysis can be found in .
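
The paper does not state which LDA implementation was used; the sketch below illustrates the general workflow in R with the topicmodels package, assuming a document-term matrix built from the tokenized, preprocessed responses (object names and candidate topic numbers are illustrative).

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Document-term matrix from tokenized, preprocessed responses
dtm <- response_tokens %>%
  count(response_id, word) %>%
  cast_dtm(response_id, word, n)

# Fit LDA for several candidate numbers of topics and compare perplexity
k_values <- c(10, 15, 20, 25)
fits <- lapply(k_values, function(k) LDA(dtm, k = k, control = list(seed = 123)))
perplexities <- sapply(fits, perplexity)

# Inspect the top 10 single-word terms per topic for the selected model
best_fit <- fits[[which.min(perplexities)]]
terms(best_fit, 10)
```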

Statistical Analyses

Comparing Respondents With Nonrespondents

A logistic regression was run comparing respondents and nonrespondents by gender, race, ethnicity, age, income, education, and preexisting mental health and medical conditions. Before running the logistic regression for age (the only continuous variable), the assumption of linearity between age and the log odds of responding was tested. Participants’ ages were divided into quantiles, and logits were plotted by age category. The relationship was monotonic, thereby meeting the assumptions of logistic regression. R was used for all analyses, and ggplot2 within the package tidyverse was used for all figures, except where noted [,].
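
A minimal R sketch of this comparison, with illustrative variable names (the exact coding of the predictors is described in the cited materials):

```r
# respondent = 1 if the participant ever answered the free-response item, 0 otherwise
# (all variable names are illustrative)
fit <- glm(respondent ~ gender + race + ethnicity + age + income + education +
             mental_health_history + medical_history,
           data = baseline, family = binomial)

# Odds ratios with 95% CIs, as reported in Table 1
exp(cbind(OR = coef(fit), confint(fit)))
```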

Evaluating the Likelihood of Free Response and Sentiment as a Function of Psychological State

We used a multilevel logistic regression implemented using the function glmer in the R package lme4 [] to determine whether loneliness (measured using the University of California, Los Angeles 3-Item Loneliness Scale total score) and distress (measured using the Kessler-5 overall score) influenced the likelihood of an individual providing a free response at a given time point and whether the likelihood of responding changed over time.

We also used linear mixed models restricted to participants who provided multiple free responses (n=2322) to determine whether loneliness or psychological distress influenced the mean sentiment of an individual’s response at a given time point. Linear mixed models were implemented using the function lmer within the R package lme4 []. We used a similar linear model to test whether free-response length (ie, number of words) varied over time and whether response length was related to loneliness or psychological distress. We also explored whether sentiment was associated with response length.

For each model, we included fixed effects of week (ie, time in the study relative to each participant’s time of enrollment), modeled psychological state both within- and between-subjects (ie, mean-centered within individuals and grand mean–centered across individuals), and included interactions between within- and between-subjects factors to test whether individual differences moderated the effects over time. Intercepts and slopes were treated as random in linear models, whereas logistic models included only random intercepts because of issues with model convergence. Because psychological predictors were correlated, we analyzed both combined models (reported in the main manuscript) and models that separately evaluated associations with loneliness and distress (reported in ).
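
A minimal R sketch of the model structure described above, assuming a long-format data frame with one row per participant per biweekly interval; variable names are illustrative, and the exact random-slope terms are not fully specified in the text.

```r
library(dplyr)
library(lme4)

# Within-person (mean-centered) and between-person (grand mean-centered) predictors
long <- long %>%
  group_by(subject) %>%
  mutate(distress_mean = mean(distress, na.rm = TRUE),
         distress_wi   = distress - distress_mean,
         lonely_mean   = mean(loneliness, na.rm = TRUE),
         lonely_wi     = loneliness - lonely_mean) %>%
  ungroup() %>%
  mutate(distress_bw = distress_mean - mean(distress_mean),
         lonely_bw   = lonely_mean - mean(lonely_mean))

# Likelihood of responding: random intercepts only (random slopes did not converge)
m_respond <- glmer(responded ~ week + distress_wi * distress_bw + lonely_wi * lonely_bw +
                     (1 | subject),
                   data = long, family = binomial)

# Sentiment of responses: random intercepts and slopes (slope terms assumed)
m_sentiment <- lmer(mean_sentiment ~ week + distress_wi * distress_bw + lonely_wi * lonely_bw +
                      (1 + distress_wi + lonely_wi | subject),
                    data = long)
```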

Correlations Between Patient Clinical History and Language Features

We were interested in whether those with a history of mental health treatment, termed patients, used language features differently from controls. We focused on 4 language features of interest, each aggregated as the participant-level mean across all of that participant’s responses over the course of the entire survey: sentiment, word count, percentage of negative emotional words, and percentage of FPSPs.

Two methods were used to determine mental health status. We determined whether an individual had a history of mental health concerns using a clinical history questionnaire. Patients were defined as individuals who reported prior mental health treatment including hospitalization, psychotropic medication, or treatment for drug or alcohol use. We used 2-sample t tests (2-tailed) to assess whether patients differed from controls (ie, individuals with no prior treatment for mental illness) for each language feature. Hedges g was calculated using the package effsize to show the effect size []. One participant did not complete the clinical history question at baseline and was therefore excluded from this analysis.
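
A minimal R sketch of this comparison for one language feature, assuming a data frame `features` with one row per participant (variable names are illustrative):

```r
library(effsize)

# patient: 2-level factor indicating prior mental health treatment
# mean_sentiment: participant-level mean TweetEval score (names illustrative)
t.test(mean_sentiment ~ patient, data = features)      # 2-sample t test
cohen.d(mean_sentiment ~ patient, data = features,
        hedges.correction = TRUE)                       # Hedges g effect size
```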

We also explored associations between language features and a continuous measure of each individual’s probability of being a patient, the patient probability score (PPS) []. The PPS model was trained on baseline questionnaire data from a subset of participants who had been seen at the NIH before the pandemic and had undergone a Structured Clinical Interview for DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition). Each participant who had not been seen at the NIH was assigned a PPS value based on their similarity to the patient or control group. For additional information and validation, refer to the study by Chung et al []. We used Spearman correlations to evaluate the associations between the PPS and the 4 language features listed earlier. Seven participants were missing a PPS and were excluded from these analyses.
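
For the continuous PPS measure, the corresponding Spearman correlations can be computed as follows (variable names are illustrative):

```r
# Spearman correlations between PPS and each participant-level language feature
cor.test(features$pps, features$mean_sentiment, method = "spearman")
cor.test(features$pps, features$word_count,     method = "spearman")
cor.test(features$pps, features$pct_negemo,     method = "spearman")
cor.test(features$pps, features$pct_fpsp,       method = "spearman")
```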


Results

Free-Response Sample

Of the 3655 participants enrolled in the study [], 2497 responded at least once to the free-response item; these participants are referred to as “respondents.” depicts the distribution of the respondents as a function of the number of times they provided free-response entries. The demographics of the total sample comparing respondents and nonrespondents (ie, participants who never provided free-response entries) are reported in . In total, there were 9738 responses to the free-response item.

Figure 2. Distribution of free responses. (A) Response frequency. Histogram showing the number of participants who responded from 1 to 13 times to the free-response question. (B) Response length. Histogram showing the distribution of word count across all responses, which ranged from 1 to 744 words.

Table 1. Comparison of respondents and nonrespondents.a

Characteristic: Respondent, n (%) | Nonrespondent, n (%) | Responding, ORb (95% CI)

Gender
  Female: 2065 (56.4) | 866 (23.7) | 1 (—c)
  Male: 346 (9.5) | 255 (7) | 0.57d (0.48-0.68)
  Nonbinary: 58 (1.6) | 21 (0.6) | 1.16 (0.71-1.96)
  Unknown: 28 (0.8) | 16 (0.4) | —

Ethnicity
  Hispanic or Latino: 124 (3.4) | 79 (2.2) | 0.71e (0.53-0.95)
  Not Hispanic or Latino: 2271 (62.1) | 1026 (28.1) | 1 (—)
  Unknown: 102 (2.8) | 53 (1.5) | —

Race
  American Indian or Alaska Native: 7 (0.2) | 4 (0.1) | —
  Asian: 43 (1.2) | 47 (1.3) | 0.40d (0.26-0.62)
  Black or African American: 62 (1.7) | 44 (1.2) | 0.62e (0.42-0.93)
  White or Caucasian: 2256 (61.7) | 997 (27.3) | 1 (—)
  Hawaiian or Pacific Islander: 0 (0) | 0 (0) | —
  Multiple races: 92 (2.5) | 48 (1.3) | 0.85 (0.60-1.22)
  Unknown: 37 (1) | 18 (0.5) | —

Age (years)
  Values, mean (SD): 48.0 (14.9) | 43.7 (14.4) | 1.02d (1.02-1.03)

Income (US $)
  <35,000: 340 (9.3) | 165 (4.5) | 1 (—)
  35,001-75,000: 629 (17.2) | 294 (8) | 1.04 (0.82-1.31)
  75,001-100,000: 395 (10.8) | 166 (4.5) | 1.15 (0.89-1.50)
  100,001-150,000: 505 (16.4) | 243 (6.6) | 1.01 (0.79-1.28)
  ≥150,000: 598 (16.4) | 274 (7.5) | 1.06 (0.84-1.34)
  Unknown: 30 (0.8) | 16 (0.4) | —

Education
  Less than high school: 4 (0.1) | 6 (0.2) | —
  High school graduate or above: 52 (1.4) | 41 (1.1) | 1 (—)
  Some college or above: 179 (4.9) | 130 (3.6) | 1.09 (0.67-1.73)
  Associate degree or above: 111 (3) | 71 (1.9) | 1.23 (0.74-2.04)
  Bachelor’s degree or above: 788 (21.6) | 366 (10) | 1.70e (1.10-2.60)
  Advanced or professional degree: 1355 (37.1) | 540 (14.8) | 1.98f (1.29-3.01)
  Unknown: 8 (0.2) | 4 (0.1) | —

Mental health status
  Mental health history: 1384 (37.9) | 609 (16.7) | 1 (—)
  No mental health history: 1113 (30.5) | 548 (15) | 0.89 (0.78-1.03)
  Unknown: 0 (0) | 1 (0) | —

Physical health status
  Has medical illness: 1369 (37.5) | 522 (14.3) | 1 (—)
  Does not have medical illness: 1128 (30.9) | 635 (17.4) | 0.68d (0.59-0.78)
  Unknown: 0 (0) | 1 (0) | —

aThis table compares the demographics and clinical history of participants who responded at least once to the free-response question (“respondents”) and those who did not (“nonrespondents”). The odds ratio of responding for each group compared with the reference group is shown in the third column. The reference groups were denoted by those with an odds ratio of 1.

bOR: odds ratio.

cGroups with too small a sample size and those whose demographics were unknown were not included in the logistic regression.

dP<.001.

eP value between .05 and .01.

fP value between .01 and .001.

Comparing Respondents With Nonrespondents

Logistic regressions indicated that the likelihood of responding was influenced by several demographic factors, including gender, race, ethnicity, education, and age, as reported in . For example, the odds of male participants responding compared with female participants were 43% lower (P<.001), and the odds of Asian and Black participants responding compared with White participants were 60% (P<.001) and 38% (P=.02) lower, respectively. Although education influenced the likelihood of responding such that the odds of participants with bachelor’s or advanced degrees responding compared with participants who were high school graduates were 70% (P=.02) and 98% (P=.002) higher, respectively, we did not observe any influence of income. For additional demographic factors, please refer to .

Interestingly, there was no impact of mental health history on an individual’s likelihood of providing free responses (). However, physical health history did influence an individual’s likelihood of providing free responses: the odds of participants without physical health conditions responding compared with those with such conditions were 32% lower (P<.001).

Impact of Psychological State on Likelihood of Providing a Free Response

We used multilevel models to evaluate the likelihood of an individual providing a response in a given week as a function of time and psychological state. All models revealed that an individual’s likelihood of providing a free response decreased over time (), although the effects were quite small based on odds ratios. Individuals were more likely to respond to the free-response item when feeling more distressed, as measured by the Kessler-5, and individuals with higher average distress were more likely to respond. Interestingly, we observed an interaction between within-subjects distress and between-subjects distress such that the effect of distress on the likelihood of responding in a given week was strongest for individuals with low average distress, perhaps because individuals with high average distress responded consistently over time. There was no effect of loneliness on the likelihood of responding when it was included in the same model as the distress measure; however, the fixed effects of loneliness and distress were correlated across individuals (r=0.633), and we therefore computed separate models for each predictor (). Modeling distress alone confirmed the findings from the model that included all factors, with similar coefficients. When loneliness was included in a separate model, we found that individuals were more likely to respond when they reported higher loneliness (B=0.05; P=.004) and that individuals who reported being lonelier on average were more likely to respond (B=0.09; P<.001).

Table 2. Multilevel logistic model examining association among distress, loneliness, and likelihood of response.a

Predictor: ORb (95% CI) | P value
  (Intercept): 0.37 (0.34-0.40) | <.001
  Week: 0.98 (0.98-0.98) | <.001
  Distress: 1.13 (1.11-1.15) | <.001
  Mean distress: 1.03 (1.01-1.06) | .005
  Loneliness: 0.98 (0.95-1.02) | .35
  Mean loneliness: 1.05 (0.99-1.10) | .10
  Distress × mean distress: 0.99 (0.98-0.99) | <.001
  Loneliness × mean loneliness: 0.98 (0.96-1.01) | .18

aThis table presents the results of a multilevel logistic model examining the association between the likelihood of response on a given week and self-reported distress (measured using the Kessler-5) and loneliness (measured using the 3-item Loneliness Scale). Distress and loneliness were modeled both within (ie, dynamic fluctuations across intervals) and between participants (ie, mean distress and mean loneliness). There were 26,073 observations across 3163 individuals, with intraclass correlation coefficient=0.49, marginal R2=0.017, conditional R2=0.495, random error variance (σ2)=3.29, and variance of random intercepts (τ00SUBJECT_NUMBER)=3.11. The results of the models that separately analyzed distress and loneliness are presented in .

bOR: odds ratio.

Sentiment During the Study Period

The results of sentiment analysis are shown in . As the TweetEval scores range from −1 to 1, it is clear from the figure that the average sentiment of free responses remained negative for the entire study period. We observed a gradual upward tendency in sentiment starting in November, which coincides with announcements about the Pfizer vaccine (). However, our sample size and proportion of responses were reduced at this time, and we did not run statistical analyses on the influence of time on sentiment; therefore, we do not make strong inferences about these overall patterns based on group averages.

Figure 3. Sentiment over time: this figure plots the 7-day rolling average of sentiment by day from April 7, 2020, to May 3, 2021 (responses from before April 7, 2020, or after May 3, 2021, are omitted due to the 7-day rolling average). The opacity of the line represents the 7-day rolling average of response count. TweetEval Sentiment below 0 is considered negative. Red vertical bars mark the dates of major national events in the United States, which emerged in free-response comments based on term frequency–inverse document frequency.

Important events throughout the pandemic that may have affected groupwide sentiment are marked in . These events were selected based on the keywords seen in the TF-IDF analysis (see analysis below in Themes of Free Responses Across Time). The selected events were important events in the United States, given that most of the study participants came from the United States, with all 50 states represented; of free-response respondents, 2474 were based in the United States and 23 were international. The 5 events chosen were the death of George Floyd (May 25, 2020), the death of Ruth Bader Ginsburg (September 18, 2020), the 2020 US Presidential Election (November 3, 2020), the beginning of COVID-19 vaccination in the United States (December 14, 2020 []), and the US Capitol attack (January 6, 2021). As depicted in , these events were followed by steep changes in the average sentiment of responses, as measured by TweetEval.

Association Between Psychological State and Sentiment of Responses

We used multilevel models to evaluate the dynamic association between self-reported psychological states and response sentiment, as measured by the mean TweetEval score per response. A model that combined distress and loneliness () indicated that sentiment was negative on average, based on the intercept, and that sentiment increased over time within individuals, which is consistent with the overall average depicted in . Responses were more negative at time points when individuals reported greater distress () or loneliness. We also observed that individuals with higher mean distress had more negative sentiment on average () and that there was a substantial interaction between within-subjects distress and between-subjects distress, such that the effect of distress on sentiment was strongest for those with low average distress scores. Between-subjects variations in loneliness did not influence sentiment when loneliness was included in the same model as distress; however, when loneliness and distress were modeled separately, we observed substantial associations with each measure, both within and between participants ().

Table 3. Linear mixed model examining association among distress, loneliness, and sentiment of response.a

Predictors: Estimates (95% CI) | P value
  (Intercept): −0.385 (−0.396 to −0.373) | <.001
  Week: 0.002 (0.001 to 0.004) | <.001
  Distress: −0.038 (−0.043 to −0.033) | <.001
  Mean distress: −0.021 (−0.024 to −0.017) | <.001
  Loneliness: −0.019 (−0.029 to −0.010) | <.001
  Mean loneliness: 0.002 (−0.007 to 0.010) | .68
  Distress × mean distress: 0.002 (0.001 to 0.004) | <.001
  Loneliness × mean loneliness: −0.002 (−0.009 to 0.004) | .49

aThis table presents the results of a linear mixed model examining the associations between negative sentiment, self-reported distress, and loneliness (see the Methods section). There were 9253 observations across 2314 individuals, with intraclass correlation coefficient=0.14, marginal R2=0.064, conditional R2=0.199, random error variance (σ2)=0.17, variance of random intercepts (τ00SUBJECT_NUMBER)=0.03, and variance of random slopes=0. The results from the models that separately analyzed distress and loneliness are presented in .

Figure 4. Association between distress (Kessler-5) and sentiment (TweetEval score) in free responses. Scatterplot illustrating association between biweekly measures of distress (as measured by Kessler-5) and mean sentiment of free responses as measured by TweetEval. Linear mixed models indicate that distress is negatively associated with sentiment within individuals, and that individuals with higher mean distress (visualized in lighter blue) use more negative language on average.

Association Between Psychological State and Response Length

We tested whether response length varied as a function of loneliness, distress, and time (). Response length ranged from 1 word to a maximum of 744 words (mean 41.63, SD 49.05; median 27; ). Linear mixed models indicated that response length decreased slightly over the course of an individual’s participation, such that each biweekly interval was 0.5 words shorter on average. Response length was positively associated with distress (), such that an increase of 1 unit of distress in a given week was associated with an additional 0.8 words, and individuals who reported higher distress provided responses that were 0.8 words longer on average. There was no effect of loneliness on response length when distress and loneliness were included in the same model, but separate analyses indicated that responses were longer in lonely individuals than in nonlonely individuals (P<.001), such that an increase of 1 unit in average loneliness was associated with 1.3 more words on average ().

Table 4. Linear mixed model examining association among distress, loneliness, and response length.a

Predictors: Estimates (95% CI) | P value
  (Intercept): 36.726 (35.294 to 38.157) | <.001
  Week: −0.249 (−0.358 to −0.140) | <.001
  Distress: 0.811 (0.361 to 1.261) | <.001
  Mean distress: 0.827 (0.371 to 1.282) | <.001
  Loneliness: −0.081 (−0.988 to 0.826) | .86
  Mean loneliness: 0.166 (−0.867 to 1.200) | .75
  Distress × mean distress: −0.044 (−0.162 to 0.073) | .46
  Loneliness × mean loneliness: −0.235 (−0.851 to 0.381) | .46

aThis table presents the results of a linear mixed model examining associations between word count, self-reported distress, and loneliness (see the Methods section). There were 9272 observations across 2314 individuals, with intraclass correlation coefficient=0.33, marginal R2=0.009, conditional R2=0.340, random error variance (σ2)=1406.63, variance of random intercepts (τ00SUBJECT_NUMBER)=705.36, variance of random slopes for distress=1.96, and variance of random slopes for loneliness=17.25. The results from the models that separately analyzed distress and loneliness are presented in .

Associations Between Mental Health History and Language Features

As reported in , individuals who reported prior mental health treatment had more negative sentiments (as measured by TweetEval), wrote longer responses, used more negative emotional words, and had higher frequencies of FPSP use. The effect sizes and P values are presented in . We also observed small but substantial associations with language features when we used our continuous PPS: PPS was associated with more negative sentiment (r=−0.12; P<.001), higher word counts (r=0.09; P<.001), more negative emotional words (r=0.04; P<.05), and higher FPSP use (r=0.13; P<.001).

Table 5. Relationship between mental health history and language use.a

Measure: Sentiment (TweetEval) | Word count | Negative emotional words (%) | First-person singular pronoun (%)
  Mental health history, mean (SD): −0.0044 (0.0034) | 37.1 (33.8) | 5.53 (6.15) | 7.46 (4.77)
  No mental health history, mean (SD): −0.0038 (0.0036) | 34.2 (30.2) | 4.77 (4.42) | 6.56 (4.46)
  P value: <.001 | .03 | <.01 | <.001
  Hedges g effect size: −0.15 | 0.09 | 0.14 | 0.19

aA 2-sample t test (2-tailed) was run to compare the use of 4 language features by mental health history, as determined by mental health or drug or alcohol treatment or mental health hospitalization. The language features selected were the same as those used in the Spearman correlation analysis.

Themes of Free Responses Across Time

The results of the manual coding are reported in . The most frequently annotated categories were mental health or emotion (5159/9738, 53% of responses), social or physical distance (2475/9738, 25% of responses), and policy or government (1938/9738, 20% of responses). The Fleiss κ coefficient for IRR for all responses was 0.73 (95% CI 0.72-0.73), which is characterized as “good” agreement between raters []. The Gwet AC1 statistic coefficient for IRR for all responses was 0.96 (95% CI 0.96-0.96) or “very good” agreement []. Agreement for individual categories is presented in . Only 2 categories (“non–health-related concern for the immediate circle” and “clarification of survey response”) were characterized as “fair” by the Fleiss κ statistic; all others ranged from moderate to very good agreement.

Table 6. Manual coding of free-response topics.a

Theme and category: Example response | Coding count (N=9738), n (%)

Mental health
  General negative mental health (ie, negative emotion or cognitive symptom): “I have experienced a lot of physical symptoms of stress/anxiety, including fatigue and pain.” | 5159 (53)
  Clinically significant (eg, mention of diagnosis, treatment, suicidality, or domestic violence): “I have been on Prozac for 3 weeks now” | 1837 (9)
  Mood disorder: “I’m experiencing some depression, but I’m not having suicidal thoughts. I have a hard time thinking about a future that is different than it is now.” | 475 (5)
  Anxiety disorder: “I took wellbutrin a few years ago, started a new RX this week for anxiety over Covid” | 309 (3)
  Other psychiatric diagnosis: “PTSDb symptoms re-activated by feeling trapped, uncertainty, amount of unknowns, untrustworthy authority figure” | 275 (3)
  Suicidality: “had to call suicide prevention hotline due to crisis” | 130 (1)

Physical health
  Non–COVID-19–related physical health: “Symptoms listed above due to chronic asthma/allergies” | 1444 (15)
  Change in health behaviors, activities, or hobbies: “I am eating more and gaining weight” | 798 (8)
  Suspected or confirmed COVID-19 illness or self-test: “I did antibody testing with a home-kit because I was worried about my symptoms” | 486 (5)
  Sleep: “My sleep schedule has been completely thrown off.” | 366 (4)
  Deferred medical care: “Close friend diagnosed with cancer and treatment was delayed due to pandemic so it spread faster than expected.” | 159 (2)
  Drugs and alcohol: “Watching way too much TV; and smoking a heck of a lot more than I used to.” | 154 (2)
  COVID-19–related risk factors in self: “Constant worry because I have asthma/COPDc” | 149 (2)
  Pregnancy: “I am pregnant with my first child and am very nervous about contracting COVID or my partner contracting COVID.” | 42 (1)

Social factors
  Experience with social or physical distance and masks: “extended family has different beliefs about social distancing which increase stress” | 2475 (25)
  Health condition or health-related concern about immediate circle: “Most of my stress is related to a sick family member (not Covid)” | 1697 (17)
  Providing care for dependents: “It is increasingly challenging to work full-time and parent children who are attending school at home.” | 570 (6)
  Strained relationships: “My relationship with my spouse has been more rocky. It’s been a lot to rely on one introverted person for my extroverted needs.” | 547 (6)
  Non–health-related concern for immediate circle: “I’m anxious about the fears and anxieties of my closest friend. He’s not handling the virus threat well at all.” | 513 (5)
  Mention of non–COVID-19–related death: “Mother passed away from pancreatic cancer” | 393 (4)
  Loneliness or isolation: “I am beginning to notice the lack of and to miss physical presence and physical contact with people – besides my partner.” | 308 (3)
  Positive relationships: “I live in a beautiful place with my wife, who is the love of my life.” | 172 (2)
  Mention of COVID-19–related death: “My sister in law died from COVID-19 after 23 days in ICUd. Buried her last Saturday.” | 95 (1)

Career and finances
  Other work-related issues: “It has been more difficult to focus on work while working from home.”
