Cross-Platform Detection of Psychiatric Hospitalization via Social Media Data: Comparison Study


Introduction

Background

Despite its relatively low prevalence compared with other mental health disorders, the burden of schizophrenia spectrum disorder (SSD) on patients, families, and society is substantial []. To mitigate this burden, early diagnosis and treatment are crucial. However, psychotic disorders, including SSD, often receive delayed attention and care, resulting in worse health outcomes [,]. At the same time, social media use is high among patients with serious psychotic disorders such as SSD, especially among adolescents and young adults, the ages at which SSD typically emerges [,]. For instance, Birnbaum et al [] studied social media use among adolescents and young adults with psychotic and mood disorders and found that 97.5% of participants (mean age 18.3 years) regularly used social media, spending approximately 2.6 (SD 2.5) hours per day on the web. Similarly, Miller et al [] studied the use of digital technologies among patients diagnosed with SSD and found that, among participants with access to the internet, 98% reported using at least one social media service and 57% used social media daily.

Building on these observations, an established body of research has used social media data and machine learning classifiers to identify and predict psychiatric outcomes of social media users with SSD [-]. The most robust data sources available to train these classifiers consist of textual content posted on the web. Prior work in speech and text analysis among patients with SSD has identified reliable linguistic markers associated with SSD, which have been successfully used as features for such classifiers [,,]. These include certain word frequencies, word categories, and self-referential pronouns [,]. Given that the use of image- and video-based social media platforms such as Instagram, Snapchat, and TikTok is associated with youths, there has also been prior work analyzing images to compare patients with SSD and healthy controls [,]. Hänsel et al [] identified additional image markers associated with SSD, such as an image’s colorfulness and saturation and the average number of faces per image. By exploiting these markers, previous research conducted by Birnbaum et al [] and Ernala et al [] built classifiers to distinguish between users with a confirmed diagnosis of SSD and healthy controls on Facebook and Twitter with area under the receiver operating characteristic curve (AUROC) scores of 0.75 and 0.82, respectively.

Although such results demonstrate the potential of automated techniques for predicting the mental health outcomes of individuals with SSD via social media data, many research gaps remain to be addressed before psychiatrists can reliably deploy such techniques for clinical purposes. Most prior work in this area focused on a single source of social media data, either exclusively from Twitter or from Facebook, for downstream classification and analysis tasks []. However, previous research has also shown that many social media users, especially youths, use different social media platforms for different purposes because of the platforms’ differing affordances and cultures. Among youths, Facebook use is associated with keeping up with close and distant friends, whereas Instagram and Snapchat use is associated with self-expression and gratification [,]. In addition, researchers have argued that social media users have fragmented identities across platforms [,]. Therefore, using a single source of social media data to build psychiatric hospitalization prediction models may lead to low-sensitivity models that are unsuitable for clinical purposes. However, few studies have quantified the extent to which classifiers trained on data from one social media platform generalize to other platforms. To this end, our study aimed to measure how well social media–based classifiers that predict upcoming psychiatric hospitalizations generalize to data from social media platforms unseen during training. In addition, we aimed to surface any evidence of the differing fragmented identities reflected on 3 popular social media platforms—Twitter, Facebook, and Instagram—that might affect the models’ generalizability.

Objectives

The research question we attempted to answer was as follows: given the preliminary evidence of fragmented identities that are reflected on the investigated social media platforms, can we build classifiers that can effectively detect users at risk of an upcoming psychiatric hospitalization using social media data from platforms unseen in the training data?

To answer our research question, we collated textual and image content (if available) from consenting participants’ social media data from Facebook, Twitter, and Instagram. We then trained platform-specific classifiers to distinguish between social media data from healthy controls and data from patients with SSD with an upcoming psychiatric hospitalization. We compared the performance of classifiers on testing data between seen and unseen social media platforms from the training data. We also compared and analyzed the top predictive features and the feature importance distributions between the 3 platform-specific classifiers, with a view toward finding potential empirical evidence for fragmented identities between the various social media platforms.


Methods

Recruitment

We recruited participants clinically diagnosed with SSD and clinically verified healthy controls aged between 15 and 35 years. These data were collected as part of a broader research initiative involving the authors of this paper to identify technology-based health information to provide early identification, intervention, and treatment for young adults with SSD [].

For participants with SSD aged between 15 and 35 years (141/268, 52.6%), diagnoses were based on clinical assessment of the most recent episode and were extracted from participants’ medical records at the time of their consent. Participants in this group were recruited from the Northwell Health Zucker Hillside Hospital and collaborating institutions located in East Lansing, Michigan. Participants were excluded if they had an IQ of <70 (per clinical assessment), autism spectrum disorder, or substance-induced psychotic disorder.

In addition, healthy volunteers aged between 15 and 35 years (127/268, 47.4%) were approached and recruited from an existing database of eligible individuals who had already been screened for previous research projects at Zucker Hillside Hospital and had agreed to be recontacted for additional research opportunities. Healthy status was determined by either the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders conducted within the past 2 years or the Psychiatric Diagnostic Screening Questionnaire [,]. Participants were excluded if clinically significant psychiatric symptoms were identified during the screening process. Additional healthy volunteers were recruited from a southeastern university via a web-based student community research recruitment site. Finally, healthy volunteers were also recruited from the collaborating institutions located in East Lansing, Michigan.

Data Collection

All consenting participants were asked to download and share their Facebook, Twitter, and Instagram data archives. We collected all linguistic content from participants’ Facebook and Twitter archives (ie, status updates and comments on Facebook and posts shared on Twitter). In addition, we collected image content from participants’ Facebook and Instagram archives, including profile pictures and story photos.

Next, we also collected the medical history of each participant (following consent and adoption of Health Insurance Portability and Accountability Act–compliant policies). This included primary and secondary diagnosis codes, the total number of hospitalizations, and admission and discharge dates for each hospitalization event. Hospitalization data were collected from the medical records at the time of consent. As all consented patient participants in the study had also received care at the Zucker Hillside Hospital, the medical records at the hospital were accurate and up to date to the best of the hospital’s efforts. We only counted psychiatric hospitalizations (not hospitalizations for other nonpsychiatric reasons). Thereafter, the study team accessed the corresponding consented patients’ medical records to extract all their recorded hospitalization events in a similar manner to previous studies using this source of data [,].

Finally, for each participant with at least one known hospitalization event, we collected social media data from all available platforms within a 6-month window immediately preceding the latest hospitalization event, ensuring that no other hospitalization events occurred within these 6 months. This was done to ensure that the data gathered were representative of the participants’ healthy mental status before symptomatic exacerbation and subsequent hospitalization. A 6-month period, which we refer to as the windowed data, was selected as it represents an interval of time long enough to identify changes signaling symptomatic exacerbation while also containing sufficient data to train machine learning models. For healthy control participants without any hospitalizations, we randomly sampled a nonempty 6-month window of social media data for each available social media platform (nonempty meaning that there was at least some social media activity). Figure 1 provides a visual description of the windowing process.
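A minimal sketch of this windowing step, assuming each participant’s posts are available as a pandas DataFrame with a timestamp column and that hospitalization admission dates are available as a list of timestamps; the column names and the window_posts() helper are illustrative rather than the study’s actual code:

```python
from datetime import timedelta
from typing import List, Optional

import pandas as pd

WINDOW = timedelta(days=182)  # approximately 6 months


def window_posts(posts: pd.DataFrame,
                 admissions: List[pd.Timestamp]) -> Optional[pd.DataFrame]:
    """Return posts from the 6 months before the latest admission, or None if
    another hospitalization event falls inside that window (invalid window)."""
    latest = max(admissions)
    start = latest - WINDOW
    if any(start <= adm < latest for adm in admissions if adm != latest):
        return None  # window contaminated by an earlier hospitalization
    in_window = posts[(posts["timestamp"] >= start) & (posts["timestamp"] < latest)]
    return in_window if not in_window.empty else None
```

For healthy controls, an analogous helper would instead sample a random nonempty 6-month window rather than anchoring the window to an admission date.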

Figure 1. Diagram representing the windowing process used to gather participants’ social media data before hospitalization events. Bold text represents the selected data windows. Crosses represent hospitalization events. The X represents invalid data windows. A: Windowing—with hospitalizations; B: Windowing—without hospitalizations.

Feature Engineering

To encode participants’ social media data for the downstream classification and analysis tasks outlined in our research objectives, we identified and extracted the following categories of features from these data for all 3 investigated social media platforms: (1) n-gram language features (n=500), (2) Linguistic Inquiry and Word Count (n=78), (3) lexico-semantic features (n=3), (4) activity features (n=9), and (5) image features (n=23; Instagram and Facebook only).

The specific feature categories were chosen based on relevant previous literature, particularly relating to the use of social media data to infer mental health attributes and psychiatric outcomes [,]. Note that all features were computed at the individual participant level. More details about this process can be found in [,,,-].
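As an illustration of how two of these feature categories might be computed per participant, the sketch below derives n-gram relative frequencies and a VADER-based average post negativity score (a lexico-semantic feature that reappears in Table 7). This is a simplified approximation under stated assumptions, not the study’s feature pipeline; the function names are hypothetical.

```python
from typing import List

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


def ngram_features(per_user_posts: List[List[str]], n_features: int = 500) -> np.ndarray:
    """Relative frequencies of the top unigrams/bigrams, one row per participant."""
    docs = [" ".join(posts) for posts in per_user_posts]
    vectorizer = CountVectorizer(ngram_range=(1, 2), max_features=n_features)
    counts = vectorizer.fit_transform(docs).toarray().astype(float)
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)


def avg_post_negativity(posts: List[str]) -> float:
    """Average VADER negativity score over one participant's posts."""
    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(p)["neg"] for p in posts]
    return float(np.mean(scores)) if scores else 0.0
```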

Feature Selection

Using the aforementioned features, for each of the 3 examined social media platforms, we encoded available participants’ textual and image data on Facebook and Instagram into 613-dimensional feature vectors and textual data on Twitter into 590-dimensional feature vectors. This yielded a Facebook data set of dimension 254 × 613, a Twitter data set of dimension 51 × 590, and an Instagram data set of dimension 134 × 613. We shall refer to these data sets as F, T, and I for Facebook, Twitter, and Instagram, respectively.

Because the feature set might contain noisy and irrelevant features, classification models trained on it may be unstable and produce suboptimal results []. To maximize the predictive power of the models while also reducing redundancy and the computational resources needed to train them, we used feature selection methods []. More specifically, we adopted the ANOVA F test to rank the features based on their F statistic, an approach that has been shown to produce optimal feature sets in previous research on the classification of social media data belonging to patients with SSD [,].

We trained a random forest model, with 5-fold stratified cross-validation to fine-tune hyperparameters, on data sets F, T, and I with an 80:20 train-test split, using only the top k% of features according to the ANOVA F test ranking, with k ranging from 10 to 100 in increments of 10. By examining the evaluation metrics on the test sets (described in the Classification Algorithms and Metrics section), we determined that using only the top 20% of the features (based on their F statistic under the ANOVA F test) yielded the best results on unseen data across all 3 platforms. We used this subset of features in all subsequent experiments.
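A sketch of this feature selection sweep using Scikit-learn’s SelectPercentile with the ANOVA F test (f_classif), assuming X is a platform’s feature matrix and y its binary labels; the F1 scoring choice and the helper name are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def score_top_k_percent(X, y, k: int, seed: int = 0) -> float:
    """Fit a random forest on the top k% of features (ANOVA F test) and return test F1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    selector = SelectPercentile(f_classif, percentile=k).fit(X_tr, y_tr)
    model = RandomForestClassifier(random_state=seed).fit(selector.transform(X_tr), y_tr)
    return f1_score(y_te, model.predict(selector.transform(X_te)))


# Sweep k = 10, 20, ..., 100 for each platform's data set (F, T, and I)
# best_k = max(range(10, 101, 10), key=lambda k: score_top_k_percent(X, y, k))
```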

Combinatorial Classification Methods

To answer the research question laid out in the Introduction section, we adopted a 3 × 3 combinatorial classification design, in which we trained and tested machine learning models on the psychiatric hospitalization prediction task using all possible pairs of training and testing data sets. Figure 2 provides a visual description of our experimental design. For intraplatform experiments (where the training and testing data came from the same platform; eg, training and testing on Facebook data), we trained and tested the models on an 80:20 label-stratified train-test split based on the Scikit-learn train_test_split() function (version 0.24.1) []. For interplatform experiments (where the training and testing data came from different platforms; eg, training on Facebook data and testing on Instagram data), we trained the model on the entirety of the training data set and evaluated it on the entirety of the testing data set.
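The sketch below outlines this 3 × 3 design, assuming the selected feature matrices and labels for Facebook, Twitter, and Instagram (X_f, y_f, X_t, y_t, X_i, y_i) are already in memory and share the same feature columns; the logistic regression settings are placeholders rather than the tuned hyperparameters:

```python
from itertools import product

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

datasets = {"facebook": (X_f, y_f), "twitter": (X_t, y_t), "instagram": (X_i, y_i)}

results = {}
for train_name, test_name in product(datasets, repeat=2):
    X_a, y_a = datasets[train_name]
    X_b, y_b = datasets[test_name]
    if train_name == test_name:
        # Intraplatform: 80:20 label-stratified split of the same data set
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_a, y_a, test_size=0.2, stratify=y_a, random_state=0)
    else:
        # Interplatform: train on all of one platform, test on all of another
        X_tr, y_tr, X_te, y_te = X_a, y_a, X_b, y_b
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    results[(train_name, test_name)] = f1_score(y_te, clf.predict(X_te))
```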

Figure 2. Diagram representing the classification experiments performed and their nature within the 3 × 3 combinatorial design.

Classification Algorithms and Metrics

For both intra- and interplatform experiments, training data represented by the top 20% of features (as described in the Feature Selection section) were fed into a model to learn the classification task. We trained models using several algorithms, including random forest, logistic regression, support vector machine, and multilayer perceptron []. We selected these algorithms as they represent a variety of learning algorithm types []. This ensured that our analysis of performance differences between intra- and interplatform experiments would hold irrespective of the learning algorithm selected. We used the Scikit-learn implementation (version 0.24.1) for all the aforementioned algorithms []. For each algorithm, we fine-tuned its hyperparameters using 5-fold stratified cross-validation via the Scikit-learn GridSearchCV() pipeline, retaining the best hyperparameters per algorithm for analysis []. The chosen hyperparameters for each classification algorithm are provided in Textbox 1 (all other hyperparameters were left as default according to the Scikit-learn specification).
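A minimal sketch of the per-algorithm tuning step with GridSearchCV, shown here for the multilayer perceptron; the parameter grid is illustrative (the values actually selected are reported in Textbox 1), and X_tr and y_tr stand for a training split from the combinatorial design:

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

param_grid = {
    "hidden_layer_sizes": [(256,), (512, 256), (512, 256, 128)],
    "alpha": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(max_iter=500),
    param_grid,
    cv=StratifiedKFold(n_splits=5),  # 5-fold stratified cross-validation
    scoring="f1",
)
search.fit(X_tr, y_tr)
best_mlp = search.best_estimator_  # retained for evaluation on the test set
```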

We measured the performance of the models using the metrics outlined in Textbox 2, all of which are commonly used for binary classification models. In this context, we abbreviate the number of true positives, true negatives, false positives, and false negatives as TP, TN, FP, and FN, respectively [].
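A sketch of how these metrics (listed in Textbox 2 below) might be computed for a fitted model, assuming y_true are the test labels, y_pred the predicted labels, and y_score the predicted probabilities for the positive class; Scikit-learn’s auc() applies the trapezoidal rule to the ROC curve points:

```python
from sklearn.metrics import (accuracy_score, auc, f1_score, precision_score,
                             recall_score, roc_curve)


def evaluate(y_true, y_pred, y_score) -> dict:
    """Binary classification metrics used to compare models."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auroc": auc(fpr, tpr),  # trapezoidal estimate of the ROC area
    }
```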

Textbox 1. Hyperparameters chosen for each classification algorithm.

Random forest

max_depth: 15
n_estimators: 100
max_features: none

Logistic regression

Support vector machine

Kernel: rbf
C: 0.01
Gamma: scale

Multilayer perceptron

Alpha: 0.0001
Hidden_layer_sizes: (512, 256, 128)

Textbox 2. Metrics used to measure model performance.

Accuracy

Also known as Rand accuracy, the ratio of correct predictions to all predictions

Precision

The ratio of correct positive predictions to the total number of positive predictions

Recall

The ratio of correct positive predictions to the total number of true positive instances

F1-score

The harmonic mean between precision and recall

Area under the receiver operating characteristic curve (AUROC)

The area under the curve obtained by plotting the true positive rate against the false positive rate across classification thresholds; in practice, the AUROC is often estimated using the trapezoidal rule with the following formula:

$\mathrm{AUROC} \approx \sum_{i=1}^{n-1} \frac{(\mathrm{FPR}_{i+1} - \mathrm{FPR}_{i})(\mathrm{TPR}_{i} + \mathrm{TPR}_{i+1})}{2}$

where $\mathrm{TPR} = \mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, $\mathrm{FPR} = \mathrm{FP}/(\mathrm{FP}+\mathrm{TN})$, and $i$ indexes the $n$ points of the receiver operating characteristic curve.

Feature Importance Selection

We used Shapley Additive Explanations (SHAP) to examine how individual features affected our models’ predictions of upcoming psychiatric hospitalization due to SSD given participants’ social media data from the 3 investigated platforms. We chose SHAP over other explainability methods because it is model-agnostic, is grounded in a well-established game-theoretical framework, and produces feature scores that can be calculated both for localized samples and for the entire global data set []. SHAP is based on Shapley values, a game-theoretical concept that intuitively describes each feature’s contribution to the outcome after considering all possible combinations of features [].

For each of the intraplatform experiments within the 3 × 3 combinatorial design and each machine learning model, we calculated the average SHAP values for each of the features (ie, their importance to the prediction) across all instances within the testing set. We then recorded the list of features sorted in descending order according to the average SHAP values measured by each model. In the case of models with native support for feature importance extraction, including random forest (Gini importance) and logistic regression (feature coefficients), we also calculated and recorded them in an equivalent manner to SHAP values.
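A sketch of this feature importance extraction, assuming model is a fitted classifier from one intraplatform experiment, X_tr and X_te are its training and testing feature matrices, and feature_names holds the column names; ranking by mean absolute SHAP value is a common convention and is used here as an approximation of the averaging described above:

```python
import numpy as np
import shap

explainer = shap.Explainer(model, X_tr)   # model-agnostic explainer with background data
explanation = explainer(X_te)             # SHAP values for every test instance

# Average the absolute SHAP value of each feature across test instances,
# then sort features by descending importance
mean_abs_shap = np.abs(explanation.values).mean(axis=0)
ranked_features = [feature_names[i] for i in np.argsort(mean_abs_shap)[::-1]]
```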

Robustness Checks

To ensure that our findings regarding differences in model performance between models and between intra- and interplatform experiments still held when certain aspects of the training and testing data sets were made more ideal, we performed several robustness checks, which are described in .

Ethics Approval

The study was approved by the institutional review board of Northwell Health (the coordinating institution) and the institutional review board of the participating partners (Georgia Tech approval H21403). Participants were recruited from June 23, 2016, to December 4, 2020. Written informed consent was obtained from adult participants and legal guardians of participants aged <18 years. Assent was obtained from participating minors.


Results

Data Characteristics

In total, 268 participants (mean age 24.73, SD 5.64 years; male: 127/268, 47.4%; SSD: 141/268, 52.6%) with nonempty windowed data for at least one platform were included. Of these 268 participants, 254 (94.8%; SSD: 133/254, 52.4%) had valid windowed Facebook data, 51 (19%; SSD: 7/51, 13.7%) had valid windowed Twitter data, and 134 (50%; SSD: 42/134, 31.3%) had valid windowed Instagram data. Among participants with valid data for more than one platform, 17.5% (47/268; SSD: 5/47, 10.6%) had valid data for both Facebook and Twitter, 14.2% (38/268; SSD: 4/38, 10.5%) had valid data for both Twitter and Instagram, and 44.4% (119/268; SSD: 34/119, 28.6%) had valid data for both Facebook and Instagram. Finally, 14.2% (38/268; SSD: 4/38, 10.5%) of participants had valid data for all 3 platforms. Table 1 shows the demographic and clinical characteristics of these 268 participants. Table 2 describes the summary statistics, including mean and median, for these windowed data for each of the 3 social media platforms grouped by clinical status (SSD vs control). Figure 3 shows the distribution of available posts for participants in each of the 3 investigated platforms.

Table 1. Demographic and clinical characteristics of the participants (N=268).

| Characteristic | SSDa (n=141) | Control (n=127) | Full sample |
| --- | --- | --- | --- |
| Age (years), mean (SD) | 24.86 (5.49) | 24.57 (5.82) | 24.73 (5.64) |
| Sex, n (%) | | | |
| Male | 89 (63.1) | 38 (29.9) | 127 (47.4) |
| Female | 52 (36.9) | 89 (70.1) | 141 (52.6) |
| Race or ethnicity, n (%) | | | |
| African American or Black | 64 (45.4) | 19 (15) | 83 (31) |
| Asian | 20 (14.2) | 23 (18.1) | 43 (16) |
| White | 37 (26.2) | 75 (59.1) | 112 (41.8) |
| Mixed race or other | 15 (10.6) | 5 (3.9) | 20 (7.5) |
| Hispanic | 5 (3.5) | 4 (3.1) | 9 (3.4) |
| Pacific Islander | 0 (0) | 1 (0.8) | 1 (0.4) |
| Primary diagnosis, n (%) | | | |
| Schizophrenia | 67 (47.5) | N/Ab | 67 (25) |
| Schizophreniform | 26 (18.4) | N/A | 26 (9.7) |
| Schizoaffective | 25 (17.7) | N/A | 25 (9.3) |
| Unspecified SSDs | 23 (16.3) | N/A | 23 (8.6) |
| No diagnosis | N/A | 127 (100) | 127 (47.4) |

aSSD: schizophrenia spectrum disorder.

bN/A: not applicable.

Table 2. Summary statistics for windowed data for both the control class and the schizophrenia spectrum disorder (SSD) class (ie, participants hospitalized with SSD). In this table, we consider data from Facebook, Twitter, and Instagram, as mentioned previously.

| | Facebook (user: n=254; post: n=169,425) | | Twitter (user: n=51; post: n=23,777) | | Instagram (user: n=134; post: n=23,551) | |
| --- | --- | --- | --- | --- | --- | --- |
| | SSD class | Control class | SSD class | Control class | SSD class | Control class |
| Total users, n (%) | 133 (52) | 121 (48) | 7 (14) | 44 (86) | 42 (31) | 92 (69) |
| Total posts, n (%) | 114,793 (68) | 54,632 (32) | 991 (4) | 22,786 (96) | 7111 (30) | 16,440 (70) |
| Posts, mean (SD) | 863.1 (2365.1) | 451.5 (818.87) | 141.6 (255) | 519.9 (1166.9) | 169.3 (445.4) | 178.7 (234.6) |
| Posts, median | 260 | 184 | 37 | 138 | 54.5 | 103 |
| Posts, range | 2-23,589 | 1-4852 | 1-758 | 1-7056 | 1-2909 | 1-1328 |

Figure 3. Cumulative distribution function (CDF) curves of users and their number of posts for the schizophrenia spectrum disorder and control classes per data set: (A) Facebook (left), (B) Twitter (center), and (C) Instagram (right).

Results of Combinatorial Classification

We report the full results of the intraplatform experiments in Table 3 and the full results of the interplatform experiments in Tables 4 to 6. Finally, we report the receiver operating characteristic curves for the best-performing logistic regression model for the experiments from Tables 3 and 4 in Figure 4.

Elaborating on the results in Table 3, we found that, among the 4 classification algorithms, the logistic regression model performed best across all 3 intraplatform experiments. Specifically, for the intraplatform experiments, the logistic regression model achieved an average F1-score of 0.72 (SD 0.07), accuracy of 0.81 (SD 0.08), and AUROC of 0.749 (SD 0.06). In contrast, the worst-performing model (in this case, the multilayer perceptron) achieved an average F1-score of 0.521 (SD 0.19), accuracy of 0.714 (SD 0.19), and AUROC of 0.623 (SD 0.16) for the intraplatform experiments. We therefore used the logistic regression model for the subsequent analysis of feature importance between platforms. These results align with previous research and, thus, could be considered a soft replication of those findings [,].

By contrast, aggregating the metrics for the interplatform experiments presented in Tables 4 to 6, the average F1-score decreased to 0.428 (SD 0.11), accuracy decreased to 0.559 (SD 0.06), and AUROC decreased to 0.533 (SD 0.03) for the logistic regression model. This constitutes, on average, a drop of 40%, 31.4%, and 28.8% in F1-score, accuracy, and AUROC score, respectively, from the intraplatform experiments. When comparing the effectiveness of models between intraplatform and interplatform experiments, we thus found a consistent drop in performance for all the investigated social media platforms. The drop in test F1-score, given the best-performing logistic regression model, was most drastic for Facebook at 0.364 (46%) and least drastic for Twitter at 0.08 (14%), averaging a drop of 0.285 (40%, SD 0.13) from 0.713 for intraplatform experiments to 0.428 for interplatform experiments. These trends held even when the robustness checks addressing disparities in data set size and dual-platform data availability (described in the Robustness Checks section of the Methods) were applied to the training and testing data.

Table 3. Classification results for all intraplatform classification experiments. In this table, for instance, Facebook indicates the Facebook-Facebook experiment.

Facebook
| Model | Acca | Pb | Rc | F1 | AUROCd |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.739 | 0.739 | 0.738 | 0.738 | 0.709 |
| SVMe | 0.722 | 0.747 | 0.692 | 0.715 | 0.723 |
| MLPf | 0.506 | 0.406 | 0.507 | 0.367 | 0.516 |
| Logistic regression | 0.759 | 0.767 | 0.758 | 0.756 | 0.727 |

Twitter
| Model | Acc | P | R | F1 | AUROC |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.745 | 0.150 | 0.116 | 0.116 | 0.494 |
| SVM | 0.854 | 0.541 | 0.45 | 0.463 | 0.697 |
| MLP | 0.845 | 0.458 | 0.45 | 0.426 | 0.692 |
| Logistic regression | 0.881 | 0.742 | 0.6 | 0.63 | 0.772 |

Instagram
| Model | Acc | P | R | F1 | AUROC |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.7 | 0.648 | 0.637 | 0.637 | 0.681 |
| SVM | 0.740 | 0.737 | 0.757 | 0.743 | 0.805 |
| MLP | 0.792 | 0.771 | 0.794 | 0.77 | 0.840 |
| Logistic regression | 0.792 | 0.771 | 0.801 | 0.773 | 0.848 |

aAcc: accuracy.

bP: precision.

cR: recall.

dAUROC: area under the receiver operating characteristic curve.

eSVM: support vector machine.

fMLP: multilayer perceptron.

Table 4. Classification results for the interplatform classification experiments for Facebook training data.

Twitter (testing data)
| Model | Acca | Pb | Rc | F1 | AUROCd |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.392 | 0.221 | 0.88 | 0.354 | 0.579 |
| SVMe | 0.545 | 0.253 | 0.72 | 0.373 | 0.612 |
| MLPf | 0.587 | 0.240 | 0.55 | 0.334 | 0.573 |
| Logistic regression | 0.628 | 0.246 | 0.47 | 0.323 | 0.567 |

Instagram (testing data)
| Model | Acc | P | R | F1 | AUROC |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.379 | 0.328 | 0.952 | 0.488 | 0.537 |
| SVM | 0.432 | 0.337 | 0.860 | 0.483 | 0.550 |
| MLP | 0.435 | 0.332 | 0.812 | 0.471 | 0.539 |
| Logistic regression | 0.472 | 0.344 | 0.775 | 0.476 | 0.555 |

aAcc: accuracy.

bP: precision.

cR: recall.

dAUROC: area under the receiver operating characteristic curve.

eSVM: support vector machine.

fMLP: multilayer perceptron.

Table 5. Classification results for the interplatform classification experiments for Twitter training data.

Facebook (testing data)
| Model | Acca | Pb | Rc | F1 | AUROCd |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.531 | 0.569 | 0.378 | 0.452 | 0.536 |
| SVMe | 0.514 | 0.53 | 0.537 | 0.530 | 0.513 |
| MLPf | 0.533 | 0.561 | 0.440 | 0.492 | 0.536 |
| Logistic regression | 0.534 | 0.552 | 0.522 | 0.535 | 0.535 |

Instagram (testing data)
| Model | Acc | P | R | F1 | AUROC |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.628 | 0.331 | 0.207 | 0.252 | 0.512 |
| SVM | 0.563 | 0.340 | 0.42 | 0.373 | 0.523 |
| MLP | 0.557 | 0.325 | 0.395 | 0.356 | 0.512 |
| Logistic regression | 0.578 | 0.362 | 0.47 | 0.408 | 0.548 |

aAcc: accuracy.

bP: precision.

cR: recall.

dAUROC: area under the receiver operating characteristic curve.

eSVM: support vector machine.

fMLP: multilayer perceptron.

Table 6. Classification results for the interplatform classification experiments for Instagram training data.

Facebook (testing data)
| Model | Acca | Pb | Rc | F1 | AUROCd |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.51 | 0.523 | 0.612 | 0.563 | 0.507 |
| SVMe | 0.524 | 0.544 | 0.51 | 0.524 | 0.525 |
| MLPf | 0.554 | 0.584 | 0.48 | 0.526 | 0.557 |
| Logistic regression | 0.516 | 0.524 | 0.689 | 0.595 | 0.51 |

Twitter (testing data)
| Model | Acc | P | R | F1 | AUROC |
| --- | --- | --- | --- | --- | --- |
| Random forest | 0.751 | 0.369 | 0.42 | 0.386 | 0.624 |
| SVM | 0.691 | 0.213 | 0.25 | 0.229 | 0.521 |
| MLP | 0.683 | 0.201 | 0.23 | 0.214 | 0.51 |
| Logistic regression | 0.628 | 0.256 | 0.52 | 0.342 | 0.587 |

aAcc: accuracy.

bP: precision.

cR: recall.

dAUROC: area under the receiver operating characteristic curve.

eSVM: support vector machine.

fMLP: multilayer perceptron.

Figure 4. Receiver operating characteristic (ROC) curves for the classification experiments given the best logistic regression model. (A), (B), and (C) are curves for the Facebook, Twitter, and Instagram intraplatform results, respectively, from Table 3. (D) and (E) are the ROC curves for the interplatform experiments from Table 4, where Facebook was used as the training data.

Feature Importance Analysis

We hypothesized that the decrease in performance from intraplatform to interplatform experiments, as presented previously, was driven by differences in the feature importance learned by models trained on data from different social media platforms (even when they shared the same feature set). By extracting the ranked lists of SHAP feature importance from the models per the method described previously, we found support for this hypothesis. Specifically, when holding the model constant, we observed little overlap in the top 25 features across platforms. On average, there were only 4.66 overlapping features for the same logistic regression classification model (the best-performing model based on the previous discussion) across platforms. In addition, we found that the feature importance rankings for each of the platforms, based on the logistic regression model, had very weak pairwise rank correlations. Using the Kendall rank correlation coefficient, we found very weak rank correlations between the ranked lists of feature importance for Facebook and Twitter (τb=0.081; P=.003), Facebook and Instagram (τb=0.041; P=.01), and Twitter and Instagram (τb=0.055; P=.05). We report the average SHAP values and logistic regression coefficients of the top 10 features based on their SHAP values, along with their average values in the SSD class and the control class, in Table 7.
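The pairwise rank correlations reported above can be computed with SciPy’s kendalltau(), which returns the tau-b statistic by default; the sketch assumes two importance arrays aligned on the same shared feature set (eg, mean absolute SHAP values per feature for two platforms), and the variable names are illustrative:

```python
from scipy.stats import kendalltau


def importance_rank_correlation(importance_a, importance_b):
    """Kendall tau-b rank correlation between two feature importance vectors."""
    tau, p_value = kendalltau(importance_a, importance_b)
    return tau, p_value


# Example: tau_fb_tw, p_fb_tw = importance_rank_correlation(shap_facebook, shap_twitter)
```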

Table 7. Top 10 features for the logistic regression (LR) model for each of the platforms (Linguistic Inquiry and Word Count features are italicized) based on their Shapley Additive Explanations (SHAP) values.

Facebook
| Feature acronym | Feature description | SHAP value | LR coefficient | SSDa group average (SD) | Control group average (SD) |
| --- | --- | --- | --- | --- | --- |
| Avg_post_readability | Average post readability, as measured using the SMOGb index | 0.761 | −0.268 | 5.6341 (2.74) | 6.8048 (1.92) |
| *Quant* | Ratio of words within the “quantifiers” category | 0.4195 | −0.189 | 0.0012 (0.0012) | 0.0016 (0.0012) |
| *Negemo* | Ratio of words within the “negative emotions” category | 0.0953 | 0.244 | 0.0043 (0.0035) | 0.0031 (0.0022) |
| *Money* | Ratio of words within the “money” category | 0.0739 | −0.216 | 0.0007 (0.001) | 0.0011 (0.002) |
| *Swear* | Ratio of words within the “swear” category | 0.0628 | 0.236 | 0.0017 (0.0025) | 0.0007 (0.001) |
| Ratio_octile8 | Ratio of activities from 9 PM to midnight | 0.0443 | 0.077 | 0.1443 (0.149) | 0.1241 (0.158) |
| Ratio_octile7 | Ratio of activities from 6 PM to 9 PM | 0.0409 | 0.177 | 0.1561 (0.1745) | 0.1054 (0.125) |
| *Anger* | Ratio of words within the “anger” category | 0.0095 | 0.191 | 0.0018 (0.002) | 0.0009 (0.001) |
| Dream | Ratio of “dream” within the overall bag of words | 0.0077 | 0.224 | 0.2028 (0.468) | 0.0746 (0.24) |
| Fun | Ratio of “fun” within the overall bag of words | 0.0043 | −0.209 | 0.5722 (1.19) | 1.1315 (1.76) |

Twitter
| Feature acronym | Feature description | SHAP value | LR coefficient | SSD group average (SD) | Control group average (SD) |
| --- | --- | --- | --- | --- | --- |
| *Conj* | Ratio of words within the “conjunctions” category | 0.2319 | −0.063 | 0.0001 (0.0002) | 0.0003 (0.0004) |
| *Adj* | Ratio of words within the “adjectives” category | 0.1825 | −0.05 | 0.0057 (0.004) | 0.0080 (0.005) |
| Avg_post_negativity | Average post negativity, as calculated using the VADERc library | 0.1509 | 0.082 | 0.071 (0.042) | 0.0519 (0.036) |
| *Male* | Ratio of words within the “male” category | 0.1355 | 0.039 | 0.0011 (0.0013) | 0.0007 (0.001) |
| Ratio_octile_8 | Ratio of activities from 9 PM to midnight | 0.1265 | 0.045 | 0.0231 (0.356) | 0.1227 (0.188) |
| *Ingest* | Ratio of words within the “ingest” category | 0.0627 | −0.056 | 0.0003 (0.0007) | 0.0014 (0.0018) |
| *Insight* | Ratio of words within the “insight” category | 0.0516 | 0.053 | 0.0044 (0.004) | 0.0035 (0.003) |
| *Power* | Ratio of words within the “power” category | 0.0308 | −0.058 | 0.0024 (0.0026) | 0.0042 (0.0036) |
| *We* | Ratio of words within the “we” category | 0.0196 | −0.056 | 0.0001 (0.0002) | 0.0002 (0.0004) |
| *Prep* | Ratio of words within the “prepositions” category | 0.0117 | 0.063 | 0.0028 (0.0026) | 0.0017 (0.0017) |

Instagram
| Feature acronym | Feature description | SHAP value | LR coefficient | SSD group average (SD) | Control group average (SD) |
| --- | --- | --- | --- | --- | --- |
| Avg_post_readability | Average post readability, as measured using the SMOG index | 0.761 | −0.203 | 5.1018 (1.15) | 6.2564 (1.638) |
| *Space* | Ratio of words within the “space” category | 0.733 | −0.147 | 0.0031 (0.0025) | 0.0042 (0.0025) |
| *Affiliation* | Ratio of words within the “affiliation” category | 0.6839 | −0.181 | 0.0032 (0.0027) | 0.0056 (0.0034) |
| *Friend* | Ratio of words within the “friend” category | 0.5336 | −0.159 | 0.0009 (0.0027) | 0.0018 (0.0034) |
| *Female* | Ratio of words within the “female” category | 0.4576 | −0.168 | 0.0008 (0.001) | 0.0019 (0.0023) |
| *Sad* | Ratio of words within the “sad” category | 0.4554 | 0.113 | 0.0011 (0.0008) | 0.0007 (0.0012) |
| *Quant* | Ratio of words within the “quantifier” category | 0.4195 | | | |

aSSD: schizophrenia spectrum disorder.

bSMOG: Simple Measure of Gobbledygook.

cVADER: Valence Aware Dictionary and sEntiment Reasoner.
