The biomedical corpus for analysis consisted of full-text articles from the PubMed Central Open Access Subset (N = 4,993,411) and abstracts from leading neuroscience journals (N = 253,022). Across the corpus, 4,375 expressions of the form ‘the brain is …’ were identified in 897 peer-reviewed journals. Because large open-access journals are disproportionately represented in the corpus, journals such as PLoS ONE (N = 208), International Journal of Molecular Sciences (N = 187), Frontiers in Neuroscience (N = 172), Psychology (N = 130), Human Neuroscience (N = 105), and Scientific Reports (N = 70) constituted the largest share of matched expressions. Articles from the current decade (≥ 2020) constituted the largest share of matched expressions (~56%). Of the 2,204 (51%) matched expressions identified in sections of text with ‘standard’ titles (e.g., titles containing ‘Abstract’, ‘Introduction’, ‘Results’, ‘Discussion’, ‘Methods’, etc.), 54% were found in introduction sections, 22% in discussions, and 16% in abstracts.
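As an illustration of the matching step, the following is a minimal sketch of how expressions of the form ‘the brain is …’ could be pulled from raw text with a regular expression. The pattern, the function name `extract_expressions`, and the sample sentences are assumptions made for illustration; the exact matching rules used in the study are described in Methods and Materials and may differ (e.g., in how clause boundaries are handled).

```python
import re

# Hypothetical pattern (an assumption, not the study's rule): "the brain is"
# followed by the rest of the clause, up to the first '.', ';', '!' or '?'.
# Commas are allowed so that phrases such as "a complex, dynamic system" survive.
BRAIN_IS = re.compile(r"\bthe brain is\s+([^.;!?]+)", re.IGNORECASE)


def extract_expressions(text):
    """Return the text following each 'the brain is' match, whitespace-trimmed."""
    return [m.group(1).strip() for m in BRAIN_IS.finditer(text)]


# Illustrative sentences only; not drawn from the corpus.
sample = (
    "The brain is a lipid rich organ. "
    "Some argue that the brain is a prediction machine; others disagree."
)
matches = extract_expressions(sample)
print(matches)
```

A simple pattern of this kind over-captures when the clause continues after the noun phrase, so in practice some further trimming or parsing of the matched text would likely be needed.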
Following extraction, the text of each matched expression was converted to a vector representation via a pretrained embedding model (Deka et al., 2022). To identify commonly used expressions, we reduced the embedding space to two dimensions using UMAP (McInnes et al., 2020) and clustered the phrases using a hierarchical density-based clustering algorithm (HDBSCAN; Campello et al., 2013). Of the 26 clusters extracted by the HDBSCAN algorithm (see Methods and Materials), 21 were found to constitute semantically coherent groups of expressions – i.e., groups of expressions that express similar meanings. Two clusters (18 & 19) were found to contain similar meanings and were merged, leaving a total of 20 semantically coherent clusters. Labels for each cluster were generated by manual inspection. In terms of the overall organization of the semantic embedding space, the phrases were organized along a single dominant dimension, running from abstract/metaphorical phrases concentrated in the top-left to more cellular/biochemical phrases in the bottom-right (Fig. 1).
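The post-clustering bookkeeping described above (discarding clusters that are not semantically coherent, merging clusters with similar meanings, and ranking the remainder by size) can be sketched as follows. This is a minimal sketch under stated assumptions: the function `summarize_clusters`, its parameters, and the toy labels are hypothetical, and the embedding, UMAP, and HDBSCAN steps themselves are not reproduced here. The only convention borrowed from HDBSCAN is that the label -1 marks points not assigned to any cluster.

```python
from collections import Counter

NOISE = -1  # HDBSCAN's convention for points not assigned to any cluster


def summarize_clusters(labels, keep, merge=None):
    """Count phrases per retained cluster after optional merging.

    labels: one cluster id per phrase (output of a clustering step)
    keep:   ids judged semantically coherent on manual inspection
    merge:  mapping {absorbed_id: target_id} for clusters with similar meaning
    """
    merge = merge or {}
    counts = Counter()
    for label in labels:
        if label == NOISE or label not in keep:
            continue  # drop noise points and clusters that were not retained
        counts[merge.get(label, label)] += 1
    # Largest clusters first; ties broken by cluster id for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy labels for illustration only; not the study's data.
labels = [0, 0, 18, 19, 19, 3, -1, 0, 7]
ranked = summarize_clusters(labels, keep={0, 3, 18, 19}, merge={19: 18})
print(ranked)
```

Here cluster 19 is folded into cluster 18, mirroring the merge reported above, while cluster 7 and the noise point are excluded from the ranking.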
Fig. 1 ‘The brain is…’ Expressions in a Two-Dimensional Embedding Space. Expressions matching the form ‘The brain is…’ embedded into a two-dimensional space via dimension reduction (UMAP) applied to their semantic embeddings. The distance between points in this space reflects the semantic similarity between the expressions – i.e., expressions (points) that are closer together have similar meanings. Expressions are color-coded according to their cluster assignment from the HDBSCAN clustering algorithm. Each semantically coherent cluster (N = 21) is labeled by a manual interpretation of the expressions in the cluster.
Commonly used noun phrases following the phrase ‘The brain is…’ span multiple levels of organization, from the biochemical (e.g., ‘a lipid rich organ’) to the structural (e.g., ‘a heterogenous organ’), as well as different types of expressions, including metaphors (e.g., ‘a prediction machine’ and ‘a control system’). Of the 20 semantically coherent clusters, the largest were, in descending order: the brain is ‘an energy demanding organ’ (N = 461), ‘a complex, dynamic system’ (N = 301), ‘a complex network’ (N = 277), ‘a heterogenous organ’ (N = 257), and ‘a control system’ (N = 160).
The occurrence of each expression was unequally distributed across journals (Supplementary Table 1), with expressions clustering into journals with distinct disciplinary concentrations. Some of these were predictable – e.g., biochemical expressions, such as ‘cholesterol-’ and ‘lipid-rich’, tended to appear more frequently in journals with a focus on biochemistry and molecular biology (e.g., Oxidative Medicine and Cellular Longevity, Molecules), and mentions of the brain as a ‘common site of metastasis’ appeared more frequently in oncology journals (e.g., Cancers, Frontiers in Oncology).
Metaphors and conceptual analogies tended to appear more frequently in psychology and human neuroscience journals. For example, most of the expressions found in the journal Frontiers in Psychology described the brain as a ‘prediction machine’ or ‘computer/information processor’. Expressions of the brain as a ‘non-linear’ or ‘complex, dynamic’ system and as a ‘complex network’ tended to occur in human neuroscience/neuroimaging journals, including Human Brain Mapping, Frontiers in Human Neuroscience, Neuroimage, Brain and Behavior, and Network Neuroscience.
A further question is whether the expressions attributed to the brain are unique to that organ or also appear in reference to other bodily organs. Using the same methodology, we separately embedded and clustered expressions where the subject was the lungs, kidneys, pancreas, heart, stomach, or liver. Interestingly, we found that all organs surveyed, except for the stomach, were described by expressions of the form ‘the organ is a “complex” or “dynamic” organ’. For example, the heart is variously referred to as both a ‘complex’ and a ‘dynamic’ organ in the biomedical literature (Supplementary Fig. 1). In addition, expressions involving the frequent (or infrequent) occurrence of metastases were found across all organs. Other expressions shared between the brain and other organs included the kidneys and heart as an ‘energy-demanding organ’ and the liver as a ‘heterogenous’ and ‘immune-privileged’ organ. However, the number and diversity of analogical expressions involving the brain was distinct among all organs surveyed. For example, metaphors such as the brain is a ‘prediction machine’, an ‘information processor’, and a ‘control system’ were unique to the brain.