FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example

Before we discuss the technical advantage of the instance-based ABox approach and its practical implications, we want to emphasize once more that the limitations of the TBox approach discussed here apply in the context of documenting empirical data and metadata in a knowledge graph. There are many other contexts in which TBoxes can be superior to ABoxes. This is the case, for instance, when documenting or using invariant knowledge (see 104) and thus universal statements instead of assertional statements, for which ABoxes cannot be used. In anatomy, this would relate to the context of canonical anatomy.

When reasoning over your data is important, TBoxes may in some cases also be superior to ABoxes. Reasoning has so far primarily been applied for validating the consistency of class hierarchies and for inferring additional subsumption relationships [85]. However, the need for reasoning over ABoxes has by now been identified, and corresponding reasoners such as Arachne [86], which support reasoning on, e.g., property relationships, are being developed. Reasoners such as ELK, which are commonly used with TBoxes, use the OWL EL profile, which does not support ABox reasoning very well. Arachne uses the OWL RL profile, which is better suited for instance data. Arachne can, e.g., be used when adding an ABox to a knowledge graph for suggesting additional inferred statements and for checking for consistency in real time. TBox reasoners such as ELK are well suited for tasks like ontology classification and consistency checking of ontology classes, but do not perform well for real-time multi-user online systems focused on ABox graphs, because they do not support axioms like inverse properties, property ranges, and materialization of object property assertions [86]. When an actual state of a system, as it can be recorded, e.g., via sensors and documented as an ABox, has to be compared against a target state, which could be an established standard documented as a TBox, you can check the ABox for consistency against the TBox using Arachne (Footnote 13).

Due to the tabular architecture of relational databases, TBoxes have an advantage over ABoxes when storing data in a relational database, because assertional statements can be documented as instances of ontology classes that, in turn, follow the EQ or EAV model and provide the description of the actual content in their class axioms. Therefore, one only has to store the URI of the ontology class as a value in a respective table to document the content specified through that class’s axioms.

The choice of whether to use a relational database or a knowledge graph for storing, documenting, and managing research data should be driven by the requirements of your study or project and the competency questions that you derive from your respective user stories. Relational databases are well suited for closed world systems, for which you can specify the data schema before populating your database with data, whereas knowledge graphs are well suited for open world systems, i.e., systems that assume incomplete knowledge by default, where you can easily extend the data schema on-the-fly. Relational databases may also be superior to knowledge graphs as a technical solution for your data management (i) if the query structure is well known and expected to be stable, i.e., you know which questions the dataset has to answer and these questions are unlikely to change in the future, (ii) if you know that the dataset may grow, but only the same type of data will be added, or (iii) if your dataset is not complex and its data points are not heavily interconnected, so that it can be easily represented in the tabular structure of a relational database.

In the following, we discuss the technical difference between semantic phenotypes and phenotype knowledge graphs as examples for the class-based TBox and the instance-based ABox approach and the practical implications of this difference in the context of documenting empirical data and metadata in a knowledge graph.

Decomposing phenotype descriptions into separate observation-based statements

Unlike Semantic Phenotypes, Phenotype Knowledge Graphs can be fragmented in various ways into meaningful subgraphs. As a consequence, they provide significantly more flexibility in what can be done with them. Each subgraph can be organized in its own particular named graph that possesses its own URI (see Fig. 3). Each named graph resource can be associated with a corresponding ontology class that it instantiates. These classes can be defined in a domain reference ontology for anatomy that specifies a semantic data model for anatomy [57]. In this way, one could define an ontology class for each type of descriptive statement relevant for phenotype descriptions. Each class defined this way can be understood to correlate with a specific perceptual question that can only be answered by studying the relevant parts of the given ODU. The respective question thereby functions like a perceptual category that is part of a general phenotype structure concept [8, 47, 49]. Examples for such questions would be: What is the weight of this anatomical structure? What is the length of this anatomical line? What is the volume of this anatomical space? What is the position of this anatomical point? What is the color of this anatomical surface? What is the general shape of this anatomical structure? What is the biological function of this anatomical structure? From which structure did this anatomical structure develop?

Each named graph belonging to a phenotype description refers to the combination of (i) a particular part of the ODU and (ii) a specific perceptual category. Fragmenting a given phenotype description into several such named graphs can be understood as the decomposition of the description into its smallest units of empirical information and thus into a set of particular descriptive statements. As a consequence, any given Phenotype Knowledge Graph can be fragmented into its descriptive statements in the form of subgraphs and these subgraphs can be united again to return the Phenotype Knowledge Graph. This general approach is not restricted to anatomy and can be applied to any empirical data.
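The decomposition and reunion described above can be illustrated with a minimal sketch. It assumes that a description is held as a list of quads (subject, predicate, object, named graph). All identifiers with the ex: prefix, the measurement values, and the function names are hypothetical and serve only as illustration; they are not taken from the data model in [57].

```python
from collections import defaultdict

# Hypothetical quads: (subject, predicate, object, named_graph).
QUADS = [
    ("ex:head_1", "ex:has_part", "ex:antenna_1", "ex:parthood_graph_1"),
    ("ex:head_1", "ex:has_shape", "ex:round", "ex:shape_graph_1"),
    ("ex:head_1", "ex:has_weight", "ex:weight_1", "ex:weight_graph_1"),
    ("ex:weight_1", "ex:has_value", "4.7", "ex:weight_graph_1"),
    ("ex:weight_1", "ex:has_unit", "ex:milligram", "ex:weight_graph_1"),
]


def decompose(quads):
    """Group the triples of a description by the named graph they belong to."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


def reunite(graphs):
    """Take the union of all subgraphs to recover the full description."""
    merged = set()
    for triples in graphs.values():
        merged |= triples
    return merged


subgraphs = decompose(QUADS)          # one entry per descriptive statement
full_description = reunite(subgraphs)  # the same triples as in QUADS
```

Each key of subgraphs plays the role of a named graph URI and can therefore be referenced individually, which is what the following sections rely on.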

The decomposability of Phenotype Knowledge Graphs in particular and of instance-based ABox semantic graphs in general is the most important technical difference compared to Semantic Phenotypes and class-based TBox semantic graphs and has significant consequences that substantially affect various practical aspects.

The explorability of phenotype descriptions

Based on the ontology classes of descriptive named graphs discussed above, one can flexibly define various data views for exploring Phenotype Knowledge Graphs [57]. Each data view is defined in reference to one or more such classes. One data view could, for instance, be defined in reference to the class of weight measurements, whereas another one could comprise all classes that contain measurements in general. Applying the former data view on a given description would result in the union of all subgraphs of the description that contain weight measurement data, whereas the application of the latter would result in the union of all subgraphs containing measurement data in general. The definition of various such data views would significantly improve the possibility to meaningfully navigate semantic graphs of phenotype descriptions without users of respective applications having to write deeply nested SPARQL queries, because only the corresponding named graphs must be identified. This, again, applies in general to all kinds of empirical data that are represented as instance-based ABox semantic graphs.
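As a rough sketch of how such data views could work, the following example assumes that every named graph is typed with one class and that these classes form a simple hierarchy. The class names, graph names, and the function apply_view are hypothetical. A real application would evaluate this against a triple store rather than against Python dictionaries.

```python
# Hypothetical hierarchy of named-graph classes (child -> parent).
GRAPH_CLASS_PARENT = {
    "ex:WeightMeasurementGraph": "ex:MeasurementGraph",
    "ex:LengthMeasurementGraph": "ex:MeasurementGraph",
    "ex:MeasurementGraph": None,
    "ex:ShapeGraph": None,
}

# The class that each (hypothetical) named graph instantiates.
GRAPH_TYPE = {
    "ex:g_weight": "ex:WeightMeasurementGraph",
    "ex:g_length": "ex:LengthMeasurementGraph",
    "ex:g_shape": "ex:ShapeGraph",
}

# The triples contained in each named graph.
GRAPH_TRIPLES = {
    "ex:g_weight": {("ex:head_1", "ex:has_weight", "ex:weight_1")},
    "ex:g_length": {("ex:antenna_1", "ex:has_length", "ex:length_1")},
    "ex:g_shape": {("ex:head_1", "ex:has_shape", "ex:round")},
}


def is_a(cls, ancestor):
    """Walk up the class hierarchy to test whether cls falls under ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = GRAPH_CLASS_PARENT.get(cls)
    return False


def apply_view(view_classes):
    """Return the union of all named graphs whose class falls under the view."""
    result = set()
    for graph, cls in GRAPH_TYPE.items():
        if any(is_a(cls, v) for v in view_classes):
            result |= GRAPH_TRIPLES[graph]
    return result


weight_view = apply_view({"ex:WeightMeasurementGraph"})
measurement_view = apply_view({"ex:MeasurementGraph"})
```

Note that the view only has to identify the relevant named graphs by their class; it does not need to traverse the content of the subgraphs.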

Unfortunately, Semantic Phenotypes and any other class-based TBox semantic graph cannot be fragmented this way, because the entities, related entities, and qualities that class axioms refer to are anonymous resources and thus cannot be individually referenced and identified (see above). Therefore, Semantic Phenotypes cannot be explored to the same degree as Phenotype Knowledge Graphs.

Linking relevant metadata and supplementary contents to phenotype descriptions

Metadata are statements about statements. In the case of phenotype descriptions, metadata refer to who contributed which parts of the description, based on which evidence, and using which instruments, where and when (see Fig. 4). Modeling statements about statements within OWL/RDF is not trivial and various approaches have been suggested [87]. RDF itself provides the possibility to make statements about statements using standard reification, by specifying the statement about which one wants to make statements through three additional triple statements (i.e., statement_URI subject subject_URI; statement_URI predicate predicate_URI; statement_URI object object_URI). While this may be a practical solution for making statements about a single triple statement, it becomes very impractical if one has to make statements about a subgraph that consists of several triple statements (see example in Fig. 4). For such cases, the use of named graphs is a good choice. Moreover, named graphs also outperform other metadata representation models when conducting more complex queries [87].
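The difference in overhead between the two approaches can be made concrete with a small sketch. It assumes a subgraph of three triples and two metadata statements; the ex: identifiers, the date, and the function names are hypothetical, whereas rdf:subject, rdf:predicate, and rdf:object are the standard RDF reification properties.

```python
# Hypothetical subgraph (a weight measurement) and metadata to attach to it.
SUBGRAPH = [
    ("ex:head_1", "ex:has_weight", "ex:weight_1"),
    ("ex:weight_1", "ex:has_value", "4.7"),
    ("ex:weight_1", "ex:has_unit", "ex:milligram"),
]
METADATA = [
    ("ex:created_by", "ex:researcher_1"),
    ("ex:created_on", "2020-01-01"),
]


def annotate_by_reification(triples, metadata):
    """Reify every triple and repeat the metadata for each reified statement."""
    extra = []
    for i, (s, p, o) in enumerate(triples):
        stmt = f"ex:statement_{i}"
        extra += [
            (stmt, "rdf:subject", s),
            (stmt, "rdf:predicate", p),
            (stmt, "rdf:object", o),
        ]
        extra += [(stmt, mp, mo) for mp, mo in metadata]
    return extra


def annotate_by_named_graph(graph_uri, metadata):
    """Attach the metadata once, to the URI of the named graph."""
    return [(graph_uri, mp, mo) for mp, mo in metadata]


reified = annotate_by_reification(SUBGRAPH, METADATA)
named = annotate_by_named_graph("ex:weight_graph_1", METADATA)
```

In this sketch, reification adds five triples per described triple, whereas the named graph approach adds only the metadata triples themselves, independent of the size of the subgraph.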

Fig. 4

Metadata about a Phenotype Knowledge Graph. The Phenotype Knowledge Graph from Fig. 3 with its associated metadata. The Phenotype Knowledge Graph is organized into three subgraphs (I, II, and III), each of which has its own set of metadata statements associated. When compared to Fig. 3, one can identify the three subgraphs in reference to named graphs: subgraph I) refers to the ‘shape named graph’, subgraph II) to the ‘weight measurement named graph’, and subgraph III) to the union of the two ‘parthood named graphs’

Because each descriptive statement belonging to a Phenotype Knowledge Graph is organized in its own particular named graph and this named graph has its own URI, it can be individually referenced for associating relevant metadata information to it, such as on which specimen the observation is based, which microscope has been used, or the literature source from which the information in that subgraph of the description has been taken and how reliable that source is [57]. Each such metadata statement, in turn, can be documented in its own named graph and thus be clearly separated from the actual description. The combination of a particular descriptive named graph and its associated metadata named graph can be published separately from the whole description as a nano-publication [88,89,90].

Moreover, by referring to the URI of the particular named graph, one can also link natural language descriptions and semantically annotated media contents to each descriptive statement, as well as comments and other annotations. And because each described part, quality, and property possesses its own URI in a Phenotype Knowledge Graph, images can be annotated with regions of interest using these URIs to indicate that they depict a particular part, quality or property, which is not possible with Semantic Phenotypes.

As a consequence, the use of the description named graphs allows for differentially assigning metadata, unstructured natural language texts, and media contents at the level of smallest units of semantically meaningful empirical information contained in a Phenotype Knowledge Graph instead of having to assign them to the description as a whole, and this information can be published as a micro-publication [91]. And again, this is not restricted to the domain of anatomy, but can be applied to all kinds of empirical data that are represented as instance-based ABox semantic graphs.

Unfortunately, Semantic Phenotypes and any other class-based TBox semantic graph cannot be fragmented this way, and thus assigning metadata, natural language texts, and media contents at the level of smallest units of empirical information is not that straightforward.

Expandability of phenotype descriptions

It is impossible to describe a given specimen covering all aspects that could be relevant. Like any other description of a particular material entity or process, each phenotype description represents a decomposition that is based on a virtual partition of the ODU into the parts that are relevant for the specific frame of reference applied by the person making the description [92,93,94]. Due to the phenotypic complexity of anatomical entities, which often covers several levels of granularity, ranging from the molecular level to the cellular level and the level of gross anatomy, descriptions of specimens are never complete, irrespective of the applied frame of reference. This applies to Semantic Phenotypes in the same way as to Phenotype Knowledge Graphs. The problem of the incompleteness of phenotype descriptions, however, confronts the Semantic Phenotype approach and the class-based TBox semantic graphs in general with a conceptual dilemma. If a given Semantic Phenotype must be complemented with additional information, resulting in a more detailed representation of the described phenotype, one can choose between:

(1) Defining a new phenotype class that incorporates all information of the original phenotype class and, additionally, also covers the new information. The new phenotype class then replaces the original class and the new Semantic Phenotype the original Semantic Phenotype. This, however, would not only result in increasingly complex axiom expressions, which become increasingly incomprehensible, but tracking provenance and all relevant metadata across the different versions will be problematic as well, especially since Semantic Phenotypes cannot be easily fragmented.

(2) Defining a new phenotype class that only covers the additional information. The corresponding Semantic Phenotype would complement the original Semantic Phenotype. This is also problematic since the parts and properties mentioned in the class axiom of the original phenotype class cannot be referenced in the complementing phenotype class, because they are anonymous resources. As a consequence, the complementing Semantic Phenotype will, for instance, describe in more detail one of the parts mentioned in the class axiom of the original phenotype class, but the original and the complementing Semantic Phenotype graphs will not connect due to the anonymity of the described parts.

Phenotype Knowledge Graphs and instance-based ABox semantic graphs in general, on the other hand, can easily be expanded with additional information. Because each described part, property, and quality possesses its own URI, existing descriptions can be easily expanded through nano-publications and their corresponding metadata be tracked independently of the metadata of the original description.
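The effect of anonymous resources on expandability can be sketched as follows. As a simplifying assumption, the anonymous parts referenced in class axioms are modeled here as fresh blank nodes, which is not a faithful rendering of OWL class expressions but reproduces the relevant behavior: two independently created descriptions of "some antenna" do not connect. All identifiers and function names are hypothetical.

```python
import itertools

_blank_ids = itertools.count()


def describe_anonymously(predicate, value):
    """Describe 'some antenna' of the head; the part gets a fresh blank node."""
    part = f"_:b{next(_blank_ids)}"
    return {
        ("ex:head_1", "ex:has_part", part),
        (part, "rdf:type", "ex:Antenna"),
        (part, predicate, value),
    }


def describe_by_uri(part_uri, predicate, value):
    """Describe a particular antenna that is identified by its own URI."""
    return {
        ("ex:head_1", "ex:has_part", part_uri),
        (part_uri, "rdf:type", "ex:Antenna"),
        (part_uri, predicate, value),
    }


def properties_per_antenna(graph):
    """Collect, for every antenna node, the properties stated about it."""
    antennae = {s for s, p, o in graph if p == "rdf:type" and o == "ex:Antenna"}
    return {
        a: {p for s, p, o in graph if s == a and p != "rdf:type"}
        for a in antennae
    }


# Original description plus a later, complementing description.
anonymous_merge = (
    describe_anonymously("ex:has_length", "ex:length_1")
    | describe_anonymously("ex:has_function", "ex:olfaction")
)
uri_merge = (
    describe_by_uri("ex:antenna_1", "ex:has_length", "ex:length_1")
    | describe_by_uri("ex:antenna_1", "ex:has_function", "ex:olfaction")
)

anonymous_view = properties_per_antenna(anonymous_merge)  # two unrelated nodes
uri_view = properties_per_antenna(uri_merge)              # one node, both properties
```

With URIs, the complementing description attaches to the already described part; with anonymous resources, the merged graph contains two antenna nodes that cannot be recognized as the same entity.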

Integrating different frames of reference in a phenotype description

As mentioned above, any given phenotype can be described from different frames of reference, e.g., from a purely spatio-structural, a functional, or a developmental perspective. Each frame of reference will likely virtually partition the underlying ODU in its own particular way. Descriptions of the same phenotype that are based on different frames of reference thus often result in incongruent partitions [94]. As a consequence, the representation of a phenotype through a single phenotype ontology class will make it very difficult to cover all information relevant to the various frames of reference relevant in the life sciences, because the corresponding class axiom can only model one of the many possible virtual partitions. In other words, a purely spatio-structural description of a given phenotype must be represented with a different phenotype class than a functional, a developmental, or an evolutionary description of that same phenotype. This would result in a spatio-structural Semantic Phenotype, a functional Semantic Phenotype, a developmental Semantic Phenotype, and an evolutionary Semantic Phenotype, each of which would refer to the same given ODU. Due to the problem of anonymous resources, even if each of these descriptions referred to the same part in the ODU, the resulting graphs would not connect, because this part would be represented as an anonymous resource in each of them.

With the Phenotype Knowledge Graph approach, on the other hand, any given phenotype can be described in reference to a specific frame of reference and the resulting graph will connect spatio-structural descriptions of a given described part with its functional, developmental, and evolutionary descriptions, because this part possesses its own URI and thus can be referenced in any possible virtual partition of a given ODU. Contrary to the Phenotype Knowledge Graph approach, the Semantic Phenotype approach with its class axioms does not seem to be well suited for integrating different frames of reference in a given phenotype description.

The open world assumption and the need for negations and specifications of quantities of parts

No ODU can be comprehensively described across all possible frames of reference, scales, and granularity levels. No semantic representation of an ODU can be exhaustive in that respect. Any ODU possesses a virtually infinite number of possible partitions so that no phenotype description can be considered to cover all of them. This situation is dealt with by the so-called Open World Assumption (OWA). OWA assumes incomplete information by default. A direct consequence of OWA is that the lack of knowledge about a fact does not immediately imply knowledge of the negation of that fact. This means, for instance, that when a description does not state that a particular insect head has cells as its parts, we cannot conclude that the head is not composed of cells.
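The difference between the open and the closed world reading of a missing statement can be sketched in a few lines. The example assumes that explicitly negated facts are kept in a separate set, which is a simplification chosen for illustration; the identifiers and function names are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical description of an insect head.
ASSERTED = {("ex:head_1", "ex:has_part", "ex:antenna_1")}
# Facts that have been explicitly stated not to hold (none in this example).
NEGATED = set()


def ask_open_world(triple):
    """Under OWA, a missing statement is unknown rather than false."""
    if triple in ASSERTED:
        return Answer.TRUE
    if triple in NEGATED:
        return Answer.FALSE
    return Answer.UNKNOWN


def ask_closed_world(triple):
    """Under a closed world reading, everything not asserted counts as false."""
    return Answer.TRUE if triple in ASSERTED else Answer.FALSE


question = ("ex:head_1", "ex:has_part", "ex:cell_1")
open_answer = ask_open_world(question)
closed_answer = ask_closed_world(question)
```

For the undocumented cell, the open world reading yields Answer.UNKNOWN, whereas the closed world reading yields Answer.FALSE.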

OWL and description logics-based ontologies adhere to OWA by default, and so do both the Phenotype Knowledge Graph and the Semantic Phenotype approach. In both approaches, when starting to describe an ODU, everything is considered to be possible. This space of possibilities becomes more and more constrained and restricted with the addition of information. Following this notion, phenotype descriptions restrict what is possible [58].

OWA is not problematic for phenotype descriptions per se. For instance, it allows reusing and extending phenotype descriptions, adding more information to already existing descriptions whenever necessary. But in some cases, we want to make clear that a given ODU possesses, e.g., only two antennae and lacks an ovipositor. This information cannot be provided by describing only two antennae and not describing any ovipositor. While one could introduce specific properties to model such information as instance-expressions (see Fig. 5, top, and Fig. 6, top), any such model will not be compliant with description logics and could therefore not be reasoned on. Making these expressions machine-actionable would thus require additional efforts. Alternatively, one can describe this type of information with the help of class-expressions and thus TBox expressions, using OWL Manchester Syntax. The observation “insect abdomen lacks an ovipositor” translates to the Manchester expression ‘not ( has part some ovipositor )’ and the observation “insect head has part exactly 3 ocelli” to ‘has component exactly 3 ocellus’ (Footnote 14). Both Manchester expressions can be represented as class-based semantic graphs and be used within the Semantic Phenotype approach as well as the Phenotype Knowledge Graph approach (see Fig. 5, bottom, and Fig. 6, bottom).
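To indicate what machine-actionability of such an absence statement means in practice, the following sketch mimics a single check that a description logic reasoner would perform: an individual that instantiates ‘not ( has part some ovipositor )’ must not be asserted to have a part that is an ovipositor. This is not a reasoner and ignores, e.g., subclass inference; all identifiers and the function find_clashes are hypothetical.

```python
# Asserted types of the (hypothetical) individuals.
TYPES = {
    "ex:abdomen_1": {"ex:InsectAbdomen"},
    "ex:ovipositor_1": {"ex:Ovipositor"},
    "ex:sternite_1": {"ex:Sternite"},
}

# abdomen_1 instantiates the class 'not ( has part some ovipositor )'.
LACKS_PART_OF_CLASS = {"ex:abdomen_1": {"ex:Ovipositor"}}


def find_clashes(has_part_assertions):
    """Report parthood assertions that contradict a stated absence."""
    clashes = []
    for whole, part in has_part_assertions:
        for excluded in LACKS_PART_OF_CLASS.get(whole, ()):
            if excluded in TYPES.get(part, ()):
                clashes.append((whole, part, excluded))
    return clashes


consistent = find_clashes([("ex:abdomen_1", "ex:sternite_1")])
inconsistent = find_clashes([("ex:abdomen_1", "ex:ovipositor_1")])
```

With the purely instance-based model, such a check would have to be implemented by additional tools, whereas the class expression makes it available to standard OWL reasoning.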

Fig. 5

Two alternative models for documenting absences using the instance-based ABox semantic graph approach. Within the Phenotype Knowledge Graph approach, the observation “insect abdomen lacks an ovipositor” can be modeled in two alternative ways. Top: Shows a representation of the observation using only ABox expressions. This requires the introduction of the object property has not part any that has an instance as a domain restriction and a class as range restriction. This is not part of the OWL syntax and would require the introduction of additional tools for making it machine-actionable. Bottom: Shows a representation of the observation using a combination of ABox and TBox expressions. The instance of insect abdomen instantiates not only the class insect abdomen but also the class absent ovipositor phenotype, which is characterized as the complement class of the class of entities that have some ovipositor as their part. This description is compliant with description logics and is directly machine-actionable. Purple-bordered box = instance resource; yellow-bordered box with rounded corners = ontology class resource; grey-bordered box with rounded corners = anonymous class; blue-bordered octagon = object property class; labeled arrow = property resource

Fig. 6

Two alternative models for documenting exact counts of parts using the instance-based ABox semantic graph approach. Within the Phenotype Knowledge Graph approach, the observation “insect head has part exactly 3 ocelli” can be modeled in two alternative ways. Top: Shows a representation of the observation using only ABox expressions. This requires modeling the parthood relation as a directed relational quality and as a consequence of that the introduction of an object property towards class that has an instance as a domain restriction and a class as range restriction. Unfortunately, modeling the observation this way is not compliant with description logics and would require the introduction of additional tools for making it machine-actionable. Bottom: Shows a representation of the observation using a combination of ABox and TBox expressions. The instance of insect head instantiates not only the class insect head but also the class exact ocellus count phenotype, which is characterized as a cardinality restriction on a combination of a property and a class. This description is compliant with description logics and is directly machine-actionable. Purple-bordered box = instance resource; yellow-bordered box with rounded corners = ontology class resource; grey-bordered box with rounded corners = anonymous class; grey-bordered box with sharp corners = value; blue-bordered octagon = object property class; labeled arrow = property resource

Demarcating units of description

Another problem with Semantic Phenotypes is whether a given specimen should be described using a single complex Semantic Phenotype or a set of multiple Semantic Phenotypes. Should a phenotype be defined in a single phenotype ontology class or in several such classes? Should the unit of description equal the smallest unit of semantically meaningful empirical information? Ultimately, this is the question of which criterion should be used for demarcating units of description [47]. And again, part of the problem with Semantic Phenotypes and class-based TBox semantic graphs is the anonymity of the resources referenced in their class axioms. If you want to describe a given ODU using several Semantic Phenotypes, the entities, related entities, and qualities mentioned in the axioms of phenotype classes of different Semantic Phenotypes do not relate to each other, although they may actually refer to the same real entities, because they cannot be individually referenced and identified through the information provided by the graph. This is not the case with Phenotype Knowledge Graphs and instance-based ABox semantic graphs in general, because each described part and property possesses its own URI and thus can be referred to in several different graphs.

Correcting mistakes in phenotype descriptions

Researchers are human beings, and human beings make mistakes. Therefore, phenotype descriptions should allow for effective ways to correct for mistakes and thereby unambiguously track what information has been changed and ideally document that change in RDF as well. And again, because Semantic Phenotypes cannot be easily fragmented and the particular parts, properties, and qualities referenced in class axioms do not possess their own URIs, explicitly tracking what information has been changed between the original Semantic Phenotype and the corrected version of that Semantic Phenotype, and documenting in RDF all the changes that have been made, is rather difficult to accomplish. Phenotype Knowledge Graphs, in contrast, can be easily corrected for mistakes. Because the descriptive statements of Phenotype Knowledge Graphs are organized into different named graphs, one can easily correct information in one of them and track provenance and relevant metadata for it, as well as document in the metadata all changes that have been made.
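One possible way of correcting a single named graph while tracking the change is sketched below. It assumes that the corrected statement is stored as a new version of the named graph and that the old version is kept. The choice of the PROV-O properties prov:wasRevisionOf, prov:wasAttributedTo, and prov:generatedAtTime for the change record is our assumption for this example; the ex: identifiers, values, and function names are hypothetical.

```python
def diff(old_triples, new_triples):
    """Return which triples were removed and which were added."""
    return {
        "removed": old_triples - new_triples,
        "added": new_triples - old_triples,
    }


def correct(store, metadata, old_id, new_id, new_triples, agent, timestamp):
    """Add a corrected version of a named graph and record its provenance."""
    changes = diff(store[old_id], new_triples)
    store[new_id] = set(new_triples)  # the old version stays in the store
    metadata.add((new_id, "prov:wasRevisionOf", old_id))
    metadata.add((new_id, "prov:wasAttributedTo", agent))
    metadata.add((new_id, "prov:generatedAtTime", timestamp))
    return changes


# Hypothetical weight measurement with a wrong value.
store = {
    "ex:weight_graph_v1": {
        ("ex:head_1", "ex:has_weight", "ex:weight_1"),
        ("ex:weight_1", "ex:has_value", "47"),
    }
}
metadata = set()

changes = correct(
    store,
    metadata,
    old_id="ex:weight_graph_v1",
    new_id="ex:weight_graph_v2",
    new_triples={
        ("ex:head_1", "ex:has_weight", "ex:weight_1"),
        ("ex:weight_1", "ex:has_value", "4.7"),
    },
    agent="ex:researcher_1",
    timestamp="2020-01-01T12:00:00Z",
)
```

Because the correction is confined to one named graph, the rest of the description and its metadata remain untouched.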

Universal usability and reusability of phenotype descriptions

Being able to fragment a Phenotype Knowledge Graph into smaller subgraphs allows using only those parts of the data that are relevant for a given research question, while ignoring all parts that are irrelevant. The differentiation of types of observation and the modelling of respective data into corresponding named graphs allows meaningful fragmentation of data and reuse in various frameworks. While this is in principle also possible with Semantic Phenotypes, the extraction of only the relevant data is not as straightforward.

Generation of phenotype descriptions

As already mentioned above, in order to generate a Semantic Phenotype, the corresponding phenotype must be first defined as an ontology class before the description itself can be generated, which in turn only specifies that a given ODU instantiates that specific class. Technically, the actual phenotype description is contained in the definition of the ontology class.

Defining such phenotype ontology classes is usually conducted using OWL Manchester Syntax, which can become very complex, especially if the underlying phenotype is complex and the description fine-grained. For instance, the EQ statement “head color: reddish brown, except for dark brown to black postgena, occiput, vertex; mandibles, maxillary and labial palps yellowish; scape, pedicel, F1 and F2 yellow, subsequent flagellomeres progressively darker” translates to the OWL Manchester Syntax expression (example taken from suppl. Material 2 of [95]):

has part some ( head and ((not ( clypeus )) and (not ( mandible and ((((not ( antenna )) and ( bearer of some red )) and (not ( labial palp ))) and (not ( maxillary palp )))))) and ( has part some ( labial palp and ( bearer of some yellow ))) and ( has part some ( mandible and ( bearer of some yellow ))) and ( has part some ( maxillary palp and ( bearer of some yellow ))) and ( has part some ( occiput and ( bearer of some dark brown ))) and ( has part some ( pedicel and ( bearer of some yellow ))) and ( has part some ( postgena and ( bearer of some dark brown ))) and ( has part some ( scape and ( bearer of some yellow ))) and ( has part some ( vertex and ( bearer of some dark brown ))) and ( has part some ( first flagellomere and ( bearer of some yellow ))) and ( has part some ( second flagellomere and ( bearer of some yellow ))) and ( has part some ( fifth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some sixth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( third flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some fourth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( fourth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some fifth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( sixth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some seventh flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( seventh flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some eighth flagellomere )))) and ( bearer of some light brown ))) 
and ( has part some ( eighth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some ninth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( ninth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some tenth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( eleventh flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some twelfth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( twelfth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some thirteenth flagellomere )))) and ( bearer of some light brown ))) and ( has part some ( thirteenth flagellomere and (( bearer of some color brightness ) and ( increased in magnitude relative to some ( color brightness and ( inheres in some flagellomere 14 )))) and ( bearer of some light brown )))) (Footnote 15).

Obviously, respective class axioms can consist of many levels of nested expressions organized in parentheses, which many researchers find hard to read and comprehend. Also, this method of description is very error-prone due to this nested syntax. Alternatively, such OWL Manchester Syntax based expressions can be restricted to a certain threshold of slots. In Phenoscape, for example, templates with three slots are used. Restricting the descriptions to three slots keeps Semantic Phenotypes from getting too complicated, but also prevents them from being as precise and detailed as possible.
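How quickly such expressions grow can be illustrated by generating the simple "part bears color" fragments of the expression above from a list of pairs. The following sketch only reproduces the textual pattern of these fragments; the function names are hypothetical and the output is not validated against any ontology.

```python
def part_with_quality(part, quality):
    """Build one 'has part some (part and (bearer of some quality))' fragment."""
    return f"( has part some ( {part} and ( bearer of some {quality} )))"


def phenotype_expression(colored_parts):
    """Join the fragments of all described parts into a single class expression."""
    return " and ".join(part_with_quality(p, q) for p, q in colored_parts)


def max_nesting_depth(expression):
    """Return the deepest level of parentheses, or raise if they are unbalanced."""
    depth = deepest = 0
    for char in expression:
        if char == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without opening one")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return deepest


expression = phenotype_expression(
    [("scape", "yellow"), ("pedicel", "yellow"), ("occiput", "dark brown")]
)
depth = max_nesting_depth(expression)
```

Even these simplest fragments already require three levels of parentheses each, and every additional part lengthens the single expression that a researcher would otherwise have to write and check by hand.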

Another problem with the Semantic Phenotype approach becomes apparent when considering morphometric data (Footnote 16). When describing phenotypes based on a set of multiple measurements, the Semantic Phenotype approach would require the definition of a corresponding phenotype class for every possible combination of measurements. With the addition of more quantitative properties, this would result in exponentially increasing numbers of possible phenotype classes. Documenting every type of measurement as a single Semantic Phenotype somewhat mitigates the problem but results in the above-mentioned problem of disconnected information due to anonymous resources.
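The growth in the number of classes can be made explicit with a small calculation. As a simplifying assumption, the sketch counts one phenotype class per non-empty combination of measurement types and ignores the measured values themselves, so that n measurement types yield 2^n - 1 classes. The measurement names and the function name are hypothetical.

```python
from itertools import combinations


def measurement_combinations(measurement_types):
    """Enumerate every non-empty combination of the given measurement types."""
    return [
        combo
        for size in range(1, len(measurement_types) + 1)
        for combo in combinations(measurement_types, size)
    ]


three_types = measurement_combinations(["weight", "length", "volume"])
ten_types = measurement_combinations([f"measurement_{i}" for i in range(10)])

# Number of phenotype classes needed under the stated assumption.
classes_for_three = len(three_types)
classes_for_ten = len(ten_types)
```

Under this assumption, three measurement types already require seven classes and ten measurement types require 1023, which illustrates why defining a class for every combination does not scale.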

Whereas the generation of Phenotype Knowledge Graphs does not face these problems, it requires the development of an adequate application that allows researchers to describe phenotypes accordingly. This application could utilize the hierarchical structure of parthood relations between described parts of a given description to organize its interface. For each description, the partonomy could be visualized as a tree-like structure of described parts. This partonomy could also function as a navigator for selecting a particular described part. Each part, in turn, has its own input form associated with it that allows a detailed description of that part and can be accessed by selecting the part within the partonomy. We are currently developing such an application for the online anatomical data repository Morph‧D‧Base [96] and a functional prototype is available. The interface has been developed in close cooperation with several anatomy experts from different backgrounds, who served as use cases during its development. They contributed considerably to it, allowing an intuitive generation of Phenotype Knowledge Graphs. All data are stored in a Jena tuple store and descriptions are organized into several description named graphs as described above. The interface provides a human-readable HTML version of the description while retaining a machine-actionable and reasoning-capable version that can be accessed through a SPARQL endpoint, thus allowing semantic technology to be exploited to its full potential and offering Phenotype Knowledge Graphs as Linked Open Data.

Potential suitability of ABox and TBox semantic graphs for data and metadata standards

In times of eScience, a standard for data and metadata must cover machine-actionability regarding terminological aspects relating to concepts (meaning) and nomenclature (reference) and assertional aspects relating to formats (syntax and file format) and contents (data model) [8, 9, 19, 31] (see Table 1). Moreover, it must also comply with the FAIR Guiding Principles [7, 97, 98, 99] (see Table 2).

Table 1 Potential suitability of TBox and ABox semantic graphs for meeting eScience-compliant data and metadata standards, using Semantic Phenotypes and Phenotype Knowledge Graphs as examples

Table 2 Potential suitability of TBox and ABox semantic graphs for meeting the FAIR Guiding Principles, using Semantic Phenotypes and Phenotype Knowledge Graphs as examples (criteria taken from [7], criteria for reusability not shown)

An eScience-compliant concept standard requires a machine- and human-readable specification of the meaning of all concepts used in data and metadata statements. The specification provides information about what we know of the corresponding real universal, i.e., the kind. Semantic Phenotypes and Phenotype Knowledge Graphs both comply with this by referencing ontology terms that, in turn, provide unambiguous definitions of meanings for concepts both in human- and machine-readable ways.

The nomenclatural standard requires unambiguous specification of the reference of the words, symbols, and IDs used in data and metadata statements. It provides an unambiguous link between term and concept. Again, Semantic Phenotypes and Phenotype Knowledge Graphs both comply with this standard by using machine-readable persistent URIs in addition to human-readable labels for referring to ontology classes. The link between word, symbol, or ID and its corresponding concept, which in turn provides the meaning, is thus clear and unambiguous. This allows the reuse of ontology terms in any semantic graph without the necessity to include the entire ontology specification. However, only Phenotype Knowledge Graphs also meet this standard for all parts and properties mentioned in the description, which Semantic Phenotypes reference only anonymously.

The combination of concept and nomenclatural standard covers the terminology-related aspects of an eScience-compliant standard and ensures that phenotype descriptions are semantically transparent, allowing even non-experts to understand and interpret them correctly. In addition to these terminology-related aspects, eScience-compliant data and metadata standards must also cover assertion-related aspects, which are covered by a combination of a format and a content standard that ensures that phenotype descriptions are comparable, reusable, computer-parsable, and communicable through the Web.

The format standard requires a machine-readable specification of the syntax and file format to be used when documenting, storing, communicating, and processing data and metadata statements on the Web. Semantic Phenotypes and Phenotype Knowledge Graphs provide this through the possibility to store the respective semantic graphs in OWL files, which can be serialized to RDF. As a consequence, Semantic Phenotypes and Phenotype Knowledge Graphs both provide a basic level of findability, accessibility, and explorability, because they can take the form of semantic graphs and any semantic graph can be searched using SPARQL. The query pattern of a SPARQL query is itself represented as a semantic graph that may contain variables and wildcards. The main mechanism of a SPARQL query is matching the query pattern with the semantic graph to be queried. A repository for Semantic Phenotypes or Phenotype Knowledge Graphs stored in a tuple store would allow searching for descriptions of heads of a specific taxonomic group that possess a specific type of antenna and that have a weight greater than 10 mg, and retrieving a list of the corresponding phenotype descriptions.
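
The pattern-matching mechanism behind such a search can be illustrated with a toy matcher. This is a sketch of basic graph-pattern matching only, not a SPARQL implementation; all identifiers (`ex:Head`, `ex:ofTaxon`, `ex:hasPart`, `ex:weightMg`, and so on) are hypothetical stand-ins for real ontology terms, and the weight filter is applied in plain Python.

```python
# Toy semantic graph as (subject, predicate, object) triples.
# Every identifier below is hypothetical and used for illustration only.
GRAPH = {
    ("ex:head1", "rdf:type", "ex:Head"),
    ("ex:head1", "ex:ofTaxon", "ex:TaxonA"),
    ("ex:head1", "ex:hasPart", "ex:antenna1"),
    ("ex:head1", "ex:weightMg", 12.5),
    ("ex:antenna1", "rdf:type", "ex:FiliformAntenna"),
    ("ex:head2", "rdf:type", "ex:Head"),
    ("ex:head2", "ex:ofTaxon", "ex:TaxonA"),
    ("ex:head2", "ex:hasPart", "ex:antenna2"),
    ("ex:head2", "ex:weightMg", 8.0),
    ("ex:antenna2", "rdf:type", "ex:FiliformAntenna"),
    ("ex:head3", "rdf:type", "ex:Head"),
    ("ex:head3", "ex:ofTaxon", "ex:TaxonB"),
    ("ex:head3", "ex:hasPart", "ex:antenna3"),
    ("ex:head3", "ex:weightMg", 15.0),
    ("ex:antenna3", "rdf:type", "ex:FiliformAntenna"),
}

# Query pattern: itself a small graph whose "?name" terms are variables.
# Roughly corresponds to the SPARQL query
#   SELECT ?head WHERE { ?head a ex:Head ; ex:ofTaxon ex:TaxonA ;
#                              ex:hasPart ?antenna ; ex:weightMg ?w .
#                        ?antenna a ex:FiliformAntenna . FILTER(?w > 10) }
PATTERN = [
    ("?head", "rdf:type", "ex:Head"),
    ("?head", "ex:ofTaxon", "ex:TaxonA"),
    ("?head", "ex:hasPart", "?antenna"),
    ("?antenna", "rdf:type", "ex:FiliformAntenna"),
    ("?head", "ex:weightMg", "?w"),
]


def is_variable(term):
    return isinstance(term, str) and term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns occur in graph."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for p_term, g_term in zip(first, triple):
            if is_variable(p_term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            yield from match(rest, graph, candidate)


def find_heads(graph, min_weight_mg=10):
    """Heads matching PATTERN whose weight exceeds min_weight_mg."""
    return sorted({b["?head"] for b in match(PATTERN, graph)
                   if b["?w"] > min_weight_mg})


print(find_heads(GRAPH))
```

In this toy graph only `ex:head1` satisfies all conditions: `ex:head2` is too light and `ex:head3` belongs to a different taxon. A real repository would delegate the same matching to the SPARQL engine of its tuple store.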

Regarding querying semantic graphs, however, it is important to note that querying TBox expressions is more difficult than querying ABox expressions. If the graph contains class definitions in the form of axioms expressed in OWL, the basic graph-pattern matching of SPARQL must be defined using entailment regimes [100]. Querying under entailment regimes is more complex and, under the full expressivity of OWL, computationally difficult [101, 102]. As a consequence, querying Phenotype Knowledge Graphs is more straightforward and computationally less difficult than querying Semantic Phenotypes.
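
Part of this difference is already visible at the level of the triples, before any entailment is considered. The sketch below contrasts a hypothetical ABox statement with a class axiom that uses the standard RDF encoding of an OWL existential restriction (`owl:Restriction`, `owl:onProperty`, `owl:someValuesFrom`). The `ex:` identifiers and both helper functions are assumptions made for illustration.

```python
# ABox: the parthood relation is asserted directly between two instances.
ABOX = {
    ("ex:head1", "rdf:type", "ex:Head"),
    ("ex:head1", "ex:hasPart", "ex:antenna1"),
    ("ex:antenna1", "rdf:type", "ex:Antenna"),
}

# TBox: the same content as a class axiom. The property only appears inside
# an anonymous restriction node (_:r1), never as the predicate of a triple.
TBOX = {
    ("ex:HeadWithAntenna", "rdfs:subClassOf", "ex:Head"),
    ("ex:HeadWithAntenna", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:hasPart"),
    ("_:r1", "owl:someValuesFrom", "ex:Antenna"),
}


def subjects_with_part(triples, part_class):
    """Plain pattern: ?s ex:hasPart ?p . ?p rdf:type part_class ."""
    return sorted(
        s for (s, p, o) in triples
        if p == "ex:hasPart" and (o, "rdf:type", part_class) in triples
    )


def classes_with_part_restriction(triples, part_class):
    """Pattern that walks the restriction encoding used in class axioms."""
    found = set()
    for (cls, p, node) in triples:
        if p != "rdfs:subClassOf":
            continue
        if ((node, "rdf:type", "owl:Restriction") in triples
                and (node, "owl:onProperty", "ex:hasPart") in triples
                and (node, "owl:someValuesFrom", part_class) in triples):
            found.add(cls)
    return sorted(found)


print(subjects_with_part(ABOX, "ex:Antenna"))
print(subjects_with_part(TBOX, "ex:Antenna"))
print(classes_with_part_restriction(TBOX, "ex:Antenna"))
```

The plain pattern finds `ex:head1` in the ABox but nothing in the TBox, where the query has to traverse the anonymous restriction node instead. Even the second function is purely syntactic: it would miss restrictions on subclasses of `ex:Antenna` or equivalent class expressions, which is where entailment regimes become necessary.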

In ABox semantic graphs, we can associate a specific content standard with each descriptive named graph class. The content standard specifies the general structure of how to express the corresponding type of empirical information in terms of RDF triples by defining a corresponding semantic graph pattern [


JOURNAL OF BIOMEDICAL SEMANTICS
