ParTRE: A relational triple extraction model of complicated entities and imbalanced relations in Parkinson’s disease

Parkinson’s disease (PD), characterized by resting tremor and slowness of movement, is the world’s second most common progressive neurodegenerative disease, afflicting over 10 million people worldwide [1]. The diagnosis of PD [2] is challenging, especially in the differential diagnosis of Parkinsonism and in early PD detection. A knowledge graph organizes the diverse textual information on PD, which benefits PD prediagnosis and facilitates medical intelligence. Relational triple extraction (RTE), a subtask of information extraction, is a critical step in the construction of medical knowledge graphs [3] and has therefore attracted considerable attention. The task of RTE in PD is to extract entity-relation triples from unstructured medical texts. These relational triples store knowledge in the form < subject, relation, object >. For example, the triple < Compound levodopa, treat, Parkinson > expresses the knowledge that “Compound levodopa can be used to treat Parkinson”.
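To make the < subject, relation, object > form concrete, a relational triple can be held in a small record type. The following is only an illustrative sketch: the class name `Triple` and its field names are assumptions introduced here, not part of ParTRE.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One relational triple <subject, relation, object> (hypothetical helper type)."""
    subject: str
    relation: str
    obj: str  # the object entity of the triple


# The example triple given in the text above.
levodopa = Triple(subject="Compound levodopa", relation="treat", obj="Parkinson")

# A NamedTuple unpacks like a plain tuple, so it can be passed around as (s, r, o).
subject, relation, obj = levodopa
print(f"< {subject}, {relation}, {obj} >")
```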

Early RTE work adopted the pipeline method [4], [5], which treats entity extraction and relationship judgment as two independent processes and trains a separate model for each. Because pipelines suffer from error propagation and ignore the interaction between entity extraction and relationship judgment, an increasing number of joint learning methods for triples have been proposed [6], [7], [8], [9], [10], [11], [12]. However, most existing models cannot effectively handle sentences that contain multiple overlapping entity pairs with the SingleEntityOverlap (SEO) and EntityPairOverlap (EPO) patterns illustrated in Table 1. Zeng et al. [12] presented an end-to-end sequence-to-sequence model based on a copy mechanism to extract triples. Further attempts include triple extraction based on reinforcement learning [13] and joint entity and relation extraction based on a relation-weighted GCN [14]. Apart from the above approaches, Wei et al. [15] identified relationships and object entities conditioned on the identified subjects, which handled the overlapping triple problem very well. Nevertheless, error propagation and relation redundancy remained challenges for Wei et al.’s approach. Subsequently, Ren et al. [16] and Zheng et al. [17] presented a bidirectional extraction framework and a relationship judgment component, respectively, to mitigate these problems. Another concern in triple extraction is the class imbalance problem. Most entity pairs in medical texts have very few relationships, which biases relationship judgment toward the majority class. Resampling-based methods [18] and reweighting-based methods [19], [20], [21] have been used to address the class imbalance problem. In the latest research, prompt-based approaches [22], [23], [24], [25], [26], [27], which are leading a revolution in natural language processing (NLP), have been applied to the RTE task.
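The overlap patterns mentioned above can be illustrated with a short sketch. It assumes the commonly used definitions: two triples form an EPO case when they involve the same entity pair, and an SEO case when they share exactly one entity. The function name `overlap_patterns` and the placeholder entities `A` to `D` are hypothetical and serve illustration only.

```python
from itertools import combinations


def overlap_patterns(triples):
    """Return the overlap patterns found among (subject, relation, object) triples.

    Assumed definitions:
      EPO    - two triples involve the same entity pair (in either order)
      SEO    - two triples share one entity but not the whole pair
      Normal - no two triples share an entity
    """
    patterns = set()
    # Deduplicate first so that a repeated triple is not counted as an overlap.
    for (s1, _, o1), (s2, _, o2) in combinations(sorted(set(triples)), 2):
        pair1, pair2 = {s1, o1}, {s2, o2}
        if pair1 == pair2:
            patterns.add("EPO")
        elif pair1 & pair2:
            patterns.add("SEO")
    return patterns or {"Normal"}


# Placeholder entities and relations, not taken from any corpus.
print(overlap_patterns([("A", "r1", "B"), ("A", "r2", "B")]))  # same entity pair
print(overlap_patterns([("A", "r1", "B"), ("A", "r2", "C")]))  # one shared entity
print(overlap_patterns([("A", "r1", "B"), ("C", "r2", "D")]))  # no shared entity
```

A sentence may exhibit both patterns at once, which is why the function returns a set rather than a single label.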

However, owing to the complicated entities in PD texts, the aforementioned models cannot give satisfactory results. Moreover, few PD corpora are available despite the dramatic growth of medical texts such as clinical cases, journals, and literature. To tackle the above problems, we propose a three-stage joint learning model for the extraction of entities and relations in PD, named ParTRE. First, inspired by the approaches of Ren et al. [16] and Lee et al. [28], a conditional normalization layer is added. Then, a relation detection module that can handle class imbalance is developed to determine the type of relationship between entity pairs. The major contributions of this paper are as follows.

(1) We propose a three-stage relational triple extraction model (ParTRE) whose performance has been demonstrated on ParRE and two public datasets.

(2) A loss function strategy based on focal loss is proposed to solve the class imbalance issue.

(3) A Parkinson’s corpus (namely ParRE) with manual annotation is built in the current study.

(4) Extensive experiments on ParRE indicate that our approach can address overlapping triples and class imbalance effectively.
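Contribution (2) builds on focal loss. As a minimal sketch, the standard focal loss for a single example, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), can be written as follows. This is the generic formulation and not necessarily the exact loss strategy used in ParTRE; the function name `focal_loss` and the default γ = 2 are assumptions made for illustration.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Standard focal loss for one example (generic form, not ParTRE's exact loss).

    probs  - predicted class probabilities, summing to 1
    target - index of the true class
    gamma  - focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha  - optional per-class weights, e.g. larger for rare relations
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss([0.9, 0.1], target=0)  # confident and correct
hard = focal_loss([0.9, 0.1], target=1)  # true class received low probability
print(easy < hard)
```

Because the modulating factor suppresses the many easy majority-class examples, the gradient is dominated by hard and rare cases, which is the property that makes focal loss attractive for imbalanced relation types.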
