Delta.EPI: a probabilistic voting-based enhancer-promoter interaction prediction platform

Three-dimensional (3D) genome architecture has been recognized as an important player in the regulation of nuclear processes, e.g., gene expression and DNA replication (Roy et al., 2011; Li et al., 2012; Zhang et al., 2013). In the last decade, chromosome conformation capture (3C) and its high-throughput variations, e.g., Hi-C (Lieberman-Aiden et al., 2009), ChIA-PET (Handoko et al., 2011; Tang et al., 2015), and capture Hi-C (Mifsud et al., 2015a), have made it possible to survey 3D genome architecture (de Wit and de Laat, 2012). Global 3D genomes in mammals (Lieberman-Aiden et al., 2009; Dixon et al., 2012; Vietri Rudan et al., 2015), fly (Sexton et al., 2012) and yeast (Duan et al., 2010) have been profiled. The genomes were found to be organized hierarchically into active and inactive compartments (A and B, respectively) (Lieberman-Aiden et al., 2009), domain structures such as "topologically associating domains" (TADs) (Dixon et al., 2012; Nora et al., 2012), and chromatin loop structures (Rao et al., 2014; Tang et al., 2015).

Enhancer-promoter interactions (EPIs) regulate gene expression (Schoenfelder and Fraser, 2019; Kyrchanova and Georgiev, 2021). It has long been recognized that enhancers constitute a major group of regulators in higher eukaryotes (Schoenfelder and Fraser, 2019; Kyrchanova and Georgiev, 2021). Most enhancers regulate promoters at a distance of tens to hundreds of kilobases. Although the mechanism is not fully understood, it is widely believed that the spatial colocalization of promoters and enhancers through the formation of chromatin loops is one way of ensuring direct interaction between them (Mora et al., 2016). EPIs are specific to cell types or conditions and should be studied in that context.

Both computational and experimental methods for predicting or identifying EPIs have appeared in the literature (Zhao et al., 2006). However, identifying EPIs for a given genomic locus remains a major challenge. Experimental technologies, such as 3C, can be labor-intensive and time-consuming, and they can also be limited by the materials required for experiments performed under clinical conditions. Computational methods for EPI prediction may therefore provide a practical alternative during the early screening stage of projects.

Computational methods can be broadly classified into two groups: those integrating multiomics data (Moore et al., 2020), and those mining potential chromatin loops from high-throughput 3C data (Zheng et al., 2021). Multiomics data integration methods, e.g., CISD_loop (Zhang et al., 2017), can identify cell type-specific EPIs, which requires high-throughput multiomics datasets from the same cell/tissue type. Alternatively, they can define consensus promoter-enhancer contacts across different cell lines/tissues, e.g., Targetfinder (Whalen et al., 2016). Such multiomics data integration methods may not be easy to apply to noncanonical objects, for which public data resources may be limited. Many computational tools are available in the literature for chromatin loop prediction from Hi-C data, such as HiCCUPS (Rao et al., 2014), Homer (Lin et al., 2012; Heinz et al., 2018), FitHiC (Ay et al., 2014; Kaul et al., 2020), cLoops (Cao et al., 2020), HIPPIE (Hwang et al., 2015), diffHiC (Lun and Smyth, 2015), GOTHiC (Mifsud et al., 2017), PSYCHIC (Ron et al., 2017), FastHiC (Xu et al., 2016), HIFI (Cameron et al., 2020), and HiCExplorer (Wolff et al., 2020). For these methods, data from promoter capture Hi-C (Mifsud et al., 2015b) or RNA Pol II ChIA-PET (Fullwood et al., 2009; Li et al., 2012) were commonly taken as the gold standard for EPI assessment. Early benchmarking work has also appeared in the literature. For instance, Forcato et al. (2017) compared six predictors using the proportion of predicted chromatin loops whose anchors overlapped ChromHMM-annotated enhancers and TSSs. However, a comprehensive benchmarking framework has yet to be established, as has a user-friendly tool for EPI prediction.
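The anchoring proportion used in that kind of comparison can be illustrated with a minimal Python sketch. It is not the procedure of Forcato et al. (2017): the interval representation, the function names, and the rule that one anchor must overlap an enhancer while the other overlaps a TSS are all assumptions made here for illustration.

```python
from typing import List, Tuple

# Assumed representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]
# A predicted loop is a pair of anchor intervals.
Loop = Tuple[Interval, Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def hits_any(anchor: Interval, features: List[Interval]) -> bool:
    """True if the anchor overlaps at least one annotated feature."""
    return any(overlaps(anchor, f) for f in features)


def ep_anchored_fraction(
    loops: List[Loop], enhancers: List[Interval], tss: List[Interval]
) -> float:
    """Fraction of loops with one anchor on an enhancer and the other on a TSS.

    Assumption: a loop counts only if its two anchors hit different
    feature classes (enhancer on one side, TSS on the other).
    """
    if not loops:
        return 0.0
    supported = 0
    for left, right in loops:
        forward = hits_any(left, enhancers) and hits_any(right, tss)
        reverse = hits_any(left, tss) and hits_any(right, enhancers)
        if forward or reverse:
            supported += 1
    return supported / len(loops)


# Toy data (hypothetical coordinates, not taken from any dataset).
example_loops = [
    (("chr1", 100, 200), ("chr1", 5000, 5100)),
    (("chr1", 300, 400), ("chr1", 9000, 9100)),
]
example_enhancers = [("chr1", 150, 160)]
example_tss = [("chr1", 5050, 5051)]
example_fraction = ep_anchored_fraction(example_loops, example_enhancers, example_tss)
```

With the toy data, only the first loop has an enhancer at one anchor and a TSS at the other, so `example_fraction` is 0.5. The linear scan over features is adequate for a sketch; genome-scale inputs would normally use an interval index instead.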

In the present work, we developed an easy-to-use web-based EPI prediction pipeline, termed Delta.EPI. Delta.EPI integrates the results of four preselected state-of-the-art EPI predictors, namely HiCCUPS (Rao et al., 2014), FitHiC (Kaul et al., 2020), Homer (Heinz et al., 2018) and cLoops (Cao et al., 2020). The predicted EPIs are scored and sorted according to our probabilistic model for their relevance to EPI. Ten Hi-C datasets were installed in Delta.EPI as the standard reference dataset, representing multiple cell/tissue types in human and mouse. The results can be displayed in Delta, a web-based 3D genome visualization and analysis platform (Tang et al., 2018), which integrates massive genomic tracks for data annotation. Last, we show the utility of Delta.EPI through case studies, indicating that it not only helps to identify most target genes/enhancers for a genomic locus of interest, but also helps to systematically survey EPIs.
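To give a feel for how calls from several predictors might be merged into one ranked list, the following Python sketch combines votes with a noisy-OR rule. This is a stand-in chosen for illustration and is not the probabilistic model of Delta.EPI, which is not specified in this section; the function name, the per-tool reliability values, and the pair identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identifiers: (enhancer_id, promoter_id).
Pair = Tuple[str, str]


def vote_scores(
    calls: Dict[str, Set[Pair]], reliability: Dict[str, float]
) -> List[Tuple[Pair, float]]:
    """Rank candidate pairs by a noisy-OR combination of predictor votes.

    Assumption: each tool that calls a pair is independently correct with
    probability reliability[tool], so the combined score is
    1 - prod(1 - reliability[tool]) over the tools that called the pair.
    """
    miss_prob: Dict[Pair, float] = {}
    for tool, pairs in calls.items():
        p = reliability[tool]
        for pair in pairs:
            # Multiply in the chance that this tool's call is wrong.
            miss_prob[pair] = miss_prob.get(pair, 1.0) * (1.0 - p)
    ranked = [(pair, 1.0 - miss) for pair, miss in miss_prob.items()]
    # Highest score first; ties broken by identifier for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy input: reliabilities are made-up numbers, not measured precisions.
example_calls = {
    "HiCCUPS": {("e1", "p1")},
    "FitHiC": {("e1", "p1"), ("e2", "p1")},
}
example_reliability = {"HiCCUPS": 0.5, "FitHiC": 0.5}
example_ranking = vote_scores(example_calls, example_reliability)
```

In this toy input the pair called by both tools scores 0.75 and the pair called by one tool scores 0.5, so agreement between predictors moves a candidate up the list. The independence assumption is a simplification, since loop callers run on the same Hi-C data tend to make correlated calls.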
