pVACview: an interactive visualization tool for efficient neoantigen prioritization and selection

pVACview is written in R and is implemented as part of pVACtools, which is a computational toolkit that helps identify and visualize neoantigen candidates [16, 25]. While pVACview can be used as a stand-alone tool (see “Overall architecture of the software implementation”), we recommend using pVACtools to generate the required inputs in order to access the maximum functionality. Code changes are integrated using GitHub pull requests (https://github.com/griffithlab/pVACtools/pulls). Documentation is hosted on Read the Docs (readthedocs.org) and can be viewed at https://pvactools.readthedocs.io/en/latest/pvacview.html.

A demonstration data set is provided and consists of class I and class II neoantigen candidate files generated from the HCC1395 breast cancer cell line and its matched lymphoblastoid cell line HCC1395BL (please refer to data availability section). The tumor and normal datasets were processed using an immunogenomics pipeline written in WDL (immuno.wdl available at https://github.com/wustl-oncology/analysis-wdls). This pipeline accepts raw tumor/normal exome and tumor RNA-seq data in FASTQ or unaligned BAM format and performs alignment, HLA typing, germline variant calling, somatic variant calling, variant phasing, variant annotation, expression analysis, RNA fusion detection, and neoantigen identification. The pipeline also generates the aggregated neoantigen reports and metrics files used as inputs to pVACview. These datasets are available at https://github.com/griffithlab/pVACtools/tree/latestpvactools/tools/pvacview/data.

To acquire pVACtools output (specifically, pVACseq output) for use with pVACview, users can run pVACseq from the command line using variants from their own pipeline (in VCF format), or start with raw sequence data and use an end-to-end pipeline on the cloud by launching our pre-configured workflow on Dockstore (https://dockstore.org/workflows/github.com/griffithlab/analysis-wdls/immuno) via various platforms (e.g., DNAnexus, Terra, eLwazi, AnVIL, NHLBI BioData Catalyst). A step-by-step guide for employing the pre-configured immuno workflow to run pVACtools on Terra is available at https://workflow-course.pvactools.org/index.html.

Overall architecture of the software implementation

pVACview has three modules: (1) main, (2) NeoFox, and (3) custom. The main module supports output from pVACseq while the NeoFox and custom modules support exploration of output from other neoantigen prediction tools. A detailed comparison of neoantigen features provided by pVACseq and several of these alternative prediction tools is provided in Additional file 2.

pVACview main module

The pVACview main module is split into the following components: user data upload, neoantigen feature visualization and exploration, and export of prioritized neoantigens and associated annotations for downstream applications (Fig. 3). Below, we step through these components in detail. A screenshot and description of each visual element of pVACview can also be found in Additional file 3.

Fig. 3

Overview of example workflow for prioritizing neoantigens using pVACview. pVACview can be broken down into three main sections: upload, visualize/explore, and export. When exploring the neoantigen candidates, users are presented with three levels of information: variant, transcript, and peptide. This example workflow guides the user through critical questions that may be considered when prioritizing neoantigen candidates. Each section is organized by the corresponding feature in the pVACview interface

Configuration and data import

Generation of the neoantigen candidate input files requires preprocessing using pVACseq starting from patient samples’ variant information (supplied as a VCF file). pVACseq produces neoantigen candidates with numerous features to be considered during prioritization. Two of pVACseq’s output files, an aggregated candidate file (tsv format) and a metrics file (json format), serve as input files to pVACview (Additional file 3: Fig. S1). The aggregated candidate file contains a list of all variants with summary-level information, including the best predicted neoantigen candidate and its overall prediction score, DNA/RNA depth, variant allele frequencies, gene and allele expression, and more. The metrics json file contains extensive additional variant, transcript, peptide, and individual algorithm-level information that is needed for certain features of the pVACview application. For further details, please refer to the online documentation at pvactools.org.

Users have the option to additionally include a tsv file with supplemental candidate information from a different set or class of HLA alleles. This allows users to view basic median binding information of class II results while looking at detailed class I prediction results or vice versa. For users investigating a specific gene set of their own interest, we provide the option of uploading a tsv file where each line contains an individual gene name (e.g., names of known cancer driver genes). These genes, if found in the aggregate report file, will be highlighted in a green box with bold font in the Gene report column of the visualization interface.
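The gene-list matching described above can be sketched in a few lines. This is a minimal Python illustration, not pVACview's actual R implementation; the column names `Gene` and `Best Peptide`, the helper function names, and the sample rows are assumptions made for the example.

```python
import csv
import io


def load_gene_list(handle):
    """Read one gene name per line, skipping blank lines and stray whitespace."""
    return {line.strip() for line in handle if line.strip()}


def flag_genes_of_interest(rows, genes, gene_column="Gene"):
    """Return copies of the report rows with a boolean 'highlight' field added."""
    return [dict(row, highlight=row.get(gene_column) in genes) for row in rows]


# Hypothetical inputs: a gene list file and a tiny aggregate report (assumed columns).
gene_file = io.StringIO("TP53\nKRAS\n\nBRAF\n")
report_tsv = io.StringIO("Gene\tBest Peptide\nKRAS\tAAAAAAAAA\nGENE1\tCCCCCCCCC\n")

genes_of_interest = load_gene_list(gene_file)
report_rows = list(csv.DictReader(report_tsv, delimiter="\t"))
flagged = flag_genes_of_interest(report_rows, genes_of_interest)
```

Rows whose gene appears in the uploaded list carry `highlight=True`, which a front end could then render as the green, bold-font box.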

Neoantigen visualization and exploration

Uploaded neoantigen candidates can be explored and analyzed in several different ways. Users are provided with neoantigen features that are organized into three levels of detail: variant-level, transcript-level, and peptide-level (Fig. 3).

Variant-level information is presented in the main aggregate report table, showcasing the best neoantigen candidate for each variant as well as genomic information (e.g., gene identifier, amino acid change, and position of the variant within the core binding peptide), expression level, DNA/RNA variant allele frequency, median binding prediction scores, percentile ranks, and the total number of peptides beyond the best one that meet specified cutoffs (Additional file 3: Fig. S2). Each variant in the main aggregate report table is assigned to an overall tier based on criteria including binding affinity, expression, transcript support level, clonality, and anchor scenario. By default, the variants in this table are ordered based on their assigned tier.

Once a specific variant is selected, users are provided with a variant and gene info box, which provides further information on the exact genomic location and nucleic acid change (Additional file 3: Fig. S3). We have also included a link to the OpenCRAVAT variant report for the respective variant [26]. This report allows users to explore rich variant information including variant effect annotations, associated cancer types, population allele frequencies, clinical relevance, gene annotation, and pathogenicity predictions.

Additionally, users are provided with individual transcripts containing the variant. The selected variant may occur within multiple transcripts, which may result in distinct neoantigen peptide sequences. Peptides that produce good binding predictions against at least one HLA allele are shown in the transcript table (Additional file 3: Fig. S4). The expression level of each transcript is provided as further guidance when selecting the best neoantigen candidate. In some cases, transcript sequence context impacts the peptide sequence surrounding a variant (e.g., nearby exon–intron boundaries as depicted in Fig. 2). Multiple transcripts that give rise to the exact same list of peptide candidates are grouped into a single transcript set and those that give rise to different peptides are grouped into distinct transcript sets.
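The grouping of transcripts into transcript sets can be illustrated as follows. This is a hedged sketch in Python, not the pVACtools implementation: the function name, the transcript identifiers, and the peptide sequences are hypothetical, and the sketch assumes that two transcripts belong to the same set whenever they yield the same peptides regardless of order.

```python
from collections import defaultdict


def group_transcript_sets(peptides_by_transcript):
    """Group transcripts that give rise to exactly the same candidate peptides."""
    sets = defaultdict(list)
    for transcript, peptides in peptides_by_transcript.items():
        # A frozenset makes the peptide collection hashable and order-independent.
        sets[frozenset(peptides)].append(transcript)
    # Sort transcripts and peptides so the result is deterministic.
    return sorted(
        (sorted(transcripts), sorted(peptides))
        for peptides, transcripts in sets.items()
    )


# Hypothetical example: TX1 and TX2 produce the same peptides, TX3 differs.
example = {
    "TX1": ["KLMNPQRST", "LMNPQRSTV"],
    "TX2": ["LMNPQRSTV", "KLMNPQRST"],
    "TX3": ["KLMNPQRSA"],
}
transcript_sets = group_transcript_sets(example)
```

Here TX1 and TX2 collapse into one transcript set, while TX3 forms its own set because its sequence context yields a different peptide.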

Upon selecting a specific transcript set, users are provided with a peptide table (Additional file 3: Fig. S5). The peptide table displays all peptide sequences from the selected transcript that are predicted to be good binders (for at least one HLA allele). Both mutant (MT) and wild type (WT) sequences are shown, along with median binding affinities (if the MT score passed the binding threshold), potential problematic positions for manufacturing, and whether non-specificity of the peptide sequence could indicate potential for autoimmunity or central tolerance [23].

By selecting each pair of MT/WT peptides, users can access (1) plots of the individual IC50 binding affinity predictions of the strong binding MT peptides and their corresponding WT, (2) plots of the individual percentile binding affinity predictions, (3) a binding affinity table with numerical IC50 and percentile rank values across algorithms used, and (4) a table of prediction scores from algorithms trained on mass spectrometry elution data (e.g., BigMHC_EL, MHCflurryEL, NetMHCpanEL) and immunogenicity data (e.g., BigMHC_IM, DeepImmuno) (Additional file 3: Figs. S6, S7, S8, S9). Note that each peptide may have up to 8 binding algorithm scores for class I alleles (with pVACseq version 3.0 or higher) or up to 4 binding algorithm scores for class II alleles. These views facilitate evaluation of algorithm concordance and integration of predictions pertaining to MHC binding, processing, and immunogenicity.
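As a rough illustration of how per-algorithm scores relate to the median values shown in the tables, the sketch below summarizes a set of IC50 predictions for one peptide. It is written in Python for illustration only; the algorithm names and values are placeholders, and the 500 nM cutoff is an assumed, commonly used threshold rather than a setting taken from this article.

```python
from statistics import median


def summarize_binding(ic50_by_algorithm, threshold_nm=500.0):
    """Summarize IC50 predictions (nM) from several algorithms for one peptide."""
    values = list(ic50_by_algorithm.values())
    # Algorithms whose prediction falls below the (assumed) binding threshold.
    passing = sorted(
        name for name, value in ic50_by_algorithm.items() if value < threshold_nm
    )
    return {
        "median_ic50": median(values),
        "n_algorithms": len(values),
        "passing": passing,
    }


# Placeholder scores for a single MT peptide (not real predictions).
mt_scores = {"AlgorithmA": 120.0, "AlgorithmB": 340.0, "AlgorithmC": 900.0}
summary = summarize_binding(mt_scores)
```

Comparing the median with the list of passing algorithms gives a quick sense of concordance: a good median supported by only a minority of algorithms may deserve closer inspection.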

For each peptide, we also provide users with an allele-specific anchor prediction heatmap, based on computational predictions from our previous work [23]. These predictions are normalized probabilities representing the likelihood of each position of the peptide to participate in anchoring to the HLA allele. The top 15 MT/WT peptide pairs per HLA allele from the peptide table are shown with anchor probabilities overlaid as a heatmap. The anchor probabilities shown are specific to both the allele and the peptide length. In the anchor heatmap view, the mutated amino acids are marked in red and MT/WT pairs are separated using a dotted line (Additional file 3: Fig. S10). The probabilities used for determining allele-specific anchor sites are provided along with the actual positions that are considered anchors for each allele-peptide length combination (Additional file 3: Fig. S11). Different anchor scenarios are also depicted to guide users during candidate evaluation (Additional file 3: Fig. S12).

To ensure that the candidate is a non-self peptide, users can also check if the sequence of the peptide candidate matches any sequence found in the reference proteome (Additional file 3: Fig. S13). If the user specifies potential problematic amino acids when running pVACseq, candidates with these problematic amino acids will be flagged by a red box in the “Prob Pos” (Problematic Positions) column of the main aggregate report table (Additional file 3: Fig. S14). One example use of this feature is to flag cysteines (C) as problematic and deprioritize peptides containing them to avoid peptide synthesis and stability issues associated with this amino acid [27].
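The two checks described above can be sketched as simple string operations. This Python sketch is illustrative only: exact substring matching is a simplifying assumption and is not claimed to be how pVACseq performs its reference proteome search, and the function names, protein names, and sequences are hypothetical.

```python
def problematic_positions(peptide, problematic=("C",)):
    """Return the 1-based positions of residues flagged as problematic."""
    return [i for i, aa in enumerate(peptide, start=1) if aa in problematic]


def matches_reference(peptide, reference_proteins):
    """Return names of reference proteins containing the peptide verbatim."""
    return sorted(
        name for name, sequence in reference_proteins.items() if peptide in sequence
    )


# Hypothetical miniature "reference proteome" for illustration.
reference = {"PROT1": "MKTAYIAKQRQISFVK", "PROT2": "MSCLLVACDEFGH"}

cysteine_hits = problematic_positions("ACDEFCGHK")
self_hits = matches_reference("LLVACDEFG", reference)
novel_hits = matches_reference("AYIAKQRQW", reference)
```

A peptide with a non-empty `cysteine_hits` list would be flagged in the “Prob Pos” column, and a non-empty `self_hits` list would indicate that the candidate is not unique to the tumor.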

After consulting the breadth of information displayed in pVACview, users can assign an evaluation to each variant by clicking the appropriate evaluation button in the aggregate report view (Additional file 3: Fig. S15). The number of evaluations of each type (accept, reject, review) is tracked in the peptide evaluation overview section. Users may also record a comment for each candidate describing, for example, any notable features, concerns, or special criteria considered to determine the selected evaluation.

If a user has uploaded a tsv file with supplemental candidate information, this data can be viewed in the Additional Data tab (Additional file 3: Fig. S16). This data can, for example, be used to prioritize candidates with poor class I binding affinity but otherwise good metrics. Such candidates may have good class II binding and can be rescued.

Export of neoantigen evaluations and final report

When users have either finished evaluating neoantigen candidates or need to pause and would like to save current evaluations, they can export the current main aggregate report using the export page (Additional file 3: Fig. S17). pVACview provides two download file types (tsv and excel). The excel format is user-friendly for downstream visualization and manipulation. However, if the user plans to continue editing the aggregate report and would like to load it back in pVACview with the previous evaluations preloaded, they must use the tsv format. The export feature thus serves as a way to save progress as all evaluations are cleared upon closing or refreshing the pVACview app.
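The save-and-reload cycle via the tsv format can be pictured as a simple round trip. The following Python sketch is a generic illustration, not pVACview code; the column names `ID`, `Evaluation`, and `Comments` and the helper function names are assumptions chosen for the example.

```python
import csv
import io


def export_tsv(rows, fieldnames):
    """Serialize report rows, including evaluations, to tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_tsv(text):
    """Read tab-separated text back into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical evaluations recorded during a session.
rows = [
    {"ID": "variant1", "Evaluation": "Accept", "Comments": "strong binder"},
    {"ID": "variant2", "Evaluation": "Review", "Comments": ""},
]
saved = export_tsv(rows, ["ID", "Evaluation", "Comments"])
restored = import_tsv(saved)
```

Because the tsv is plain text with one row per variant, the evaluations and comments survive the round trip unchanged, which is what allows a later session to resume from them.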

NeoFox module

Data import

pVACview also takes the output of the neoantigen annotation pipeline NeoFox [17] as input. NeoFox output is a tab-separated file, where each row corresponds to one neoantigen candidate. The NeoFox format also optionally supports annotation of each candidate with a patient identifier and gene-level information (gene name, DNA/RNA allele frequencies). The peptide-level information generated by NeoFox is comprehensive and includes scores for ranking peptides based on 16 neoantigen features and prediction algorithms. These features include several that are not otherwise supported by pVACtools directly such as recognition potential, generator rate, PRIME, and HEX [17].

Neoantigen visualization and exploration

pVACview provides three panels for NeoFox data exploration. The first panel, “Annotated Neoantigen Candidates using NeoFox,” shows all neoantigen candidates and their corresponding information from the input. In the second panel, “Data Visualization,” users can select up to 6 information categories of the neoantigens to visualize in the form of violin plots. If the user selects a specific peptide in the first panel, the corresponding values of the peptide will be highlighted in red in the plot(s). The third panel, “Dynamic Scatter Plot,” gives an overview of characteristics of all candidates in the dataset. Users can choose the variables to plot on the x and y axes, as well as the variable that defines the size of the points in the scatter plot. The variables can be transformed and limited in range, if desired. As the user hovers the cursor over any candidate, all information tied to the candidate will be displayed. With these features, users can quickly and interactively narrow down candidates satisfying criteria of interest. A curated subset of NeoFox scores that we believe are particularly useful and/or complementary to those provided by pVACtools is selected by default in the pVACview NeoFox data exploration module. Users can display additional columns by selecting from the “Column visibility” dropdown.

Similar to the main module, users can select an evaluation for each variant by clicking the desired evaluation button in the annotated neoantigen candidates table. The number of evaluations of each type (accept, reject, review) is tracked in the “Peptide Evaluation Overview” section on the top left of the page. Users are also able to leave a comment for the selected variant(s) in the section on the top right of the page.

Export of neoantigen evaluations and final report

The NeoFox module offers the same export functionalities as the pVACview main module. During export, the selected evaluations and comments are saved to a tsv or excel file alongside the original NeoFox data.

Custom module

Data import

Users can also supply pVACview with any tsv file from any neoantigen prediction algorithm or pipeline. The custom module reads each column in the tsv as a feature and further tailors the view based on the user’s selections in the following three drop-down menus. (1) “Group peptides by” will group peptides together by a user-selected feature. For example, grouping by variant would consolidate all candidate peptides derived from a common variant. (2) “Sort peptides by” will order the candidate peptides by a user-selected feature. For example, a user might order peptides by binding score. (3) “Features to display for each group of peptides” is used to select which features in the dataset will be included in the detailed data section. By default, all features, with the exception of the features chosen for grouping and sorting the peptides, will be included. To demonstrate the custom input module, we provide users with example results from other neoantigen prediction pipelines: vaxrank [24], NeoPredPipe [28], and antigen.garnish [29].
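The interplay of the three drop-down selections can be sketched as a group, sort, and project operation. This is a minimal Python illustration rather than the module's actual R code; the function name, the column names (`variant`, `peptide`, `ic50`), and the assumption that the sort feature is numeric and sorted in ascending order are all hypothetical.

```python
from collections import defaultdict


def build_custom_view(rows, group_by, sort_by, display=None):
    """Group rows by one feature, sort each group by another, keep display features."""
    if not rows:
        return {}
    if display is None:
        # Default: every feature except those used for grouping and sorting.
        display = [col for col in rows[0] if col not in (group_by, sort_by)]
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    view = {}
    for key, members in groups.items():
        # Assumes the sort feature holds numeric values (e.g., a binding score).
        ordered = sorted(members, key=lambda r: float(r[sort_by]))
        view[key] = [{col: r[col] for col in display} for r in ordered]
    return view


# Hypothetical rows as they might be read from a user-supplied tsv.
rows = [
    {"variant": "chr1:100", "peptide": "AAAAAAAAA", "ic50": "250.0"},
    {"variant": "chr1:100", "peptide": "AAAAAAAAC", "ic50": "40.0"},
    {"variant": "chr2:200", "peptide": "DDDDDDDDD", "ic50": "90.0"},
]
view = build_custom_view(rows, group_by="variant", sort_by="ic50")
```

With these inputs, the two peptides from `chr1:100` are consolidated into one group and ordered by their `ic50` value, and only the remaining `peptide` feature is kept for display.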

Neoantigen visualization and exploration

The custom module of pVACview offers three panels for data visualization. The first panel “Overview of Neoantigen Features” displays groups of peptides. For each group, a single representative peptide will be shown. To see and compare the representative peptide with other peptides in the same group, users can click “Investigate” and see all peptides in the second panel—“Detailed Data.” In this second panel, the peptides in the group by default will be sorted by the user-selected feature. The third panel “Dynamic Scatter Plot” allows users to quickly and interactively narrow down candidates satisfying criteria of interest (as described in the “NeoFox module” section above).

Overall, pVACview provides a complex interactive interface to explore many neoantigen features and prioritize neoantigen candidates. A comprehensive analysis of the biological rationale and relative importance of individual features is beyond the scope of this report but several reviews and detailed guidelines have been published [15]. In addition, we provide a list of suggested features and a brief description of their use in candidate prioritization in Table 1. More extensive discussion of many of these features is provided in instructional videos and a comprehensive vignette available in the online documentation (see Availability of data and materials).

Table 1 Summary of pVACview features that facilitate neoantigen prioritization
