Ten challenges and opportunities in computational immuno-oncology

Clinical trial design

The design of IO trials is critical to determining the best strategy for assessing the safety and efficacy of new therapies in patients. Given the complexity of interactions between the immune system and cancer, advanced computational and analytical methodologies have become indispensable in accelerating drug development and optimizing trial parameters, from preclinical studies to phase IV post-marketing studies.

The optimization of the dose and timing of candidate IO drugs is a crucial task in trial design. Quantitative systems pharmacology models, along with pharmacokinetic (PK), pharmacodynamic (PD), and spatiotemporal models, are used to simulate and predict patient responses.11 12 These methods are also increasingly used in the preclinical phase and to inform regulatory interactions in clinical research.13–15 To optimize efficacy and minimize severe irAEs in trials, computational algorithms have been employed in trial design to simulate scenarios across combinations of different treatment modalities,16 17 where the proper assessment of immunogenicity endpoints is increasingly recognized as crucial.18 19 Classical phase I dose-escalation trials, which generally assume that the highest dose meeting safety criteria will consequently be the most effective, and that a lower dose is inherently safer than a higher dose, have proven inadequate in the IO drug setting due to the non-linear relationship between dose and efficacy or toxicity.20 New models that account for and continually reassess the safety of both long-term and short-term irAEs are essential to identify the safest and most effective dose. These models are multivariate in nature, incorporating measures of immune response and tumor activity.20
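
As a minimal, purely illustrative sketch of why the classical "higher dose is better" assumption can fail, the following Python snippet contrasts a saturating dose-efficacy relationship with a monotonically rising probability of severe irAEs; all parameter values and the simple utility function are hypothetical and not drawn from any cited model.

```python
# Illustrative sketch: non-linear dose-efficacy in IO dose-finding.
# All parameter values are hypothetical and chosen only to show why the
# "highest tolerated dose = most effective dose" assumption can fail.
import numpy as np

def efficacy(dose, emax=0.60, ed50=50.0):
    """Saturating (Emax-type) probability of response: plateaus at high doses."""
    return emax * dose / (ed50 + dose)

def irae_risk(dose, slope=0.012):
    """Monotonically increasing probability of a severe irAE (logistic in dose)."""
    return 1.0 / (1.0 + np.exp(-slope * (dose - 300.0)))

doses = np.array([25, 50, 100, 200, 400, 800], dtype=float)
for d in doses:
    # A simple utility trades off response against severe toxicity;
    # real designs use multivariate models of immune response and tumor activity.
    utility = efficacy(d) - 0.5 * irae_risk(d)
    print(f"dose={d:5.0f} mg  P(response)={efficacy(d):.2f}  "
          f"P(severe irAE)={irae_risk(d):.2f}  utility={utility:.2f}")
```

In this toy setting the utility peaks at an intermediate dose, which is the qualitative behavior that motivates multivariate dose-optimization designs over conventional dose escalation.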

Cost-effective and expeditious phase II and phase I/II trial designs have become a priority, due in part to the need to pursue rapid US Food and Drug Administration (FDA) approval.21–23 An optimal statistical design that employs early stopping points for efficacy and futility,24 and that aims to obtain response profiles under both frequentist and Bayesian frameworks, is desirable.25 26 Adaptive randomization schemes, which can reweight treatment allocations, are now being employed in the phase II setting for IO.26 Computational tumor dynamic models are also increasingly being applied to support trial readouts, go/no-go decision-making, and regulatory interactions during early clinical development.27 28 For large-scale phase III randomized trials, the design must consider delays in effects relative to progression-free survival (PFS) and overall survival (OS). Modeling delayed PFS and OS beyond the median, or employing time-varying treatment effect estimates, may be appropriate when modeling IO drug outcomes.29 Precision medicine approaches, such as large basket trials that leverage immune profile testing and other companion diagnostics to assign agents to subjects, are increasingly important.30 This has been demonstrated in the iMATCH trial, which uses biological mechanisms of resistance to categorize tumors,31 32 and the MyPathway phase IIa trial, which uses tumor mutational burden (TMB) as a predictor for the response to atezolizumab.33
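
As an illustration of Bayesian early stopping for efficacy and futility, the sketch below implements a simple Beta-Binomial interim monitoring rule for a single-arm cohort; the prior, the 30% historical response rate, and the 0.95/0.05 decision thresholds are hypothetical placeholders rather than recommended design parameters.

```python
# Sketch of a Beta-Binomial interim monitoring rule for a single-arm phase II
# IO cohort. Thresholds, priors, and the 30% historical response rate are
# hypothetical placeholders, not recommendations.
from scipy.stats import beta

def posterior_prob_exceeds(responders, enrolled, p_hist=0.30, a0=1.0, b0=1.0):
    """P(true response rate > historical rate | data) under a Beta(a0, b0) prior."""
    post = beta(a0 + responders, b0 + enrolled - responders)
    return 1.0 - post.cdf(p_hist)

def interim_decision(responders, enrolled):
    prob = posterior_prob_exceeds(responders, enrolled)
    if prob > 0.95:
        return prob, "stop early for efficacy"
    if prob < 0.05:
        return prob, "stop early for futility"
    return prob, "continue enrolment"

for r, n in [(2, 15), (7, 15), (12, 15)]:
    prob, action = interim_decision(r, n)
    print(f"{r}/{n} responses: P(rate > 30%) = {prob:.3f} -> {action}")
```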

A highly impactful component of phase II and III trials is the correlative studies that are crucial for elucidating the biological mechanisms driving treatment responses and resistance.34 35 These studies, supported by comprehensive data repositories like the Genomic Data Commons,36 Imaging Data Commons,37 and the National Clinical Trials Network (NCTN) and the NCI Community Oncology Research Program (NCORP) Data Archive,38 contribute to a robust understanding of the tumor-immune microenvironment. While profiling can be performed in all of these trials, neoadjuvant designs provide surgical specimens that are most amenable to high-throughput profiling studies to discover cellular drivers and pathways of therapeutic response in patients. Profiling studies are accompanied by the development of computational and mathematical algorithms, as well as multifaceted bioinformatics tools, to inform drug repurposing and development of future novel combination trials.39 The National Cancer Institute (NCI) provides sources of clinical trial biospecimens and annotated clinical data that researchers may request through the NCTN Navigator resource40 for novel assay development and biomarker discovery. In addition, the NCI Informatics Technology for Cancer Research program (ITCR) supports new methodology and software developments to enable wide adoption of these new methods throughout the IO research community.41

Throughout the entire drug development pipeline, from preclinical studies to phase III/IV trials, computational methodologies play a central role in accelerating medicines to market.42 Early phase target discoveries in IO require accurate modeling of the complex interactions among tumor, immune, and stroma cells. High-dimensional data integration from genomics, proteomics, and transcriptomics43 can identify targets but suffers from data heterogeneity and a lack of standardized protocols for target validation. Molecular docking44 has been used to predict binding affinities and interaction sites for compound selection; however, in silico models often struggle to predict off-target effects and the overall PK profile of lead compounds, which can lead to failures in later stages of development. Similarly, AI-driven drug repurposing45 46 approaches are limited by the dynamics of cell–cell interactions, which are often not well-characterized in existing drug databases. Combining virtual chemical screening with machine learning (ML) techniques can better capture the complexity of ligand-receptor interactions and enhance prediction.42
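
To make the idea of combining virtual screening with ML concrete, the following sketch ranks a toy compound library with a random forest trained on Morgan fingerprints, assuming RDKit and scikit-learn are available; the SMILES strings and activity labels are invented for illustration only.

```python
# Minimal sketch of ligand-based virtual screening with ML, assuming RDKit and
# scikit-learn are installed. SMILES strings and activity labels are toy
# placeholders; a real campaign would use curated bioassay data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Encode a molecule as a Morgan (circular) fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

train_smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]
train_labels = [0, 1, 1, 0]  # toy "active against target" labels

X = np.array([morgan_fp(s) for s in train_smiles])
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, train_labels)

# Rank a small virtual library by predicted probability of activity.
library = ["CC(=O)Oc1ccccc1C(=O)O", "CCCCCC", "Oc1ccc(cc1)C(=O)O"]
scores = model.predict_proba(np.array([morgan_fp(s) for s in library]))[:, 1]
for smi, score in sorted(zip(library, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {smi}")
```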

Adverse events

Patients undergoing IO treatment are vulnerable to developing irAEs because the treatment itself can compromise the immune system, making it harder for the body to fight off infections or inflammation. Common irAEs include cytokine release syndrome, neurotoxicity, pneumonitis, and rash, among others. These conditions increase the disease burden and, in rare cases, can lead to fatality.47 For example, immunotherapy-induced pneumonitis48 is a type of lung inflammation that can cause symptoms such as coughing, shortness of breath, and chest pain; if untreated, it can lead to serious complications like respiratory failure. Hyperprogressive disease49 (HPD) is another phenomenon, wherein cancer grows at an accelerated rate after the initiation of IO treatment, leading to worse outcomes and reduced survival rates. The incidence of HPD varies between different tumor histologies and treatments. These adverse events limit the effectiveness of new IO treatments, as treatment may be discontinued due to a severe irAE.

Computational strategies to predict and monitor irAEs have become a priority in IO. Forecasting irAE incidences may help doctors adjust treatment plans to potentially prevent the worsening of the disease or design alternative therapies for patients. Radiomics from CT scans has been used to identify IO-induced pneumonitis50–52, and to predict treatment response and pneumo-toxicity from programmed cell death-1 pathway inhibition in patients with non-small cell lung cancer.52 Radiomics can also distinguish between radiation- and IO-induced pneumonitis.53 54 Identifying HPD is critical because it can influence the decision to continue or discontinue immunotherapy treatment, as well as the choice of alternative therapies that may be more effective for the patient. By analyzing radiologic images, AI algorithms have demonstrated the ability to predict the risk of irAEs such as pneumonitis and HPD.55 This information can assist healthcare professionals in making more informed decisions about treatment plans, thereby improving patient outcomes and safety. Integrating clinical and radiomic parameters from baseline, pretreatment CT scans could facilitate the identification of HPD in patients with lung cancer being treated with immunotherapy.56 The combination of radiomic parameters representing texture patterns of lung nodules and features related to vessel tortuosity of a nodule could distinguish between responders, non-responders, and hyperprogressors in immunotherapy.55
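
As a simplified illustration of the kind of texture features used in such radiomics analyses, the sketch below computes gray-level co-occurrence matrix (GLCM) contrast, homogeneity, and entropy from a synthetic 2D region of interest using scikit-image; a real pipeline (eg, pyradiomics) would add standardized preprocessing, 3D features, and harmonized feature definitions.

```python
# Simplified sketch of radiomic texture features (contrast, homogeneity, and
# gray-level entropy) from a 2D CT region of interest, using scikit-image.
# The ROI here is synthetic; clinical pipelines use segmented lesions and
# standardized (IBSI-style) feature definitions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
roi = rng.integers(0, 64, size=(64, 64), dtype=np.uint8)  # stand-in for a lung ROI

# Gray-level co-occurrence matrix at a 1-pixel offset in 4 directions, 64 gray levels.
glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                    levels=64, symmetric=True, normed=True)

contrast = graycoprops(glcm, "contrast").mean()
homogeneity = graycoprops(glcm, "homogeneity").mean()

# GLCM entropy is not built into graycoprops, so compute it from the matrices
# and average across the (distance, angle) combinations.
p = glcm[glcm > 0]
n_matrices = glcm.shape[2] * glcm.shape[3]
entropy = float(-(p * np.log2(p)).sum() / n_matrices)

print(f"contrast={contrast:.2f} homogeneity={homogeneity:.3f} entropy={entropy:.2f}")
```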

In addition to CT scans, recent studies have reported a correlation between specific tumor microenvironment (TME) characteristics, such as TIL density, and irAEs, with evidence suggesting organ-specific co-occurrence of toxicity.57 58 Notably, predictive modeling has shown promise in anticipating drug combinations that minimize the risk of irAEs.57 However, accurately predicting and understanding the incidence of irAEs remains challenging due to the complex nature of human immune responses, which vary significantly according to individual genetic backgrounds and environmental factors present across different hospital settings. Enhancing computational algorithms, as well as acquiring training data that encompasses diverse patient demographics and clinical contexts, should be priorities for the field. In particular, to design effective treatment strategies while minimizing the risk of irAEs, the field should prioritize interpretable models that provide insight into the underlying biological mechanisms driving irAE occurrence.

Cancer disparities

Disparities in IO pose a significant barrier to equitable and effective cancer care and can lead to differences in trial accrual, treatment outcomes, and survival rates.59 These disparities are influenced by a variety of factors such as socioeconomic status, geographic location, race, ethnicity, sex, and underlying health conditions. To reduce health inequities and ensure that the benefits of IO treatments are accessible to all patient populations, it is essential to understand and address these factors. Here, we discuss two main aspects: the disparities in molecular profiling and trial access, and how compIO strategies may help overcome these barriers.

Disparities can lead to unequal access to healthcare, resulting in higher rates of disease and mortality in certain populations. This is also true for risk calculators developed to identify individual patient risk. For example, a report in JAMA Oncology indicated that the Oncotype DX assay, a 21-gene expression test designed to help identify which patients with estrogen receptor-positive, lymph node-negative breast cancer would benefit from adjuvant chemotherapy, was not prognostic when used in black women.60 AI could help alleviate the issue of health disparities, especially in oncology, by providing more accurate and personalized diagnoses and treatments. However, to achieve this, an intentional focus is needed to ensure biases are not incorporated into AI algorithms.61 AI could also assist in identifying potential morphologic and molecular differences between populations, allowing for the creation of more specific and accurate population-tailored risk prediction models. For instance, AI has shown promise in analyzing digital pathology images to identify morphologic differences in the appearance of prostate cancer between black and white patients, which in turn could aid in creating more population-tailored models that are more accurate in predicting post-surgical recurrence risk compared with population-agnostic models.62 Such approaches could help reduce variability in cancer diagnosis and treatment, ensuring that all patients receive the best care possible. Additionally, AI can be used to develop targeted interventions to improve cancer outcomes in populations disproportionately affected by health disparities.

Financial toxicity refers to the economic burden experienced by patients with cancer due to the high cost of cancer treatment,63 which can lead to financial distress, psychological stress, and potentially compromise the quality of life for patients and their families. AI can play a crucial role in mitigating financial toxicity by analyzing routine data to identify patients who are likely to benefit from specific therapies. For example, AI could help predict patients who will, or will not, respond to ICI, which may cost nearly US$200K per patient per year but only work in 20–25% of cases.64 By analyzing routine CT or pathology images, AI can guide the use of these drugs through patient stratification towards those with a higher likelihood of clinical benefit, thereby enhancing treatment efficacy and reducing unnecessary financial strain.65 66

Addressing disparities in IO requires a multifaceted approach that includes policy changes, community engagement, and tailored healthcare strategies. Future efforts should focus on generating more data from molecular profiling and improving clinical trials accrual for under-represented populations, implementing education and outreach programs to increase awareness of IO treatments, building trust between doctors and the community, and developing policies that promote health equity. Research should also continue to explore the underlying causes of these disparities, with a particular focus on integrating social determinants of health into the design and implementation of IO therapies.

Data integration

As discussed above, addressing computational and clinical challenges in IO requires multimodality data and proper strategies to identify meaningful signals without introducing bias. Multistudy and multimodal data integration is a pivotal aspect of compIO across diverse research domains. Careful integration of data from preclinical and clinical IO studies can enable (1) cross-trial correlative analyses for reverse translation and biological discovery, and (2) cross-assay studies that use multimodal data types for enhanced inference, which can guide biomarker development strategies.67 Within these realms, multiple foundational studies have not only demonstrated the promise of data integration for IO discovery but have also exemplified the ongoing challenges.

Data integration across clinical trials is necessary because IO trials and the associated correlative analyses from trials and concomitant model systems are typically too small for biomarker identification, particularly with datasets that contain errors or inconsistencies. These errors can arise from a variety of factors, such as measurement errors, data entry errors, equipment malfunctions, or environmental factors. Noise in data can also be introduced by variability in sample collection and processing methods across different clinical sites, including sample handling, storage conditions, and preparation protocols. Additionally, differences in patient demographics, disease characteristics, and treatment regimens among clinical trials contribute to data heterogeneity. Thoughtful data integration across clinical trials helps advance knowledge so that further studies are statistically powered to identify underlying molecular commonalities and differences between responding and non-responding patient populations. Such data reuse68 requires careful statistical and analytic approaches, including conditioning on treatment regimens, harmonized clinical data elements and outcome associations, and standardization of molecular analysis pipelines. However, when done carefully, such analyses can reveal novel insights, especially in the context of ICI, which can induce durable treatment responses across different cancer histologies. For example, building on an initial cross-trial ICI molecular study of 249 patients across 5 histologies, a follow-up analysis69 of over 1,000 samples from patients with solid tumors, including samples from multiple trials and contexts, revealed the role of different mutational signatures and candidate molecular targets that may be relevant to all tumors treated with ICI.
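
As a schematic example of the harmonization step that must precede pooling, the pandas sketch below maps two hypothetical trial exports onto shared response and regimen categories; the column names, category codes, and mappings are invented and stand in for a formal common data model and controlled terminologies.

```python
# Sketch of harmonizing clinical data elements from two hypothetical trial
# exports before pooled cross-trial analysis. All names and codes are invented.
import pandas as pd

trial_a = pd.DataFrame({
    "subject": ["A01", "A02"],
    "best_resp": ["CR", "PD"],                       # RECIST-style categories
    "regimen": ["anti-PD-1", "anti-PD-1 + anti-CTLA-4"],
})
trial_b = pd.DataFrame({
    "patient_id": ["B07", "B09"],
    "response": ["complete response", "stable disease"],
    "treatment": ["pembrolizumab", "nivolumab+ipilimumab"],
})

RESPONSE_MAP = {"CR": "responder", "PR": "responder",
                "complete response": "responder", "partial response": "responder",
                "SD": "non-responder", "stable disease": "non-responder",
                "PD": "non-responder", "progressive disease": "non-responder"}
REGIMEN_MAP = {"anti-PD-1": "ICI monotherapy", "pembrolizumab": "ICI monotherapy",
               "anti-PD-1 + anti-CTLA-4": "ICI combination",
               "nivolumab+ipilimumab": "ICI combination"}

def harmonize(df, id_col, resp_col, reg_col, cohort):
    """Map one trial export onto shared identifiers, response groups, and regimen classes."""
    return pd.DataFrame({
        "subject_id": df[id_col],
        "response_group": df[resp_col].map(RESPONSE_MAP),
        "regimen_class": df[reg_col].map(REGIMEN_MAP),
        "cohort": cohort,
    })

pooled = pd.concat([harmonize(trial_a, "subject", "best_resp", "regimen", "trial_A"),
                    harmonize(trial_b, "patient_id", "response", "treatment", "trial_B")],
                   ignore_index=True)
print(pooled)
```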

The aforementioned studies also highlight the difficulties that may be encountered when integrating data across trials. One of the major limitations in combining multiple studies is the harmonization of clinical data to enable patient selection and research discovery at scale following Common Data Model and FAIR (Findable, Accessible, Interoperable, Reusable) principles.70 Variation in data resources such as electronic health records (EHR), pharmacy data, and patient registries, as well as nomenclature and annotations in healthcare systems can contribute to differences in data content and format.71 Natural language processing and deep-learning techniques72 such as large language models73 74 can be leveraged to reduce such heterogeneity and enhance information extraction and standardization. Furthermore, it has been shown that even with thoughtful data integration, statistical power constraints cannot be overcome for cross-histology analysis, even when the underlying data are properly harmonized.75 Similarly, it has been demonstrated that after overcoming major challenges regarding harmonization through extensive computational techniques (eg, correction of batch effects stemming from numerous clinical and technical factors), no previously reported signatures of ICI response or those derived in the study could be validated across contexts.76 While this demonstrates some of the challenges of cross-trial comparisons, it also highlights the critical need to validate data from one cohort in other independent and rigorously vetted cohorts with uniformly processed data. Companion findings in neoantigen discovery highlight similar standardization and reproducibility challenges.77 Taken together, numerous ongoing efforts to harmonize the increasing number of cross-trial cohorts for uniform analysis and interpretation are critical to increase predictive power and minimize false positive signals within patient data in IO.
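
The sketch below illustrates the simplest possible form of batch adjustment, per-cohort centering and scaling of each gene, as a stand-in for the ComBat-style corrections referenced above; the expression values and cohort labels are simulated, and real analyses must guard against removing biology that is confounded with cohort.

```python
# Highly simplified stand-in for batch-effect correction before cross-trial
# analysis: per-cohort centering and scaling of each gene. Dedicated methods
# (e.g., ComBat-style empirical Bayes models) are preferred in practice.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
expr = pd.DataFrame(rng.normal(size=(6, 4)),
                    columns=["GENE1", "GENE2", "GENE3", "GENE4"])
expr["cohort"] = ["trial_A"] * 3 + ["trial_B"] * 3
# Simulate a batch shift in the second cohort.
expr.loc[expr.cohort == "trial_B", ["GENE1", "GENE2", "GENE3", "GENE4"]] += 2.0

def per_cohort_zscore(df, batch_col="cohort"):
    """Center and scale each gene within each cohort (a crude batch adjustment)."""
    genes = df.columns.drop(batch_col)
    corrected = df.copy()
    for _, idx in df.groupby(batch_col).groups.items():
        block = df.loc[idx, genes]
        corrected.loc[idx, genes] = (block - block.mean()) / block.std(ddof=0)
    return corrected

print(per_cohort_zscore(expr).groupby("cohort").mean().round(2))
```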

Similarly, computational innovations have enabled significant advances in understanding immune responses to cancer through the integration of data from different assays, such as gene expression profiling, flow cytometry, and imaging. For example, a combination of single-cell and spatial analysis was used to discover myeloid cell-attracting hubs at the tumor-luminal interface in mismatch repair deficient tumors.78 Transfer learning approaches can be used for annotating descriptions of immune cell function across reference atlases, which facilitates the standardized annotation of single-cell datasets,79 and this need extends to the tracking of lymphoid cell state changes in the periphery across ICI studies.80 Likewise, advances in deep learning (DL) are enabling the multimodal fusion of data types across many domains simultaneously, such as genomics, transcriptomics, and pathology images,81 82 and these efforts may yield novel IO insights. However, similar to cross-trial considerations, innovations in statistical paradigms are critical to maximize the potential of multimodal high-dimensional data generated from the same specimens for inference. Furthermore, data heterogeneity and standardization issues must be addressed to ensure that data can be easily shared and integrated across different platforms.
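
As a minimal sketch of multimodal fusion, the PyTorch model below encodes a tabular genomics vector and a pathology-image embedding separately and concatenates them for response prediction; the dimensions, architecture, and random inputs are illustrative placeholders rather than a published model.

```python
# Minimal sketch of late fusion for multimodal IO data in PyTorch: a tabular
# genomics vector and a pathology-image embedding are encoded separately and
# concatenated for response prediction. All dimensions are illustrative.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, n_genomic=500, n_image=1024, hidden=128):
        super().__init__()
        self.genomic_encoder = nn.Sequential(nn.Linear(n_genomic, hidden), nn.ReLU())
        self.image_encoder = nn.Sequential(nn.Linear(n_image, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, genomic, image_embedding):
        # Encode each modality, concatenate, and predict a response logit.
        fused = torch.cat([self.genomic_encoder(genomic),
                           self.image_encoder(image_embedding)], dim=1)
        return self.head(fused)

model = LateFusionClassifier()
genomic = torch.randn(8, 500)           # e.g., expression of 500 immune-related genes
image_embedding = torch.randn(8, 1024)  # e.g., slide-level embedding from a pathology encoder
logits = model(genomic, image_embedding)
print(logits.shape)  # torch.Size([8, 1])
```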

Taken together, novel data integration and inference strategies may unlock generalizable insights for IO research programs. However, significant investments are necessary to harmonize data types across contexts as well as to innovate new methods for learning from complex, yet complementary, data types.

Artificial intelligence

With the exponential growth in data volume and the complexity of cancer-immune biology, AI has transformed the methodologies we use to conduct research. As discussed in preceding sections, AI offers immense potential to enhance clinical trial design, improve data integration and fusion, and predict irAEs as well as therapy responses in IO.83 However, alongside these promising advancements, it is important to approach AI strategies with a balanced focus on computational excellence, ethical considerations, and clinical relevance.

Traditionally, FDA-approved tissue clinical biomarkers for solid tumors, such as programmed death-ligand 1, microsatellite instability, and TMB, have been routinely used for patient stratification in ICI treatment.84 Transcriptomic-based biomarkers have also been widely recognized for predicting ICI response, with available tests like NanoString’s tumor inflammation signature (TIS) gene expression assay measuring suppressed adaptive immunity. Similarly, the Tempus Immune Profile Score (IPS) test uses an AI-driven algorithm to integrate DNA and RNA data, providing a high/low IPS score that predicts response to ICI. Despite these advances, the accuracy of existing biomarkers is often limited, and their performance varies by cancer type and study cohort. Techniques such as dynamical network biomarkers (DNBs) represent a promising approach by modeling multimodal and/or longitudinal data.85 86 Unlike traditional biomarkers, which are often static and measured at a single time point, DNBs focus on capturing the dynamic changes in biological networks over time, potentially leading to improved reliability and reproducibility of biomarkers. For instance, DNB models have been shown to determine a tipping point between immune control and immune evasion that can explain differential response in patients, and the distance of a patient from the tipping point may determine survival outcome.87
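
To illustrate the general form of a DNB calculation, the sketch below computes a composite index for a candidate gene module (average within-module standard deviation times average within-module correlation, divided by the average module-to-outside correlation) on synthetic data; it follows the commonly described DNB criterion in spirit and is not the exact implementation used in the cited studies.

```python
# Sketch of a dynamical network biomarker (DNB) composite index for a candidate
# gene module at one sampling time point, using synthetic expression data.
import numpy as np

def dnb_composite_index(expr, module_idx):
    """expr: samples x genes matrix at one time point; module_idx: columns of the candidate module."""
    module = expr[:, module_idx]
    other_idx = [j for j in range(expr.shape[1]) if j not in module_idx]

    # Average standard deviation of module genes (expected to rise near a tipping point).
    sd_in = module.std(axis=0).mean()

    # Average absolute correlation within the module.
    corr_in = np.abs(np.corrcoef(module, rowvar=False))
    pcc_in = corr_in[np.triu_indices_from(corr_in, k=1)].mean()

    # Average absolute correlation between module genes and all other genes.
    corr_all = np.abs(np.corrcoef(expr, rowvar=False))
    pcc_out = corr_all[np.ix_(module_idx, other_idx)].mean()

    return sd_in * pcc_in / max(pcc_out, 1e-8)

rng = np.random.default_rng(2)
expr = rng.normal(size=(20, 30))                    # 20 samples x 30 genes (synthetic)
expr[:, :5] += rng.normal(scale=2.0, size=(20, 1))  # make genes 0-4 co-fluctuate with high variance
print(f"DNB composite index of the candidate module: {dnb_composite_index(expr, list(range(5))):.2f}")
```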

Radiology and pathology are areas where AI has demonstrated considerable progress in IO. AI algorithms have been used to analyze radiologic images and identify tumor characteristics, such as size, shape, and texture, and assess treatment responses. Radiomics, a technique that extracts quantitative features from images, has shown promise in predicting the response to immunotherapy in patients with melanoma.88 One such study found that radiomic features, such as tumor heterogeneity, entropy, and contrast, were significantly associated with clinical outcomes. In pathology, AI-driven analysis of histopathological images has enabled the identification of immune cells, including their location and density, to predict the response to immunotherapy. For example, one study used a neural network to analyze histopathological images from patients with melanoma.89 The study found that DL could accurately predict the response to immunotherapy based on the density and location of immune cells in the TME. AI algorithms can analyze large-scale genomic data, such as DNA sequencing data, to identify genetic mutations that may affect the response to immunotherapy. For instance, a study employed a convolutional neural network to combine genomic and clinical features to stratify patients with non-small cell lung cancer according to their response to immunotherapies.90

In recent years, the development of foundation models has catalyzed transformative breakthroughs across omics, imaging, and EHR in cancer research.91 Using histopathology images and annotations sourced from social media and public databases, researchers have pioneered generative AI capable of identifying and annotating tumor, immune, and other cell types to assist with pathology education and diagnosis.92 Large language models have also demonstrated utility in significantly enhancing knowledge accessibility by simplifying the legal jargon of patient consent forms into easily understandable language.93 Furthermore, the development of CONCH94 and UNI,95 pathology generative AI models trained on histopathology slides from academic medical centers and industry collaborators, has demonstrated high accuracy in recognizing cell types and generating pathology reports that resemble expert opinions.

AI has made remarkable advancements in clinical imaging; however, progress in genomic, transcriptomic, and spatial data has been slower. This discrepancy arises from the inherent biological complexity, technical noise, different platforms, and smaller cohort sizes of omics data, which hinder model training and validation. Collaborative efforts to create large, standardized multiomics datasets with harmonized clinical annotation, such as the Immuno-Oncology Translational Network (IOTN), the Cancer Immunoprevention Network, and the Human Tumor Atlas Network (HTAN), are essential to accelerate AI advancements in these areas. Simultaneously, the rapid evolution of AI in healthcare faces regulatory and policy challenges that have not kept pace with technological advancements. The practical integration of AI into clinical practice remains uncertain due, in part, to regulatory and accountability concerns. From a methodology perspective, ethical, representation, and technological biases are inherent from the data collection stage, and persist throughout model development and ultimately to downstream decision-making and applications. Strategies that make such biases more visible, accountable, and quantifiable should be prioritized, along with the development of AI solutions capable of effectively correcting for bias and validation of their performance in real-world datasets.

Spatial biology

Moving beyond predictive biomarkers, it is important to understand the underlying biology that drives correlative patterns and causal relationships. Understanding why patients respond differently to treatments, why some develop irAEs while others do not, and how we can design new clinical trials to overcome resistance are all pressing issues in advancing IO. High-resolution, high-throughput spatial technologies offer a unique advantage in dissecting the tumor-immune ecosystem, systems immunity, and host-environmental interactions, ultimately linking these insights back to clinical outcomes.

With co-detection by indexing (CODEX),96 tissue-based cyclic immunofluorescence microscopy,97 and multiplexed ion beam imaging, researchers can now scale the number of traditionally measured biomarkers from 10 or fewer to upwards of 40–100 antibodies simultaneously for a single FFPE (formalin-fixed, paraffin-embedded) slide of tumor tissue. This expansion in data capturing capabilities is critical for in-depth investigations of cell–cell communications within and between distinct tumor niches. Notably, studies have used these technologies to identify unique subsets of cell populations associated with patients’ responses to ICIs and CAR T-cell therapy.98–100 Pushing resolution even further, single-molecule localization imaging by photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) can now provide intracellular-level resolution for target transcripts, proteins, or metabolites of interest, as well as spatial data on subcellular organelles such as lysosomes and extracellular vesicles, which play important roles in cancer immunoediting and immunotherapy.

On a systems level, recent advances in computational reconstruction of tissues from sequentially sectioned H&E (hematoxylin and eosin)-stained images have revealed tumor-lymphocyte interfaces that contribute to cancer progression.101 Because FFPE tissues are commonly stored by biorepositories, there exists a vast resource that can be used for high-resolution spatial analyses, such as CODEX,96 Visium, GeoMX, CosMX,102 and other MICCCS (Multiplex Immuno-Cohort Characterization in Cancer Studies) technologies. These technologies are capable of analyzing multiple protein and RNA biomarkers in tissue samples simultaneously, thus providing deeper insights into the complexity of tumor biology. There is a growing need to develop computational tools that are accurate and scalable for the analysis of such spatial imaging data. Current approaches are designed to accommodate specific assays with potential to scale, such as STARmap,103 HistomicsML,104 and others. Of note, super-resolution spatial gene expression prediction from histopathology images105 and three-dimensional spatial transcriptomics106 107 have become a reality with the integration of spatial omics and sequential sectioned H&E slides, leveraging AI and the unique morphology of cells in tumors.
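
As a minimal example of the spatial statistics such tools compute, the sketch below uses a k-d tree to summarize which cell types lie within a fixed radius of tumor cells in a segmented multiplexed image; coordinates and annotations are synthetic, and dedicated toolkits add permutation testing and neighborhood or niche clustering.

```python
# Sketch of a simple spatial neighborhood analysis on segmented cells from a
# multiplexed image: for each tumor cell, count annotated cell types within a
# fixed radius. Coordinates and labels are synthetic placeholders.
from collections import Counter
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
n_cells = 1000
coords = rng.uniform(0, 1000, size=(n_cells, 2))  # cell centroids in microns
cell_types = rng.choice(["tumor", "CD8_T", "macrophage", "B_cell"],
                        size=n_cells, p=[0.5, 0.2, 0.2, 0.1])

tree = cKDTree(coords)
radius_um = 50.0
tumor_idx = np.where(cell_types == "tumor")[0]

neighborhood_counts = Counter()
for i in tumor_idx:
    neighbors = tree.query_ball_point(coords[i], r=radius_um)
    neighborhood_counts.update(cell_types[j] for j in neighbors if j != i)

total = sum(neighborhood_counts.values())
for cell_type, count in neighborhood_counts.most_common():
    print(f"{cell_type:<11s} {count / total:.2%} of cells within {radius_um:.0f} um of tumor cells")
```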

Omics technologies that augment the dimensionality of tissue imaging have enabled spatially resolved, large-scale profiling of the human transcriptome, proteome, and metabolome. For example, advances in digital spatial profiling have provided new insights into the architecture of tertiary lymphoid structures and responses to immunotherapies in melanoma.108 109 Recent studies on cancer evolution have demonstrated that single-cell copy number alterations, inferred from 10× Visium spatial transcriptomics, can be used to detect early events that arise in normal tissues before malignancy110 and intratumor heterogeneity in immunotherapy response.111 The broader application of mass spectrometry (MS) imaging, such as matrix-assisted laser desorption/ionization, in IO studies continues to grow and is starting to uncover the underlying tumor metabolism-driven resistance mechanisms to ICIs. Multiple NCI-supported spatial and imaging initiatives, including those of the Human BioMolecular Atlas Program and the HTAN, are starting to untangle the spatiotemporal characteristics of the TME. Such studies have enabled the multifaceted discovery of spatial biomarkers and computational algorithms driven by ML.80 112

While spatial omics and imaging technologies offer invaluable biological insights, their clinical scalability is hindered by cost considerations. In contrast, H&E staining is a routine and readily available process in clinical practice, applicable to almost all patients without additional processing. However, H&E staining has its limitations: the interpretation can be subjective and vary between pathologists, leading to inconsistencies; and variations in tissue preparation and fixation protocols can affect staining quality. Prioritizing strategies that harness AI for stain normalization113 as well as create spatially aware biomarkers and elucidate tumor heterogeneity in the context of therapy response should be the next direction for clinical translation efforts. For example, studies have shown that AI can analyze digital pathology images to accurately predict TIL levels,114 which have been linked to the response to ICIs. Additionally, AI can analyze the spatial distribution and heterogeneity of TILs within the TME, which have been shown to be associated with the response to ICIs.66 With these insights, clinicians and researchers can better identify patients who are likely to respond to ICIs and potentially improve patient outcomes.
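
As a crude illustration of stain normalization, the sketch below matches the color histogram of a query H&E tile to a reference tile with scikit-image; this is a stand-in for dedicated stain-deconvolution approaches (eg, Macenko-style normalization), and the image arrays are synthetic placeholders.

```python
# Minimal sketch of stain normalization by histogram matching an H&E tile to a
# reference tile, assuming scikit-image is available. This is a simplification
# of dedicated stain-normalization methods; the tiles here are synthetic arrays.
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(4)
reference_tile = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
query_tile = rng.integers(0, 200, size=(256, 256, 3), dtype=np.uint8)  # "washed-out" stain

# Match each color channel of the query tile to the reference distribution.
normalized = match_histograms(query_tile, reference_tile, channel_axis=-1)
print(normalized.shape, normalized.dtype)
```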

Tumor antigen discovery

As spatial biology continues to evolve as an exploratory field, our focus shifts towards driving biology-informed advancements in CompIO that can translate into tangible changes in clinical practice. A prime example of this is discovering and characterizing tumor antigens for new and effective IO treatments. While cellular immunotherapies targeting somatic mutation-derived antigens (ie, “neoantigens”) or lineage specific antigens have been remarkably successful, suitable antigen targets remain elusive for many cancer types. Recent studies indicate that RNA-level dysregulation, such as cancer-specific RNA processing events, can generate a large catalog of tumor antigens for potential immunotherapy targeting.115 However, various computational challenges exist for discovering and targeting this novel class of tumor antigens. Classic short-read RNA sequencing (RNA-seq)-based methods for interrogating cancer transcriptomes are limited in their ability to predict full-length transcript and protein products in cancer cells.116 Newer long-read RNA-seq-based methods allow end-to-end sequencing of full-length transcripts, but their lower base-calling accuracy and modest sequencing yield represent a major bottleneck.117 New computational tools that leverage short-read and long-read RNA-seq technologies as well as data resources, and address their relative technological limitations, are urgently needed. Such computational tools will enable a more comprehensive and accurate characterization of cancer transcriptomes and proteomes, thus improving antigen discovery.
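
As a simple illustration of the antigen-discovery step downstream of transcript characterization, the sketch below enumerates every 9-mer peptide overlapping a somatic amino-acid change in a hypothetical mutant protein, ie, the windows typically submitted to MHC binding predictors; the sequence and variant position are invented.

```python
# Sketch of enumerating candidate neoantigen peptides: given a mutant protein
# sequence and the position of a somatic amino-acid change, list all 9-mers
# that overlap the altered residue. Sequence and position are hypothetical.
def candidate_peptides(mutant_protein, variant_pos, peptide_len=9):
    """Return every peptide of length `peptide_len` containing position `variant_pos` (0-based)."""
    peptides = []
    start_min = max(0, variant_pos - peptide_len + 1)
    start_max = min(variant_pos, len(mutant_protein) - peptide_len)
    for start in range(start_min, start_max + 1):
        peptides.append(mutant_protein[start:start + peptide_len])
    return peptides

mutant = "MLLAVLYCLQWSFQTSAGHFPRA"  # hypothetical mutant protein fragment
variant_pos = 10                    # hypothetical altered residue (0-based)
for pep in candidate_peptides(mutant, variant_pos):
    print(pep)
```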

Multiple factors need to be considered when prioritizing and selecting antigen targets for therapy development. The cellular heterogeneity of antigen targets across cancer and normal tissues may critically affect therapy outcomes, yet such heterogeneity remains to be comprehensively assessed. Recognizing the risks associated with invasive tumor specimen collection, patients’ safety must be a primary consideration when designing protocols for multisite sample collection. Another important topic is the choice between “public” versus “personalized” antigen targets. While public antigens shared by many patients may have broader clinical applicability, there may be safety concerns due to off-tumor expression in normal tissues. In contrast, personalized antigens may provide targets with higher tumor specificity and lower off-tumor toxicity, but concerns about cost and feasibility may hinder their adoption. Biases in training data, including the scarcity of ethnically diverse patient cohorts, may also limit the accuracy of antigen prediction across all populations and require further research. Computational tools and large-scale data resources, representing a wide array of diverse immunogenetic backgrounds, may help address these issues. Specifically, computational tools for single-cell or spatial analysis of RNA and protein variation across diverse populations and cell types may pinpoint antigen targets with high efficacy, low toxicity, and broad utility, and bring equitable therapeutic opportunities to a broader patient population.

Cancer immunotherapies such as ICIs are designed to unleash antitumor T cells, which in turn mediate antitumor immunity that provides long-term protection. However, open questions remain about the mechanisms that lead to a successful antitumor T-cell response: What determines the fate trajectories of therapeutically relevant T cells? How can we systematically identify tumor-reactive T cells and their cellular phenotypes? Insights into these questions will help characterize the specificity and strength of endogenous antitumor T-cell responses during immunotherapy, which may lead to significant advances in vaccine and immunotherapy engineering. Despite the increasing number of high-throughput, standardized technologies to experimentally identify tumor-antigen-specific T cells,118–121 experiments are costly and few studies to date have successfully identified neoantigen-specific T cells.122–125 In individuals with cancer, only a modest number of neoantigen-specific T cells are detected in the TME; this is confounded by the presence of bystander T cells with viral specificities.122 123 125–127 Remarkably, neoantigen-specific T cells exhibit congruent transcriptional programs across cancer studies.122–125 This supports the joint analysis128 129 of existing single-cell transcriptomic datasets paired with matching TCR data as a basis to more effectively uncover transcriptomic and TCR sequence similarities that underlie tumor-reactive T cells. However, understanding the technology used to generate these datasets is essential. Single-nuclei RNA-seq often under-represents immune cells compared with single-cell RNA-seq (scRNA-seq),130 and unlike the 5’ sequencing protocol, the 3’ sequencing protocol (10× Genomics) does not cover TCR or B-cell receptor sequences. These technical differences highlight the need for careful consideration in selecting single-cell datasets that provide comprehensive coverage of both the transcriptome and immune receptor repertoire to accurately identify tumor-reactive T cells.
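
As a minimal sketch of such joint analysis, the pandas example below pairs per-cell phenotype annotations with matched TCR clonotypes and summarizes clonal expansion per T-cell state; the barcodes, CDR3 sequences, and phenotype labels are invented for illustration.

```python
# Sketch of pairing single-cell phenotype annotations with matched TCR
# clonotypes to summarize clonal expansion per T-cell state. All values are
# invented; in practice they come from scRNA-seq clustering and V(D)J data.
import pandas as pd

phenotypes = pd.DataFrame({
    "barcode":   ["AAAC", "AACG", "AGTC", "CGTA", "GTCA", "TTGC"],
    "phenotype": ["exhausted_CD8", "exhausted_CD8", "effector_CD8",
                  "effector_CD8", "naive_CD8", "exhausted_CD8"],
})
clonotypes = pd.DataFrame({
    "barcode":   ["AAAC", "AACG", "AGTC", "CGTA", "GTCA", "TTGC"],
    "cdr3_beta": ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASSPDRGNTIYF",
                  "CASSQETQYF", "CASSIRSSYEQYF", "CASSLGQAYEQYF"],
})

cells = phenotypes.merge(clonotypes, on="barcode")
clone_sizes = cells["cdr3_beta"].value_counts()
cells["clone_size"] = cells["cdr3_beta"].map(clone_sizes)

# Fraction of cells per phenotype that belong to expanded clones (size >= 2).
summary = (cells.assign(expanded=cells.clone_size >= 2)
                .groupby("phenotype")["expanded"].mean())
print(summary)
```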

Advances in AI for protein structure prediction have radically transformed the field of protein optimization and de novo design.131 While significant progress has been made in ML-based methods for T-cell antigen specificity prediction, a hallmark problem in immune engineering, the complexity and variability of antigen interactions continue to present substantial challenges. Unsupervised ML methods have been developed to annotate T-cell antigen specificity based on cross-referencing with the known TCR-epitope databases132 or by calculating similarities between TCR sequence properties and/or transcriptional profiles in single cells.133 134 Supervised methods, including DL models, have been developed to use TCR-antigen pairs135 as training data to predict epitope specificity, achieving some success in model training but with diminished predictability for unseen antigens.
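
As an illustration of the unsupervised, similarity-based strategy, the sketch below clusters a handful of invented CDR3β sequences by dipeptide composition using hierarchical clustering; published tools additionally use biochemical similarity matrices, V/J gene usage, and statistical significance testing.

```python
# Sketch of unsupervised grouping of TCR beta-chain CDR3 sequences by k-mer
# composition, a simplified stand-in for similarity-based specificity clustering.
from itertools import product
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # all 400 dipeptides

def kmer_vector(cdr3):
    """Frequency of overlapping 2-mers in a CDR3 amino-acid sequence."""
    counts = np.zeros(len(KMERS))
    for i in range(len(cdr3) - 1):
        counts[KMERS.index(cdr3[i:i + 2])] += 1
    return counts / max(len(cdr3) - 1, 1)

cdr3s = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTIYF",
         "CASSPDRGQTIYF", "CASSIRSSYEQYF"]  # invented sequences
X = np.vstack([kmer_vector(s) for s in cdr3s])

# Hierarchical clustering on pairwise cosine distance between k-mer profiles.
clusters = fcluster(linkage(pdist(X, metric="cosine"), method="average"),
                    t=0.5, criterion="distance")
for seq, cluster in zip(cdr3s, clusters):
    print(f"cluster {cluster}: {seq}")
```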

One of the major barriers in building ML models to predict T-cell antigen specificity is the scarcity of paired TCR-antigen specificity data. Single-cell studies have yielded insights into transcriptional programs of individual T-cell clones and offered clues about T-cell antigen specificity.136 However, with the exception of a few single-cell studies to date,137 TCR and cognate antigen pairs are largely unlabeled or lacking. Publicly available TCR-epitope databases (VDJdb, McPAS-TCR, MIRA, and IEDB) contain an abundance of viral antigen specificity data, and thus, bias can be introduced when using these data for model training. Hence, the paucity of TCR-tumor antigen data poses a significant challenge toward building robust ML models to predict T-cell specificity to tumor antigens.

Data sharing, access, and curation

CompIO has the potential to transform cancer care by enabling more personalized treatments, improving patient outcomes, and reducing healthcare costs. However, challenges remain such as acquiring diverse datasets, standardizing data integration and analysis, and validating AI algorithms in clinical environments. The Surgery Committee of the Society for Immunotherapy of Cancer (SITC) delineates best practices for tissue procurement in IO trials, highlighting critical factors like tissue site, tumor representation, and preservation, which are essential for harvesting tumor-reactive T cells.138 Moreover, accurate documentation of tissue resources and metastatic burden is crucial in IO, exemplified by the immunosuppressive effect of liver metastases.139 In data management, standardization is key to reducing the heterogeneity in gene and protein markers defining immune cell subsets for increased dataset comparability. Efforts by the NCI’s Cancer Immune Monitoring and Analysis Centers and Cancer Immunologic Data Commons (CIMAC-CIDC) Network140 and Cancer Research Data Commons,141 including the recent introduction of the Immuno-Oncology Data Commons, aim to standardize clinical data models across trials and studies and enhance the understanding of how treatments alter the tumor-immune landscape.

Resource sharing, including the sharing of clinical samples and data, is crucial for advancing research in IO. The National Institutes of Health’s (NIH) Data Management and Sharing (DMS) Policy underscores the importance of de-identifying and securely managing human subject data.142–145 Despite these guidelines, the implementation of data-sharing policies is often hindered by long processes like establishing Material Transfer Agreements and Data Use Agreements, which can significantly delay research. To overcome these obstacles, innovative approaches are needed to expedite data sharing while safeguarding privacy. Potential solutions could include streamlined institutional review processes for quicker data access, template consents for genomic data sharing, the generation of synthetic data that retains utility for result reproduction and model training, or the use of privacy-preserving methods like federated learning, which allows for comprehensive model validation without direct data exchange.
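
As a toy illustration of federated learning, the sketch below fits a logistic-regression biomarker model by federated averaging across three simulated sites, sharing only model parameters and never the site-level data; production deployments would add secure aggregation, differential privacy, and governance controls.

```python
# Toy sketch of federated averaging (FedAvg) for a logistic-regression biomarker
# model: each site updates weights on its own (never-shared) data and only the
# model parameters are pooled. Site data are simulated.
import numpy as np

rng = np.random.default_rng(5)

def local_update(weights, X, y, lr=0.1, epochs=20):
    """A few gradient steps of logistic regression on one site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Three hospitals with different cohort sizes and slightly shifted covariates.
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for n in (120, 80, 200):
    X = rng.normal(size=(n, 3)) + rng.normal(scale=0.2, size=3)
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
    sites.append((X, y))

global_w = np.zeros(3)
for _ in range(10):
    local_weights = [local_update(global_w, X, y) for X, y in sites]
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    global_w = np.average(local_weights, axis=0, weights=sizes)  # size-weighted aggregation

print("federated estimate:", np.round(global_w, 2), " true:", true_w)
```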

Resource sharing through platforms like the NCI’s NCTN Navigator142 is mandatory across oncology NCTN groups, facilitating access to clinical trial biospecimens and annotated clinical data for assay and biomarker development. The application processes typically involve initial discussion based on preliminary data with the group that conducted the trial, followed by submission of a detailed proposal for review by the Core Correlative Sciences Committee, overseen by the NCTN Program and consisting of independent peer-reviewers from the cancer research community. Both academic and private industry entities use NCTN Navigator to obtain histopathology images and radiologic scans for AI development and validation in outcome and treatment response prediction.146 147 The CIMAC-CIDC Network, by collecting biospecimens from immunotherapy trials,140 empowers community-wide research efforts to identify new biomarkers and therapeutic targets to enhance treatment strategies for patients. This endeavor requires advanced computational, mathematical, and bioinformatics tools to drive innovations in drug development and combination therapies.

In CompIO, where ML and DL models are increasingly used in clinical applications including designing personalized cancer vaccines and T-cell therapies, maintaining the highest standards of reproducible research is essential. This requires rigorous preprocessing, quality control, and transparent code sharing. The complexity of ML/DL workflows in this domain presents unique challenges. Automation tools, including Nextflow148 or Snakemake,149 enable the creation of scalable, reproducible, and sustainable computational workflows that can be easily shared and adapted by the community. Centralized platforms such as Google Colab150 and Code Ocean, along with containerization technologies like Docker and Singularity, facilitate data and code sharing by allowing researchers to rebuild the models and predictions within consistent environments. Open-source software enables community-wide assessment and reuse, ensuring the accuracy of these methods and providing additional validation.151 Software ecosystems that offer peer-reviewed packages,152 such as R/Bioconductor and Python’s PyPI, provide an additional layer of quality assurance. The practice of reproducible research may require substantial overhead and additional personnel beyond the efforts of a single analyst, particularly given the complexity of integrating multiomics data to understand antitumor immunity. However, this investment is justified to ensure the reliability of findings that could directly influence patient care. Nonetheless, balancing the protection of intellectual property and adherence to best practices in reproducible research will be the key to advance both innovation and credibility in clinical applications.

As new biomarkers are developed and technologies continue to evolve, the lack of universal standards for in silico validation, reproducibility, and mechanistic analysis significantly hinders the clinical translation of computational discoveries in IO. Enforcing bidirectional bench-to-bedside research and developing community-driven best practices for AI in IO are imperative for ethical and effective patient care. Collaborative efforts, particularly within multidisciplinary teams, are vital to ensure that computational findings are relevant and rigorously validated with ethical consideration. Comprehensive real-world validation of new biomarkers and predictions is crucial, especially in diverse patient populations, to underpin biomarker-guided trials. The design of clinical trials must be grounded in a strong biological rationale, integrating functional studies to delineate causality and avoid oversimplified drug combinations. By adhering to these rigorous validation standards, the field can improve the efficiency and success rate of clinical translations.

Education and training across computational, biological, and clinical domains

Breakthroughs in high-throughput sequencing, drug screening, and MS-based technologies have led to the rapid accumulation of large-scale omics data across a multitude of disciplines, including IO.153 The increased access to and capability of deriving such datasets demand the proper training and retention of individuals equipped with the skill sets required to properly leverage and interpret them for novel biological insight. Therefore, a concerted effort to boost the number of skilled bioinformatics scientists, computational biologists, and data scientists to effectively handle this profusion of data is critical for advancing IO research.

Given the multidisciplinary nature of bioinformatics and the challenge in designing cross-discipline courses, developing such programs has been complicated by a limited consensus in the field regarding the essential knowledge for a successful bioinformatician, and the most effective means of delivering it.154 This lack of agreement is compounded by the wide range of scientific and non-scientific fields in need of computational training, each with different backgrounds and intended applications. The Curriculum Task Force of the International Society for Computational Biology Education Committee has provided a framework curriculum, which outlines a set of bioinformatics core competencies that span various disciplines, backgrounds, and training programs. These core competencies include topics such as recognizing and critically reviewing the format, scope, and limitations of different biological data-generating platforms, technology applications, and methodology, taking into consideration the experimental material available, experimental objectives, the data management plan, and reproducibility. Bioinformatics education programs range from informal workshops and training courses to structured, didactic learning in the form of certified degree programs, including doctoral-level education.155 156 Many of these programs are offered at institutions with Comprehensive Cancer Centers in the USA. However, the training might not be cancer-specific and could benefit from a tailored, widespread, open-source curriculum provided by initiatives such as the NCI ITCR Training Network152 and the SITC-NCI Computational Immuno-oncology Webinar Series.157

Currently, funded training grants in bioinformatics lag far behind their experimental counterparts, which support trainees producing vast amounts of data requiring advanced analytical attention. This discrepancy suggests a further widening gap and a diminishing capacity to meet computational demands. For instance, the number of T32 awards for bioinformatics training programs has increased only incrementally, despite the exponential growth of data in the omics space (figure 2A). Moreover, few NIH agencies have funded these awards to date, resulting in pronounced disparities among them (figure 2B). Further, none of the currently funded awards specifically focus on multidisciplinary training in areas of need, such as computational training within the IO space, where there are unique data challenges unmet by generic computational training programs and a rapidly rising clinical urgency. In total, the training of individuals in bioinformatics and computational biology has not kept pace with the advances in high-throughput omics techniques and big data production, leading to a bottleneck and dampened productivity in the translational research pipeline. Collectively, these examples highlight an urgent need for educators and funding agencies to support a sufficient number and variety of programs in CompIO accessible to trainees at all levels.

Figure 2

Limited growth in bioinformatics-focused institutional research training (T32) grants. Despite the significant increase of data in omics space, funded T32 grants targeted towards bioinformatics training have seen only a modest rise from 2.7% to 3.5% over the past 9 years (A). The distribution of these grants is uneven among NIH funding agencies (B). Our search results for funded T32 grants over the past 22 years used five search terms: “Bioinformatics”, “Computational Biology”, “Computational Oncology”, “Immune Bioinformatics”, and “Omics”. We acknowledge the potential incompleteness of awards from this search. NCI: National Cancer Institute; NIDA: National Institute on Drug Abuse; NIMH: National Institute of Mental Health; NIA: National Institute on Aging; NIEHS: National Institute of Environmental Health Sciences; NIAMS: National Institute of Arthritis and Musculoskeletal and Skin Diseases; OD: Office of the Director; NHGRI: National Human Genome Research Institute; NIGMS: National Institute of General Medical Sciences; NHLBI: National Heart, Lung, and Blood Institute; NIAID: National Institute of Allergy and Infectious Diseases; NIDDK: National Institute of Diabetes and Digestive and Kidney Diseases; NIH: National Institutes of Health; NICHD: Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Significant emphasis has been placed on providing bioinformatics training necessary to enable the implementation of bioinformatics research. As multidisciplinary teams are becoming the norm, it is important to acknowledge that many computationally trained investigators might have limited training and experience in clinical research. The issues of liability, patient safety, patient privacy, and clinical research ethics, which are second nature to clinical investigators, could be foreign concepts to bioinformatics scientists and computational biologists. Consequently, not all members of a research team may be equally comfortable with or capable of rapidly adapting to implementing these approaches in the context of clinical care. Therefore, training programs that also provide in-depth clinical research training to computational scientists play a vital role in the ethical use of data science and AI methods in translational IO research.

Team science

Bringing together diverse expertise in team science is fundamental for accelerating progress in IO and solving its complex problems. Knowledge sharing and the establishment of new computational solutions have been key drivers of new scientific discoveries.112 158 159 As the scale and complexity of data continue to increase, collaboration with bioinformatics scientists, computational biologists, and data scientists—who develop and apply sophisticated algorithms for data evaluation, integration, and interpretation—has become essential.

Historically, bioinformatics scientists were seen as playing supportive roles in translational or clinical research. However, this outdated view has shifted to a modern, convergence science model. Today, research often begins with large-scale data production and in silico discoveries on human specimens, proceeds to in vitro and in vivo validations, and ultimately moves the needle in clinical practice, all in collaboration with experts across computational, biological, and clinical domains. This culture shift is reflected in a new trend of transdisciplinary research, where laboratories specialized in high-dimensional computational and biological techniques recruit bench scientists for functional experiments and seek clinical partners to translate discoveries into improved patient outcomes. To recruit and retain talent in academia, it is important to adequately recognize the contributions of bioinformatics scientists, computational biologists, and data scientists in team science projects. This requires innovative strategies beyond traditional tenure stream evaluations. Several institutions have implemented a new “team science” track for faculty, where collaborations are valued as equally important as single-lead projects. The general trend toward multi-PI (principal investigator) grants, as well as the increasing number of coauthors on transdisciplinary IO papers, demonstrates the growing importance and impact of collaborations among experts from different fields.160

The NCI has implemented several programs and networks to incentivize team science. These initiatives encourage collaboration and coordination among researchers from different disciplines, institutions, and sectors. Some specific examples include the IOTN,161 the Specialized Programs of Research Excellence, and the Cancer Center Support Grants. With the implementation of the new NIH DMS policy,162 there is an opportunity to further promote team science, especially involving contributions from bioinformatics scientists, computational biologists, and data scientists. This policy requires that researchers make data from their NIH-funded studies publicly available and encourages
