Emerging trends in the optimization of organic synthesis through high-throughput tools and machine learning

Introduction

Organic synthesis plays a crucial role in drug discovery, polymer synthesis, materials science, agrochemicals, and specialty chemicals. The synthesis and process optimization of organic molecules require substantial resources and labor, and optimization campaigns often explore only a single variable at a time in search of the optimal conditions, disregarding the intricate interactions among competing variables within the synthesis process. The complexity of the problem is compounded by the fact that process optimization often demands solutions that satisfy multiple targets, such as yield, selectivity, purity, cost, and environmental impact. In recent years, the advancement of artificial intelligence (AI), machine learning (ML), and automation has produced a paradigm shift in chemical synthesis optimization. By leveraging ML models that predict reaction outcomes together with ML optimization algorithms, this new approach has demonstrated the ability to navigate the complex relationships between reaction variables and to find the globally optimal conditions within fewer experiments than traditional methods. Machine-guided optimization has emerged as a promising framework to obtain reaction conditions that perform optimally for single- or multiple-target objectives, enabling researchers to explore diverse solution spaces and uncover conditions that strike a balance between consonant and/or conflicting targets. In addition, the incorporation of laboratory robotics into chemical synthesis has enabled the development of closed-loop optimization platforms capable of executing optimization campaigns rapidly with minimal human intervention, relieving experimenters of labor-intensive tasks and reducing the overall process development lead time.

A standard workflow and general methodology for organic reaction optimization through ML methods is shown in Figure 1. The workflow comprises (i) careful design of experiments (DOE); (ii) reaction execution with commercial high-throughput systems or in-house designed reaction modules; (iii) data collection by in-line/offline analytical tools; (iv) mapping of the collected data points to the target objectives; (v) prediction of the next set of reaction conditions towards attaining optimal solutions; and (vi) experimental validation of the suggested optimization results. Through an examination of methodologies, algorithms, and various case studies, this article offers our perspective on the state-of-the-art techniques for optimizing the synthesis of organic molecules, highlighting both challenges and prospects. The structure of this review follows the steps presented in Figure 1. In the following section, we review the high-throughput platforms currently used to perform chemical reaction optimization. Thereafter, we discuss the development and use of analytical tools and data processing algorithms. After that, we discuss the latest trends in the selection of optimization algorithms for chemical synthesis. Finally, we highlight future directions and opportunities in the field. For an in-depth overview of the topic of chemical reaction optimization, the readers are referred to prominent reviews by Taylor et al., Griffin et al., and Sagmeister et al. The first two offer valuable perspectives on chemical reaction optimization, particularly focusing on process scale-up, while the latter discusses the potential of flow platforms for reaction self-optimization. Additionally, we refer the readers to the literature in other areas relevant to the application of ML to chemical synthesis that are not covered by our review, such as small molecule discovery, drug discovery, retrosynthesis, and catalyst selection and design.

[1860-5397-21-3-1]

Figure 1: A high-level representation of the workflow and framework used for the optimization of organic reactions through optimization algorithms and high-throughput experimentation (HTE) tools.
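To make the loop in Figure 1 concrete, the following Python sketch strings steps (i)–(vi) together for a hypothetical two-variable optimization. The "experiment" is a toy analytic function standing in for the HTE hardware and its analytics, and the simple perturb-and-explore suggestion rule is a placeholder for the optimization algorithms discussed later; none of the variable names, bounds, or numbers are taken from the cited studies.

```python
"""Minimal sketch of the closed-loop workflow in Figure 1 (steps i-vi).
The 'experiment' is a toy analytic function standing in for the HTE platform
and its analytics; condition names, bounds, and the suggestion rule are
illustrative assumptions, not taken from any of the cited studies."""
import numpy as np
from scipy.stats import qmc

RNG = np.random.default_rng(42)
BOUNDS = np.array([[40.0, 120.0],   # temperature / °C
                   [5.0, 60.0]])    # residence time / min

def run_and_measure(x):
    # stand-in for steps (ii)-(iii): run the reaction, analyze, return a yield
    temp, tau = x
    return 90.0 * np.exp(-((temp - 85.0) / 25.0) ** 2 - ((tau - 30.0) / 15.0) ** 2)

def suggest_next(X, y, n_candidates=500):
    # stand-in for steps (iv)-(v): perturb the best conditions found so far and
    # keep the candidate farthest from points already tested (crude explore/exploit)
    cand = np.clip(X[np.argmax(y)] + RNG.normal(scale=5.0, size=(n_candidates, 2)),
                   BOUNDS[:, 0], BOUNDS[:, 1])
    dist = np.min(np.linalg.norm(cand[:, None] - X[None], axis=-1), axis=1)
    return cand[np.argmax(dist)]

# (i) initial design of experiments via Latin hypercube sampling
X = qmc.scale(qmc.LatinHypercube(d=2, seed=0).random(8), BOUNDS[:, 0], BOUNDS[:, 1])
y = np.array([run_and_measure(x) for x in X])

for _ in range(15):                  # closed loop
    x_new = suggest_next(X, y)       # (v) predict/propose the next conditions
    y_new = run_and_measure(x_new)   # (vi) experimental validation
    X, y = np.vstack([X, x_new]), np.append(y, y_new)

print("best conditions found:", X[np.argmax(y)], "yield:", round(y.max(), 1))
```

In practice, the suggestion step is replaced by a surrogate-model-based optimizer, as discussed in the section on machine-learning-driven optimization.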

Review

HTE platforms

HTE platforms were designed to accelerate the discovery and development of organic molecules through the rapid screening and analysis of large numbers of experimental conditions simultaneously. For the purpose of this article, we define HTE as a technique that leverages a combination of automation, parallelization of experiments, advanced analytics, and data processing methods to streamline repetitive experimental tasks, reduce manual intervention, and increase the rate of experimental execution in comparison to traditional manual experimentation. In conventional chemical synthesis, several sequential steps are typically undertaken, involving the setup of the reaction, mixing of reactants, reaction workup, product analysis, and product purification. To perform all these basic chemistry tasks effectively, customizable HTE platforms are available from various laboratory instrument manufacturers or can be assembled using a mix of commercial and in-house developed equipment. Typically, an HTE platform for organic chemistry includes a liquid transfer module, a reactor stage, and analytical tools for product characterization. When the full experimental process is automated and coupled with a centralized control system performing ML optimization, the HTE setup can function as a self-driving platform in which the next iteration of experiments is automatically selected by the algorithm without human intervention. This section will highlight the key features of various HTE platforms, including benefits, limitations, and applications to organic molecule synthesis.

HTE using batch modules

Batch reactions occur without flow of the reagents/products into or out of the reaction vessel until a target conversion has been achieved. HTE batch platforms leverage parallelization of experiments to perform several reactions under different conditions simultaneously, increasing the experimental throughput. Commonly, batch platforms include a liquid handling system based on a plunger pump (e.g., syringe, pipette) for setting up reactions, a reactor capable of heating and mixing, and in-line/online analytical tools. HTE in batch excels in the control of categorical and continuous variables, in particular the stoichiometry and chemical formulation of reaction mixtures. Many HTE batch experiments have been performed in self-contained automated platforms developed by various instrument manufacturers (Chemspeed, Zinsser Analytic, Mettler Toledo, Tecan, Unchained Labs, etc.). In these HTE platforms, microtiter well plates (MTP) and reaction blocks containing 96/48/24-well plates are widely used as reaction and characterization vessels. UltraHTE configurations typically incorporate 1536-well plates, enabling the exploration of larger reaction parameter spaces. While ultraHTE was initially tailored to biological assays, the versatility of these modules has been extended to optimizing chemistry-related processes. The Chemspeed SWING robotic system, equipped with two fluoropolymer (PFA) mat-sealed 96-well metal blocks, was used for the exploration of stereoselective Suzuki–Miyaura couplings, offering precise control over both categorical and continuous variables (Figure 2a). The integrated robotic system, containing a four-needle dispense head, facilitated the delivery of low-volume reagents and slurries, ensuring the accuracy and throughput of the process. The entire experimental workflow was further streamlined through parallelization, dividing the 192 reactions into eight loops of 24, achieving a significant throughput within four days. Other reports cover various reactions, including Buchwald–Hartwig aminations, Suzuki couplings, N-alkylations, hydroxylations, and photochemical reactions.

[1860-5397-21-3-2]

Figure 2: (a) Photograph showing a Chemspeed HTE platform using 96-well reaction blocks. (b) Mobile robot equipment performing tasks normally executed by human experimenters for the photocatalytic conversion of water to hydrogen. (c) Small-footprint portable chemical synthesis platform. (d) Schematic of the SynBot platform developed by Samsung researchers, showing each module used for chemical synthesis. Figure 2a was reproduced from (© 2021 M. Christensen et al., published by Springer Nature, distributed under the terms of the Creative Commons Attribution 4.0 International License, https://creativecommons.org/licenses/by/4.0). Figure 2b is from and was reprinted by permission from Springer Nature from the journal Nature (“A mobile robotic chemist” by B. Burger; P. M. Maffettone; V. V. Gusev; C. M. Aitchison; Y. Bai; X. Wang; X. Li; B. M. Alston; B. Li; R. Clowes; N. Rankin; B. Harris; R. S. Sprick; A. I. Cooper), Copyright © 2020 The Author(s), under exclusive licence to Springer Nature Limited. This content is not subject to CC BY 4.0. Figure 2c is from and was reprinted by permission from Springer Nature from the journal Nature Chemistry (“An autonomous portable platform for universal chemical synthesis” by J. S. Manzano; W. Hou; S. S. Zalesskiy; P. Frei; H. Wang; P. J. Kitson; L. Cronin), Copyright © 2022 The Author(s), under exclusive licence to Springer Nature Limited. This content is not subject to CC BY 4.0. Figure 2d was adapted from (© 2023 T. Ha et al., published by American Association for the Advancement of Science, distributed under the terms of the Creative Commons Attribution 4.0 International License, https://creativecommons.org/licenses/by/4.0).

The versatility to handle multiple reagents and the widespread availability of 96-well plates have facilitated the extensive adoption of HTE under batch conditions for optimizing chemical synthesis. However, several challenges arise when MTP are used as reaction vessels. First, the independent control of variables such as reaction time, temperature, and pressure within individual wells is not possible due to the inherent design constraints of parallel reactors that share the same MTP. In addition, challenges arise when standard MTP-based reaction vessels are used at a temperature near the solvent's boiling point, as this labware is not enclosed or able to cool the top of the reaction vessel to facilitate reflux conditions. Although some research groups have developed custom tools to enable high-temperature reactions, these reactors are currently not commercially available. For an in-depth discussion on the limitations of batch reactors for HTE, we refer the readers to a review by Taylor et al. on chemical reaction optimization.

In recent years, research laboratories have shifted from traditional commercial tools to HTE systems custom-built to the chemists’ requirements and demands. Burger et al. have creatively developed a mobile robot equipped with sample-handling arms, tailored for the precise execution of photocatalytic water-splitting reactions to produce hydrogen. The mobile robot (Figure 2b) acted as a substitute for a human experimenter by executing tasks and linking eight separate experimental stations, including solid and liquid dispensing, sonication, several characterization instruments, and stations for consumables and sample storage. Remarkably, through a tedious ten-dimensional parameter search spanning eight days, the robot achieved an impressive hydrogen evolution rate of approximately 21.05 µmol⋅h−1. Despite the initial investment and two-year development timeline, the versatility of this robotic system promises remarkable applications in materials, polymer, and chemical synthesis. Most automated synthesis platforms are based on expensive scientific equipment, have a large equipment footprint, and need extensive reconfiguration to adapt to new synthetic protocols. To address this issue, Manzano et al. have developed a small-footprint portable chemical synthesis platform able to perform liquid- and solid-phase organic reactions (Figure 2c). The platform utilizes 3D-printed reactors that can be generated on demand based on the targeted reaction and features liquid handling, stirring, heating, and cooling modules for enhanced versatility. In addition, the platform is capable of operating under inert and low-pressure atmospheres, handling separation steps, and monitoring reactions via pressure sensing. Its efficacy and robustness were confirmed through the successful synthesis of five small organic molecules, four oligopeptides, and four oligonucleotides in high purity and impressive yield. Although, in its current configuration, the platform lacks characterization modules and has a lower throughput than other automated platforms, it offers a low-cost alternative that can be adapted to perform chemical reaction optimization.

In addition to academia, industry is increasingly recognizing the value of investing in custom-built HTE setups to automate synthesis workflows for enhanced productivity. A fully integrated, cloud-accessible, automated synthesis laboratory (ASL) was designed and built by Eli Lilly. This state-of-the-art facility allowed for heating, cryogenic conditions, microwaving, high-pressure reactions, evaporation, and workup, empowering researchers to conduct an extensive array of chemical reactions. The ASL comprises three bench spaces dedicated to high-temperature reactions, cryogenic/microwave reactions, and reaction workup, respectively. On each bench, a translational combination of robotic arms performs the specific experiments using the modular platforms, while consumables and samples are transferred between benches through a conveyor belt linking them together. According to the report, the ASL has facilitated over 16,350 gram-scale reactions across various case studies, showcasing its broad capability. Researchers at Samsung have pioneered the development of SynBot, an innovative autonomous synthesis robot that uses AI and robotic technology to establish optimal synthetic procedures. Similar to the ASL, SynBot consists of five modules connected through a conveyor belt backbone, with a robot arm in charge of transferring the samples between them. The modules include a pantry for chemical storage and selection, a dispensing module for solids and liquids, a reaction module capable of heating and stirring, a sample preparation module, and an LC–MS characterization module (Figure 2d). The efficiency of the system has been demonstrated for three reaction types, namely Suzuki–Miyaura coupling, Buchwald–Hartwig amination, and Ullmann coupling. These experiments showcased conversion rates that outperformed existing reference systems and at least a six-fold gain in experimental efficiency, in addition to handling synthesis planning, optimization, and downstream workup tasks. The throughput of SynBot is estimated to average 12 reactions within 24 hours, depending on the reaction time. IBM has developed RoboRXN, a remotely accessible autonomous chemical laboratory that enables notable acceleration of chemical synthesis by leveraging cloud computing, AI, and automation. The technology relies on an AI model that recommends the sequence of operations needed to perform the corresponding chemical reactions, including the order of reagent addition. This model allows synthesis tasks to be sent to the robotic research lab from anywhere in the world, with the robot executing the recommended retrosynthesis provided by the user. Enhancements to RoboRXN assist chemists in predicting the environmental impact of chemical processes, and the new AI model also helps identify more environmentally friendly enzymes for chemical reactions. Although RoboRXN has demonstrated the ability to perform most tasks in chemical and material synthesis, the hardware currently cannot perform product purification or multistep synthesis continuously.

HTE using flow platforms

Flow reactions are characterized by a constant flow of reagents and products into and out of the reaction vessel. A flow platform consists of a fluid delivery system, mixing tools, reactors, quenching units, pressure regulation units, and collection vessels. Fluid delivery is normally executed using high-pressure liquid chromatography (HPLC) pumps, syringe pumps, or peristaltic pumps. A passive mixing stage, where the reagents are introduced to the system through a Y- or T-connection, is the most common approach for most flow reactions, while more specialized mixing tools can be incorporated depending on the reaction prerequisites. The most common reactors are microfluidic chip- or coil-based reactors for solution chemistry. Packed bed reactors are used when solid heterogeneous catalysts and reagents (e.g., inorganic bases) are handled. Specialized reactors for electro- and photochemical experiments have also been developed. Depending on the flow of the reaction mixture, flow reactions can be continuous or segmented (also known as slug flow). Segmented flow reactions present an efficient means to gather diverse data points by creating segmented or droplet flow within microfluidic reactors. Each droplet is separated by either an antisolvent or an inert gas, thus providing every droplet with the functionality of an individual reactor. This segmentation ensures precise control over reactions and prevents interference between different reaction environments. Moreover, the ratio of reagents within these droplets is easily modulated using syringe pumps, providing users with a convenient means to collect data efficiently and coherently. This approach streamlines experimentation, enhances reproducibility, and facilitates the exploration of complex reaction spaces with high accuracy.
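As a simple illustration of how droplet composition and residence time are set in such systems, the sketch below computes individual pump flow rates for a target stoichiometry and the corresponding residence time. The relations used (moles delivered per unit time = flow rate × stock concentration; residence time = reactor volume / total flow rate) are standard back-of-the-envelope formulas, while the function names, reagent labels, and numbers are illustrative assumptions not tied to any specific platform.

```python
# Illustrative helpers for planning a segmented-flow experiment (assumed formulas,
# not tied to any specific platform). The total flow is assumed to be split only
# between the reagent streams (no separate carrier or make-up stream).

def pump_flow_rates(total_flow_ul_min, stock_conc_M, target_equiv):
    """Split a total flow rate between pumps so that the delivered molar ratios
    match target_equiv (both dicts keyed by reagent name)."""
    # moles delivered per minute are proportional to flow rate * stock concentration,
    # so the required flow is proportional to equivalents / concentration
    weights = {k: target_equiv[k] / stock_conc_M[k] for k in target_equiv}
    norm = sum(weights.values())
    return {k: total_flow_ul_min * w / norm for k, w in weights.items()}

def residence_time_min(reactor_volume_ul, total_flow_ul_min):
    return reactor_volume_ul / total_flow_ul_min

# Example with made-up numbers: 2.0 equiv of amine relative to 1.0 equiv of aryl halide
flows = pump_flow_rates(100.0,
                        stock_conc_M={"aryl_halide": 0.5, "amine": 1.0},
                        target_equiv={"aryl_halide": 1.0, "amine": 2.0})
print(flows)  # each pump at 50 µL/min delivers a 1:2 molar ratio
print(residence_time_min(reactor_volume_ul=1000.0, total_flow_ul_min=100.0), "min")
```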

Droplet microfluidics has emerged as a powerful tool across diverse scientific disciplines, with dedicated literature covering the concepts behind droplet formation. An example of a segmented flow droplet system was employed to screen a range of organic solvents to obtain optimal conditions for the monoalkylation of trans-1,2-diaminocyclohexane. The HTE methodology in combination with feedback DOE facilitated the rapid identification of appropriate solvents. Notably, the use of DMSO, DMF, and pyridine led to an enhanced yield of the monoalkylated product. An experimental setup was developed for single-droplet studies of visible-light photoredox catalysis using an oscillatory flow strategy. In an oscillatory reactor, an alternating pressure gradient is applied within the reactor, causing a back-and-forth oscillation of the reaction slugs, which leads to better mixing control and an extended residence time of the reaction mixture. About 150 reaction conditions were explored using a total reaction mixture volume of 4.5 mL, and the screening results could be readily translated to continuous flow synthesis. The application of segmented flow or microslug reactors was demonstrated in the decarboxylative arylation cross-coupling reaction promoted by catalysts and light. The design allows the screening to be more material- and time-efficient in the optimization of both continuous variables (e.g., temperature and residence time) and discrete variables (e.g., catalyst, base). Pieber et al. reported the application of a segmented flow reactor for heterogeneous solid–liquid reactions. In their report, they described the reaction slugs as serial microbatch reactors (SMBRs) separated by gas segments that incorporated liquid reagents and solid photocatalysts in a continuous flow. The slugs were generated by establishing a stable gas–liquid segmented-flow pattern using a Y-shaped mixer, followed by the suspension of the catalyst via a T-mixer. This technology was utilized to develop selective and efficient decarboxylative fluorination reactions. Recently, a slug flow platform (Figure 3a) was developed by injecting segments of gas as a separating medium to enhance the optimization of a Buchwald–Hartwig amination intermediate, which is crucial for synthesizing the drug olanzapine. The reactor setup was integrated with spectroscopic and chromatographic in-line analytical tools, enabling real-time monitoring of products and reaction intermediates. A detailed discussion of the optimization strategy is given in the section "Machine-learning-driven optimization of chemical reactions".

[1860-5397-21-3-3]

Figure 3: (a) Description of a slug flow platform developed using segments of gas as separation medium for high-throughput data collection in a Buchwald–Hartwig amination. A six-way mixer was used to mix the solvents and reagents. (b) Schematic representation of a computer-controlled segmented flow pattern developed using degassed water as an antisolvent for the high-throughput polymerization of styrene in p-xylene. A staggered infusion of organic and aqueous phases resulted in the exploration of a wider parameter space.

Robochem, an HTE platform, was designed to streamline the screening of photochemical reactions, facilitating the rapid generation of diverse reaction mixtures, each comprising 650 µL within a slug flow reactor. This innovative system features precise monitoring of the reaction slug through a dedicated array of phase sensors and an algorithm designed to detect its passage. As a result, the workflow delivers a notable boost in productivity, surpassing traditional batch reactions by over 500-fold and outperforming flow reactions with a five-fold improvement. A fully integrated automated multistep chemical synthesizer (AutoSyn) was reported to autonomously synthesize milligram- to gram-scale amounts of almost any organic or drug-like molecule. The system comprises a flow chemistry synthesis platform, a reagent delivery system, a packed bed reactor, process-analytical tools, and an integrated software control system that automates end-to-end process operations and monitoring. The system has been used to demonstrate the autonomous synthesis of at least ten drug molecules, although it does not include a closed-loop optimization framework. The Pfizer research team developed a custom-designed flow system for rapid reaction screening of the Suzuki–Miyaura coupling reaction on a nanomole scale. The platform included a modified HPLC system that supplied a flowing stream of 12 selectable solvents, an autosampler that injected microliter amounts of preselected reaction mixtures, and an LC–MS device for product characterization. Approximately 5,760 reactions were screened across a selection of 11 ligands, seven bases, and four solvents, along with appropriate control experiments. The nanomole-scale droplet system enabled a very high throughput, exceeding 1,500 reactions every 24 hours. This extensive and intelligent screening approach identified optimal conditions for scaling up selected reactions to 50–200 mg under batch and flow conditions.

In addition to organic synthesis, the slug flow methodology has found application in polymer synthesis. A flow platform capable of polymerizing 397 unique copolymer compositions was developed by Reis et al. using a droplet flow reactor. The methodology and high-fidelity data enabled them to discover more than ten copolymer compositions for promising 19F MRI agents that outperformed state-of-the-art materials. A rapid generation of copolymer libraries was achieved by forming a droplet flow in an automated HTE flow setup. This approach not only assists in overcoming viscosity challenges in conventional photopolymerization reactions but also helps to identify structure–property relationships for copolymer libraries. We have generated a segmented flow pattern (Figure 3b) by alternating the infusion of organic components and degassed water to create nine different compositions. The organic components, consisting of styrene, α,α′-azobisisobutyronitrile (AIBN), and p-xylene, were infused using a computer-controlled segmented-flow platform. These approaches allow the compartmentalization of reaction mixtures without cross-contamination and significantly enhance experimental throughput. The concept of parallel flow reactors, where several distinct reaction conditions are tested simultaneously, has been proposed as a pathway to increase the throughput of flow reactions. Ahn et al. designed and fabricated a complete prototype equipped with a unique built-in flow distributor (Figure 4) and 16 microreactors capable of executing diverse conditions in parallel, including photochemistry. The temperature of the capillary reactors can be controlled independently, providing flexibility in experimentation. The reservoir-type distributor, featuring a baffle structure, not only ensures uniform flow of reagents even when one or more reactors experience clogging but also allows for variation of the residence time of individual capillary reactors. The authors demonstrated the capabilities of their platform by executing 12 distinct reactions in parallel, encompassing six different types of chemical transformations based on diazonium chemistry (Figure 4). A total of 96 reaction conditions were tested, leading to optimized reaction parameters in less than an hour.

[1860-5397-21-3-4]

Figure 4: Schematic representation (a) and photograph (b) of the flow parallel synthesizer intelligently designed for screening optimal conditions for diazonium chemistry (c) for 24 products within two sequences. Figure 4 was reproduced from (© 2021, G.-N. Ahn et al., published by Springer Nature, distributed under the terms of the Creative Commons Attribution 4.0 International License, https://creativecommons.org/licenses/by/4.0).

Chatterjee et al. introduced the concept of radial synthesis to perform multiple single-step chemical reactions or to decouple multistep reactions into parallel processes. Individually accessible reactors are arranged around a central switching station that enables the delivery of independent reaction mixtures or reagents. Each reactor loop functions as an independent unit to carry out thermal or photochemical reactions under different conditions. This parallel reactor setup was successfully utilized for the multistep synthesis of 18 compounds related to an anticonvulsant drug, employing various reaction pathways to perform photoredox carbon–nitrogen cross-coupling reactions. A parallel droplet flow system was developed by Eyke et al. to significantly increase the throughput of reaction screening. A closed-loop Bayesian optimization (BO) framework was integrated to optimize reactions involving both continuous and categorical variables. The team upgraded the oscillatory droplet reactor platform to a high-throughput version consisting of multiple independent parallel reactors. This parallelization enables the collection of high-fidelity data for reaction kinetics and optimization, demonstrated for at least six different chemical reactions. The major bottleneck in HTE synthesis lies in the challenge of isolating and purifying reaction products once the experiments are performed. Despite this bottleneck, the landscape is evolving, with various practical tools emerging to streamline purification processes. From prepacked silica gel tubes to precision semipreparative liquid chromatography and the versatile capabilities of various scavenger resins, laboratories are witnessing a surge in options for efficient high-throughput purification, particularly for chemical synthesis on a modest scale. Thinking beyond conventional purification methods presents an opportunity to revolutionize HTE flow platforms. A completely novel design, departing from established isolation and separation techniques, holds the promise of not only enhancing the efficiency of HTE flow synthesis but also paving the way for more sustainable growth in this research area.

Autonomous self-optimizing flow reactors

Autonomous self-optimizing flow reactors (ASFRs) represent a promising advancement in the process optimization of chemical reactions. ASFRs combine principles of automation, AI, in-line analytics, and robotics to streamline and accelerate the process optimization workflow. ASFRs enhance the yield and throughput of synthesis while minimizing waste. Incorporating in-line/online analytics and integrating them with flow systems is relatively straightforward. The real-time processing of analytical data allows for immediate adjustments to the reaction parameters, enabling optimal solutions to be reached rapidly. Consequently, the process can lead to lower energy consumption and reduced use of hazardous materials, contributing to more sustainable chemical processes. Integrating ML algorithms to simultaneously optimize multiple parameters such as yield, purity, and cost within a closed loop represents a significant advancement in process design. Furthermore, automation in ASFRs reduces the need for constant human oversight, lowering operational costs and minimizing the risk of human error. A schematic representation of an ASFR is provided in Figure 5a.

[1860-5397-21-3-5]

Figure 5: (a) Schematic representation of an ASFR for obtaining an optimal solution with minimal human intervention. Selected case studies (b–d) with closed-loop optimization are provided. The abbreviations “Obj” and “NOI” represent “objective functions” and “number of iterations”, respectively.

A self-optimizing microreactor system was devised specifically for the closed-loop optimization of the Heck reaction, employing a "black-box" optimization strategy directed by the Nelder–Mead simplex algorithm. In-line HPLC analysis was performed to determine the product yield in real time and give feedback to the control system, which directed the input conditions to achieve the optimum product yield in 19 automated experiments. The optimum conditions for the formation of the monoarylated product 1 (Figure 5b) identified in the microfluidic system were successfully translated to a mesofluidic system on a 50-fold scale to afford 26.9 g of product 1. LeyLab, a modular software system developed by Fitzpatrick et al., allows researchers to monitor and control chemical reactions online. The hydration of 3-cyanopyridine to an amide was monitored by online MS, offering real-time conversion insights. Through 30 experiments within ten hours, five key reaction parameters were finely tuned to identify the optimal conditions.
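The "black-box" strategy used in these early studies can be sketched with SciPy's Nelder–Mead implementation. In the snippet below, run_flow_experiment is a hypothetical hook that, on a real platform, would set the pump and heater setpoints, wait for steady state, and return the yield measured by in-line HPLC; here it is replaced by a toy surrogate with illustrative variables, so the code is not the published implementation.

```python
# Hedged sketch of "black-box" self-optimization with the Nelder-Mead simplex.
# run_flow_experiment stands in for the real platform (pumps, heater, in-line HPLC);
# the variables, bounds, and surrogate function are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def run_flow_experiment(x):
    temp, res_time, equiv = x  # illustrative variables, not from the cited study
    return 80 * np.exp(-((temp - 110) / 30) ** 2
                       - ((res_time - 10) / 6) ** 2
                       - ((equiv - 1.5) / 0.8) ** 2)

def objective(x):
    # Nelder-Mead minimizes, so return the negative of the measured yield
    return -run_flow_experiment(x)

result = minimize(objective, x0=np.array([90.0, 5.0, 1.0]),
                  method="Nelder-Mead",
                  options={"xatol": 1.0, "fatol": 0.5, "maxfev": 40})
print("suggested optimum:", result.x, "predicted yield:", -result.fun)
```

In a real campaign, each call to the objective corresponds to one automated experiment, so the function-evaluation budget (maxfev) directly limits the number of reactions performed.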

Photochemical reactions require uniform light penetration of the reaction mixture, and flow setups with uniform path lengths are ideal for such reactions. A self-optimizing continuous-flow reactor was designed by Poscharny et al. for light-promoted [2 + 2]-cycloaddition reactions. The optimization algorithm (a modified simplex) identified the optimal conditions within 25 iterative experiments to afford compound 3 (Figure 5c) in good yield. A modular autonomous flow reactor controlled via MATLAB was designed for the synthesis of carpanone (7, Figure 5d) using a modified Nelder–Mead algorithm. The four-step process involves allylation, Claisen rearrangement, isomerization, and oxidative dimerization. Each reaction step was optimized independently using either online HPLC or in-line benchtop NMR spectroscopy, affording an overall yield of 67% in 66 iterative experiments over four linear reaction steps. Nandiwale et al. reported the autonomous optimization of three multiphase catalytic reactions involving the handling of solid substrates, operation of the photoreactor, and feeding of the slurries, catalysts, and inorganic bases in an automated flow platform comprising a continuous stirred tank reactor (CSTR) cascade. The platform enabled autonomous optimization to find the ideal reaction conditions for Suzuki–Miyaura and photoredox-catalyzed coupling reactions.

A plug-and-play, continuous-flow chemical synthesis system (Figure 6a) was designed by Bédard et al. to mitigate some of the challenges in traditional organic synthesis through the integration of hardware, software, and analytics. Comprising an array of modular components, including units for heating, cooling, and LED light exposure as well as packed bed reactors, it provides a flexible platform for various reaction categories. The system also contains a liquid–liquid separator and an in-line/online analytical tool to facilitate closed-loop autonomous optimization. The capability of the system was demonstrated in the optimization of C–C and C–N cross-coupling, olefination, reductive amination, photoredox-catalytic, and nucleophilic aromatic substitution reactions, as well as in the two-step synthesis of cyclobutanone. The molecules synthesized under the optimal conditions, identified using the stable noisy optimization by branch and fit (SNOBFIT) algorithm, are presented in Figure 6b. SNOBFIT offers a convenient methodology for global optimization, eliminating the need for a theoretical model. A reconfigurable automated flow platform integrating online HPLC monitoring was used for the cobalt-catalyzed aerobic oxidative dimerization of desmethoxycarpacine (6) to carpanone (7) in the presence of oxygen as the oxidant. A gas–liquid segmented or a tube-in-tube strategy was adopted to achieve a higher yield within a shorter residence time. Substantial further developments have been made in applying ASFRs to multiobjective optimizations, which will be discussed in detail in the section "Machine-learning-driven optimization of chemical reactions".

[1860-5397-21-3-6]

Figure 6: (a) A modular flow platform developed for a wider variety of chemical syntheses. (b) Various categories of chemical reactions optimized and molecules synthesized in a continuous flow system are given. Figure 6a is from . Reprinted with permission from AAAS. This content is not subject to CC BY 4.0.

Real-time analytics and high-throughput data processing

Real-time analytics play a critical role in the optimization of chemical reactions via high-throughput synthesis and ML algorithms. Process-analytical technology (PAT) tools empower researchers to obtain chemical insights from a large number of experiments, facilitating the precise measurement of optimization targets. The integration of real-time analysis in HTE presents a multitude of advantages over traditional, one-time final product evaluations, as outlined below:

(i) Real-time analysis facilitates rapid decision-making, enabling researchers to continuously monitor and analyze data as it is generated and allows for immediate adjustments to process parameters during experiments.

(ii) Early detection of trends or anomalies is made possible through real-time analysis, providing valuable insights that can guide subsequent experiments and inform iterative improvements and optimizations of experimental protocols.

(iii) By optimizing experimental workflows and minimizing waste through real-time analysis, researchers can allocate resources more efficiently, ensuring that resources are utilized effectively to maximize experimental outcomes.

(iv) Real-time analysis enhances experimental control over the process, delivering consistent product quality that meets the desired specifications and standards.

(v) By providing instantaneous feedback, real-time analysis accelerates the optimization process, reducing the experimentation time and expediting the discovery of optimal reaction conditions with minimum material use.

Analytical tools are integral components of high-throughput platforms and are found in various configurations, such as in-line, online, at-line, and offline, contingent upon their placement within the experimental workflow. In Table 1, we describe the subtle disparities for clarity and reference.

Table 1: Different analytical methods depending on their position within the experimental workflow.

analytical method | description
in-line | Analyzed in real time during the reaction or production process by directly integrating appropriate devices.
online | Sampling and analysis take place while the reaction or process is running. The analysis is done on a device located nearby. Online analyses can be carried out continuously or at set intervals. Autonomous sampling allows for direct online analysis with the instrument.
at-line | Like online analysis, the samples are usually analyzed within a manufacturing facility. An aliquot of the reaction mixture is taken for analysis. Human intervention is often required for this task. At-line analysis still provides relatively rapid results compared to offline methods, offering a balance between real-time monitoring and convenience.
offline | Analysis conducted outside of the process environment and separate from ongoing operations. Provides a more detailed and comprehensive analysis compared to real-time monitoring.

Self-optimizing HTE platforms require in-line and/or online characterization as well as data analysis and processing for the rapid optimization of organic reactions. Chromatographic (e.g., HPLC, GC) and spectroscopic (e.g., NMR, FTIR, UV–vis, Raman) characterization methods are commonly used for real-time reaction monitoring. To quantify the products of a chemical reaction, a calibration curve is required before the optimization campaign. The following sequential steps are typically employed to refine raw data into actionable inputs for building ML models for optimization: (i) extraction and categorization of the appropriate spectra; (ii) fitting of spectral peaks using predefined functional models, alongside deconvolution of overlapping signals; (iii) consolidation of the extracted peak information and generation of relevant data plots; and (iv) extraction of the relevant information and formatting into input data for ML models. A recent review by Felpin and Rodriguez-Zubiri highlighted the selection of in-line/online analytical tools that can be integrated into flow reactors for the monitoring of chemical reactions. In the current review, we focus on the high-throughput data processing that complements HTE platforms for the rapid optimization of organic reactions. Although multivariate data analysis has frequently been adopted in analytical chemistry for rapid data processing, the availability of relevant open-source code is relatively low. Consequently, the development of open-source code for data processing is of considerable interest to the scientific community.

Jansen et al. have developed HappyTools, a tool for the analysis of HPLC measurements that is able to calibrate retention times, perform peak quantification, and apply various quality criteria to curate the compiled data. For the quantification and calibration of chromatographic peaks, the user can either input a peak list containing the retention time windows of the target chemicals, or the tool can use an automated peak detection algorithm, removing the need for user input. The peak detection algorithm loops until a user-specified cutoff relative to the highest-intensity peak is reached; in each iteration, a new univariate spline is fitted, from which the local maxima and minima are determined. Overall, HappyTools showed similar or better performance compared with existing commercial software. In particular, HappyTools showed an enhanced throughput, demonstrating up to a ten-fold reduction of the total processing time for biopharmaceutical samples. The authors have released the source code and an executable program in an online repository to be used freely for research purposes.

In addition to HappyTools, other open-source Python packages are available to analyze chromatographic and spectroscopic data. A cross-platform Python package named Aston can be used to process both UV–vis and MS data. The open-source library is written using Python, NumPy, and SciPy and is openly hosted in an online repository. Similarly, open-source packages are available for processing chromatographic data from GC–FID, HPLC–UV, or HPLC–FD. Embedding these codes into HTE and ML workflows dramatically improves the efficiency and speed of the optimization process. Liu et al. developed a custom-built Python script to study the kinetics of carbonyldiimidazole-mediated amide formation by analyzing data from online HPLC and in-line FTIR-spectroscopic measurements. Their algorithm was able to automatically detect peaks in the chromatograms and to assign them to reagents or products depending on whether the peak intensity decreased or increased over time. In addition to monitoring the evolution of the reaction, the IR spectral data was processed in real time to confirm the complete consumption of the acid reactant, and this information was fed back to the pump for immediate quenching of carbonyldiimidazole to prevent side reactions. The entire process allows the acid activation and amide formation to be controlled precisely, affording the desired final product in quantitative yield.
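A hedged sketch of this kind of peak handling, in the spirit of the script described above but not the authors' code, is shown below: peaks are detected with SciPy, and each peak is labeled as a reagent or a product from the sign of its intensity trend across sequential injections. The function names, thresholds, and example data are illustrative assumptions.

```python
# Hedged sketch of automated peak detection and reagent/product assignment
# (illustrative only; not the published script).
import numpy as np
from scipy.signal import find_peaks

def detect_peaks(trace, rel_prominence=0.05):
    """Return the indices of peaks in a single chromatogram (1D intensity array)."""
    idx, _ = find_peaks(trace, prominence=rel_prominence * trace.max())
    return idx

def assign_species(peak_areas_over_time):
    """peak_areas_over_time: {retention_time: [area at injection 0, 1, ...]}.
    A negative slope across the injections is taken to indicate a reagent,
    a positive slope a product (simple heuristic mirroring the description above)."""
    labels = {}
    for rt, areas in peak_areas_over_time.items():
        slope = np.polyfit(np.arange(len(areas)), areas, 1)[0]
        labels[rt] = "reagent" if slope < 0 else "product"
    return labels

# toy chromatogram with two Gaussian peaks, then made-up areas from three injections
t = np.linspace(0, 10, 1000)
trace = np.exp(-((t - 3.2) / 0.1) ** 2) + 0.5 * np.exp(-((t - 5.8) / 0.1) ** 2)
print("peak positions (min):", t[detect_peaks(trace)])
print(assign_species({3.2: [100, 60, 20], 5.8: [5, 45, 80]}))
```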

Recently, Sagmeister et al. assembled four complementary PAT tools, including in-line NMR, UV–vis, and IR spectroscopy as well as online ultra-high-performance liquid chromatography (UHPLC), to meticulously monitor the intricate three-step linear synthesis of the drug mesalazine (18, Figure 7) with a throughput of 1.6 g⋅h−1. In the first step, the nitration reaction was monitored by in-line NMR. Overlapping peaks were resolved for accurate quantification by building a chemometric model, which also tolerated small changes in peak positions and shapes across repeated analyses. An in-house-designed flow cell equipped with a reflectance probe was employed for real-time monitoring of the hydrolysis by in-line UV–vis spectroscopy. The raw data were processed using a neural network algorithm, yielding rapid quantification with an impressive processing time of 1.4 ms per spectrum. This streamlined approach ensured efficient and timely data analysis, facilitating seamless real-time monitoring of the hydrolysis of 16. The final hydrogenation step was monitored by an in-line IR probe, and the spectral data were processed and quantified using a partial least squares regression model. An online UHPLC instrument was used to analyze the final composition of the reaction mixture after the three reaction steps. The integration of all PAT tools into the three-step reaction was executed with an open platform communications unified architecture (OPC UA) framework, ensuring seamless communication between the different equipment platforms for enhanced efficiency and accuracy in data analysis.

[1860-5397-21-3-7]

Figure 7: Implementation of four complementary PATs into the optimization process of a three-step synthesis.
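Chemometric calibrations such as the partial least squares model mentioned above follow a common pattern that can be sketched with scikit-learn: calibration spectra of standards with known concentrations are used to fit the model, which then quantifies process spectra recorded during the reaction. The synthetic spectra, band position, and concentrations below are illustrative assumptions and are unrelated to the actual process data.

```python
# Hedged, generic sketch of PLS-based spectral quantification (synthetic data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
wavenumbers = np.linspace(800, 1800, 300)

def synth_spectrum(conc):
    # one absorption band whose intensity scales with concentration, plus noise
    band = conc * np.exp(-((wavenumbers - 1250.0) / 20.0) ** 2)
    return band + rng.normal(scale=0.01, size=wavenumbers.size)

# calibration set: spectra of standards with known concentrations (mol/L)
c_train = np.linspace(0.05, 1.0, 12).reshape(-1, 1)
X_train = np.vstack([synth_spectrum(c) for c in c_train.ravel()])

pls = PLSRegression(n_components=2)
pls.fit(X_train, c_train)

# quantify a "process" spectrum recorded during the reaction (true value 0.42 M)
c_pred = pls.predict(synth_spectrum(0.42).reshape(1, -1))
print(f"predicted concentration: {c_pred[0, 0]:.3f} M")
```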

A recent study introduced a novel approach for directly processing and analyzing raw HPLC–DAD data using Python. This method leverages the Multivariate Online Contextual Chromatographic Analysis (MOCCA) package, designed for integration into both automated and manual workflows. MOCCA offers a range of benefits, including automated management of internal standards for precise relative quantification, reliable peak assignments, accelerated sample processing, and efficient deconvolution of overlapping peaks. Its versatility was showcased through the successful completion of four comprehensive case studies, demonstrating its broad applicability across diverse analytical scenarios. Recently, we implemented in-line Raman spectroscopy to monitor the real-time conversion of styrene to polystyrene, utilizing a custom Python package developed in-house. This approach enabled us to track the conversion process at different residence times. Specifically, we quantified the conversion by analyzing the area under the curve (AUC) of the Raman-active vibrational mode associated with the styrene vinyl C=C stretch (≈1630 cm−1), which we calibrated against signals from p-xylene (≈830 cm−1). To resolve overlapping peaks, we employed curve fitting with Lorentzian functional forms, facilitated by the lmfit Python package. This methodology (Figure 8) allowed us to accurately calculate conversions and to make precise predictions using ML models. Traditionally, the optimization of a chemical reaction, the development of kinetic models, and the optimization of analytical characterization parameters are undertaken independently. In this approach, many overlapping tasks are performed separately, leading to long lead times and inefficient personnel allocation. To overcome these issues, Sagmeister et al. developed a dual modelling approach on a single platform that seamlessly integrates PAT calibration, reaction optimization, and kinetic modelling, and parametrizes a process model for scale-up within approximately eight hours. Their platform consists of a flow reactor connected to an in-line FTIR spectrometer. In addition, the platform has two valves that allow a stream of reagents or target product to bypass the reactor coil directly into the in-line FTIR spectrometer. Using this configuration, the platform can calibrate the reagent and product concentrations through a standard addition method. Once the PAT is calibrated, the platform performs dynamic experiments in which the concentrations of the reagents are ramped to explore the parametric space. Finally, using the scientific programming language Julia, the kinetic model parameters can be fitted to the collected data, and in silico optimization of the reaction parameters can be performed.

[1860-5397-21-3-8]

Figure 8: Overlay of several Raman spectra of a single condition featuring the styrene vinyl region (a) and the p-xylene region (b). (c) Waterfall plot depicting the decrease in the vinyl peak AUC over time. (d) A representative conversion plot shows an increasing conversion with residence time. Figure 8 was adapted with permission from , Copyright 2023 American Chemical Society.
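A minimal sketch of this AUC-based conversion calculation, using lmfit's built-in LorentzianModel on synthetic spectra, is given below; the band positions follow the text, while the helper functions and all numbers are illustrative and do not reproduce the in-house package.

```python
# Hedged sketch of AUC-based conversion from Lorentzian peak fits (synthetic data).
import numpy as np
from lmfit.models import LorentzianModel, LinearModel

def peak_area(x, y, center_guess):
    """Fit a Lorentzian on a linear baseline; the 'amplitude' parameter of
    lmfit's LorentzianModel is the integrated peak area (AUC)."""
    model = LorentzianModel(prefix="p_") + LinearModel(prefix="b_")
    params = model.make_params(p_center=center_guess, p_sigma=5.0,
                               p_amplitude=float(y.max()),
                               b_slope=0.0, b_intercept=float(y.min()))
    result = model.fit(y, params, x=x)
    return result.params["p_amplitude"].value

def conversion(vinyl_t, xylene_t, vinyl_0, xylene_0):
    # normalize the vinyl AUC to the p-xylene internal reference at each time point
    return 1.0 - (vinyl_t / xylene_t) / (vinyl_0 / xylene_0)

# synthetic Lorentzian bands: vinyl near 1630 cm-1 shrinking with conversion,
# p-xylene near 830 cm-1 constant (areas 50 -> 20 and 30 -> 30, i.e. 60% conversion)
lor = lambda x, c, area: area * (10.0 / np.pi) / ((x - c) ** 2 + 100.0)
x_vinyl = np.linspace(1580.0, 1680.0, 400)
x_xyl = np.linspace(780.0, 880.0, 400)
a0 = peak_area(x_vinyl, lor(x_vinyl, 1630.0, 50.0), 1630.0)  # vinyl at t = 0
at = peak_area(x_vinyl, lor(x_vinyl, 1630.0, 20.0), 1630.0)  # vinyl at later residence time
r0 = peak_area(x_xyl, lor(x_xyl, 830.0, 30.0), 830.0)        # p-xylene at t = 0
rt = peak_area(x_xyl, lor(x_xyl, 830.0, 30.0), 830.0)        # p-xylene at later time
print(f"conversion ≈ {conversion(at, rt, a0, r0):.2f}")
```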

Machine-learning-driven optimization of chemical reactions

Historically, the optimization of chemical reactions has been performed using DOE methodologies, with the objective of maximizing the yield of the reaction product. However, these techniques are not well suited to finding the globally optimal conditions, and the required number of experiments scales exponentially with the number of variables. Computational approaches that rely on optimization algorithms offer more efficient ways to obtain the optimal conditions without requiring an exponential number of experiments per variable to be optimized. Early examples of optimizing chemical reaction conditions through computational approaches focused on the application of black-box optimization algorithms, such as steepest descent, SNOBFIT, and Nelder–Mead simplex, which demonstrated positive results and the ability to perform self-optimizing automated workflows with little human intervention. In recent years, ML optimization methods have demonstrated the ability to obtain optimal reaction conditions within a reduced number of experiments in comparison to human intuition, traditional DOE, and other black-box optimization algorithms. Unlike traditional optimization algorithms, the ML approach focuses on building predictive surrogate models of the objective functions. These models learn the relationships between the reaction conditions and the target optimization objectives from experimental data. In a second step, the models are efficiently probed to identify the most promising conditions for optimizing the objective function. In this section, we review the latest developments in ML strategies for the optimization of chemical reactions.
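The surrogate-plus-acquisition pattern can be illustrated with off-the-shelf tools. The sketch below fits a Gaussian-process surrogate to the experiments collected so far and scores random candidate conditions with an expected-improvement acquisition function to propose the next experiment; it is a generic illustration of the approach, assuming a maximization problem with continuous variables, and is not the implementation of any particular study (dedicated optimization packages wrap the same idea).

```python
# Hedged, generic sketch of surrogate-based proposal of the next experiment.
import numpy as np
from scipy.stats import norm, qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def propose_next(X, y, bounds, n_candidates=2000, seed=0):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                                       # surrogate of the objective
    rng = np.random.default_rng(seed)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_candidates, X.shape[1]))
    mu, sigma = gp.predict(cand, return_std=True)
    imp = mu - y.max()                                 # improvement over the best result
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)       # expected improvement
    return cand[np.argmax(ei)]

# usage with a toy dataset: two normalized variables, "yield" as the objective
bounds = np.array([[0.0, 1.0], [0.0, 1.0]])
X = qmc.scale(qmc.LatinHypercube(d=2, seed=1).random(8), bounds[:, 0], bounds[:, 1])
y = 100 * np.exp(-np.sum((X - [0.3, 0.7]) ** 2, axis=1) / 0.05)  # pretend measurements
print("next suggested conditions:", propose_next(X, y, bounds))
```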

Figure 9a outlines the basic steps in the optimization of chemical reactions using ML methods. The workflow requires an initial set of experimental data that contains the different reaction condition variables (i.e., temperature, time, solvent, catalyst, etc.) and the corresponding outcome values for the target optimization objectives (e.g., yield, purity, cost, etc.). The initial dataset is commonly obtained by sampling combinations of reaction variables from the parametric space, performing the synthetic experiments under the selected reaction conditions, and measuring the values of the target optimization objectives. The sampling of the initial reaction variables is often performed through statistical or quasi-random methods, such as Latin hypercube sampling (LHS), Sobol sampling, full factorial sampling, and center-point sampling. Alternatively, the initial dataset can be obtained from values previously reported in the literature. After that, one or more predictive models are fitted to the initial dataset to predict the expected values of the optimization objectives. The number of models depends on the number of optimization objectives, and normally one model is constructed for each optimization objective. The next step involves the application of an optimization algorithm to find the parameters that are most likely to lead to optimal outcomes of the target optimization objectives. Finally, a set of the most promising suggestions is selected and tested experimentally. The dataset is then updated with the outcomes of the latest experiments, and the process is repeated until the optimal conditions have been found. Depending on the number of objectives, optimization campaigns are classified as single-objective (Figure 9b) or multiobjective optimizations (Figure 9c). In single-objective optimizations, the algorithm explores the parametric space to determine the optimal conditions by finding the variables that either maximize or minimize the target objective function. In multiobjective optimizations, the algorithm searches for conditions that simultaneously maximize or minimize each objective function. When competing objectives are optimized, the algorithm instead aims to discover the set of solutions for which the improvement of one objective results in the deterioration of another. This set of solutions is called the Pareto front of the system (its members are also known as nondominated solutions); all other solutions, for which at least one Pareto-optimal solution is equal or better in every objective, are referred to as dominated solutions. Since all solutions on the Pareto front are optimal, the user is responsible for choosing the set of conditions best suited to their specific application.

[1860-5397-21-3-9]

Figure 9: (a) Schematic description of the process of chemical reaction optimization through ML methods. (b) 3D representation of an objective function depending on two variables, showing the path of five optimization iterations that aim to minimize the value of the objective function. (c) Representation of the outcomes of a multiobjective optimization campaign. Each data point represents one experimental reaction condition. The Pareto front of the system, where the improvement of one objective results in the deterioration of the other, is highlighted.
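As a minimal illustration of the Pareto concept described above, the function below extracts the non-dominated set from a table of completed experiments, assuming that both objectives are to be maximized; the objective names and values are made up for illustration.

```python
# Hedged sketch of Pareto-front (non-dominated set) identification.
import numpy as np

def pareto_front(objectives):
    """objectives: array of shape (n_experiments, n_objectives), higher is better.
    Returns a boolean mask that is True for non-dominated (Pareto-optimal) points."""
    obj = np.asarray(objectives, dtype=float)
    mask = np.ones(obj.shape[0], dtype=bool)
    for i in range(obj.shape[0]):
        # point i is dominated if another point is >= in every objective
        # and strictly > in at least one
        dominated = np.all(obj >= obj[i], axis=1) & np.any(obj > obj[i], axis=1)
        if dominated.any():
            mask[i] = False
    return mask

# toy data: (yield %, space-time yield) for eight hypothetical conditions
results = np.array([[92, 1.1], [85, 2.4], [70, 3.0], [95, 0.8],
                    [60, 3.1], [88, 2.4], [75, 1.0], [93, 1.0]])
print(results[pareto_front(results)])  # the non-dominated trade-off solutions
```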
