Improved LC–MS identification of short homologous peptides using sequence-specific retention time predictors

Materials

Acetonitrile, methanol, formic acid and trifluoroacetic acid (ULC/MS grade) were obtained from Biosolve (Valkenswaard, the Netherlands). Dichloromethane and di-isopropylethylamine were purchased from Fisher Bioreagents (Basel, Switzerland). The amino acids used for peptide synthesis were purchased from Novabiochem (Nottingham, UK).

Swapped-sequence synthesis

During synthesis, we focused on the amino acids that are commonly present in plant proteins and that are considered most relevant to the taste and off-taste of plant-protein-derived products. Amino acids known to be taste relevant but not abundant in plant proteins, such as cysteine (C), histidine (H), threonine (T) and methionine (M), were excluded. In total, five sets of peptide mixtures were prepared according to a published solid-phase peptide synthesis protocol [19].

A tripeptide mix with 75 unique peptides was synthesized in a three-step operation. One gram of 2-chlorotrityl chloride resin (Iris Biotech GmbH, Marktredwitz, Germany) was used (loading capacity 0.8 mmol). The resin was preswollen with dichloromethane (DCM) and subsequently treated with 0.3 eq (0.24 mmol) each of Fmoc-Arg(Pbf)-OH (R), Fmoc-Pro-OH (P) and Fmoc-Phe-OH (F) in DCM with 2 eq (1.6 mmol) di-isopropylethylamine (DIPEA) for 2 h. Non-reacted trityl groups were capped using methanol (3 × 15 min washes with 17:2:1 DCM:MeOH:DIPEA). Afterwards, the beads were washed with DCM (3 × 2 min) and N,N-dimethylformamide (DMF; 2 × 2 min) and dried using diethyl ether. Fmoc was removed using 20% piperidine in DMF (2 × 8 min), and the resin was then washed with DMF (3 × 2 min). Next, Fmoc-Gly-OH (G), Fmoc-Ala-OH (A), Fmoc-Lys(Boc)-OH (K), Fmoc-Trp(Boc)-OH (W) and Fmoc-Leu-OH (L) (0.2 eq each, 0.16 mmol) were activated with 2-(1H-benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU) (1 eq, 0.8 mmol) and DIPEA (2 eq, 1.6 mmol) in DMF for 2 min before being added to the resin. The coupling reaction was allowed to proceed for 2 h. Successful acylation was confirmed by staining the resin with ninhydrin (15 g/L, supplemented with 30 mL/L acetic acid in n-butanol). Upon completion, the resin was washed with DMF (3 × 2 min) and deprotected with 20% piperidine in DMF (2 × 8 min). After another DMF wash (3 × 2 min), the next amino acids were coupled by repeating the same process. For this coupling, Fmoc-Val-OH (V), G, K, L and A (0.2 eq each, 0.16 mmol) were used, giving a mixture of 75 unique peptides (3 × 5 × 5).
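The composition of the tripeptide mix follows directly from the three coupling steps. The short Python sketch below enumerates the expected sequences; it assumes that every combination of building blocks forms (coupling efficiencies are ignored) and uses the fact that solid-phase synthesis grows the chain from the C-terminus, so the resin-loaded residues end up at the C-terminal position. The variable names are illustrative only.

```python
from itertools import product

# Residues available at each position, written N-terminus to C-terminus.
# The residues loaded onto the resin first form the C-terminus.
C_TERMINAL = ["R", "P", "F"]            # step 1: resin loading
MIDDLE = ["G", "A", "K", "W", "L"]      # step 2: first coupling
N_TERMINAL = ["V", "G", "K", "L", "A"]  # step 3: second coupling

# Every combination of one residue per position (idealized library).
tripeptides = ["".join(p) for p in product(N_TERMINAL, MIDDLE, C_TERMINAL)]

print(len(tripeptides))       # number of generated sequences
print(len(set(tripeptides)))  # number of distinct sequences
```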

Four tetrapeptide mixtures were prepared, containing 216 unique tetrapeptides in total. As for the tripeptide mixture, 1 g of 2-chlorotrityl chloride resin was used, and the first three amino acids (0.3 eq each) were loaded in a similar way. For set 1, G, Fmoc-Ile-OH (I) and V were used; for set 2, Fmoc-Gln(Trt)-OH (Q), Fmoc-Asn(Trt)-OH (N) and Fmoc-Ser(tBu)-OH (S); for set 3, W, Fmoc-Tyr(tBu)-OH (Y) and A; and for set 4, Fmoc-Glu(OtBu)-OH (E), Fmoc-Asp(OtBu)-OH (D) and Fmoc-Thr(tBu)-OH (T). Each loading reaction ran for 2 h. Non-reacted trityl groups were again capped with the methanol mixture, and the resin was subsequently washed. The four mixtures were then reacted with the C1 amino acids: P, T and K for sets 1–2, and R, I and G for sets 3–4 (0.333 eq of each amino acid, 0.27 mmol). For this coupling, HBTU (1 eq, 0.8 mmol) and DIPEA (2 eq, 1.6 mmol) in DMF were used. The same procedure was repeated for the coupling at the C3 position: Q, N and S for set 1; G, I and V for set 2; E, D and T for set 3; and W, Y and A for set 4. The fourth and final N-terminal coupling used two amino acids per mixture: W and Y for sets 1–2, and P and F for sets 3–4. Each amino acid was used at 0.5 eq (0.4 mmol), again with 1 eq of HBTU and 2 eq of DIPEA in DMF. Each coupling was allowed to react for 2 h.

The peptides were cleaved acidolytically from the resin and fully deprotected by treatment with a cocktail of 95% trifluoroacetic acid (TFA), 2.5% triisopropylsilane (TIS) and 2.5% Milli-Q water (deionized water, produced with a Milli-Q Integral 3 system; Millipore, Amsterdam, the Netherlands) at a ratio of 10 mL per 1 g resin for 3 h. Each cleaved peptide resin was then washed extensively with fresh cleavage cocktail. The peptides were precipitated by addition of ice-cold diethyl ether (1:1 ether:hexane, 10 × initial cocktail volume) and centrifuged for 10 min at 6000 rpm. The supernatant was discarded, and the precipitate was washed with ice-cold diethyl ether and again centrifuged for 10 min at 6000 rpm. The wash step was repeated once more. The resulting precipitate was dried under a light stream of N2, redissolved in acetonitrile (ACN):Milli-Q water (4:6) and then lyophilized (Labconco FreeZone lyophilizer, 2.5 L, −84 °C, connected to a 35i xDS Edwards Oil-Free Dry Scroll Pump). Since the standards were used only for qualitative analysis, the crude peptide mixtures were used without further purification, and no further purity assessments were performed.

Table 1 shows the synthesis route of the homologous tetrapeptides. To generate the peptides with altered amino acid sequences, steps 1 and 3 of the synthesis procedure were swapped.
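The swapped-sequence design can be checked with a small enumeration. The Python sketch below lists the building blocks per coupling step as read from the synthesis description, under the assumption that "C1" denotes the second coupling step and "C3" the third. It then verifies that exchanging the residues introduced in steps 1 and 3 maps every peptide of one set onto an isomeric partner in the paired set. The function names are illustrative, and complete coupling at every step is assumed.

```python
from itertools import product

# Residues per coupling step, in coupling order (step 1 = resin loading,
# step 4 = N-terminal coupling). Assumption: "C1" is step 2, "C3" is step 3.
COUPLING_STEPS = {
    1: [("G", "I", "V"), ("P", "T", "K"), ("Q", "N", "S"), ("W", "Y")],
    2: [("Q", "N", "S"), ("P", "T", "K"), ("G", "I", "V"), ("W", "Y")],
    3: [("W", "Y", "A"), ("R", "I", "G"), ("E", "D", "T"), ("P", "F")],
    4: [("E", "D", "T"), ("R", "I", "G"), ("W", "Y", "A"), ("P", "F")],
}


def enumerate_set(steps):
    """Return all sequences of one set, written N-terminus to C-terminus."""
    # The chain grows from the C-terminus, so the coupling order is reversed.
    return ["".join(p) for p in product(*reversed(steps))]


def swap_partner(sequence):
    """Exchange the residues introduced in coupling steps 1 and 3."""
    n_term, step3, step2, step1 = sequence
    return n_term + step1 + step2 + step3


libraries = {k: enumerate_set(v) for k, v in COUPLING_STEPS.items()}
all_peptides = [s for seqs in libraries.values() for s in seqs]

print({k: len(v) for k, v in libraries.items()})  # peptides per set
print(len(set(all_peptides)))                     # distinct peptides overall
print(swap_partner("WQPG"), swap_partner("WQPG") in libraries[2])
```

Under these assumptions, sets 1 and 2 form one family of swapped pairs and sets 3 and 4 the other; the two members of a pair share the same amino acid composition and therefore the same mass.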

Table 1 Scheme for the swapped-sequence synthesis

Protein digestions

Albumin from chicken egg white, α-lactalbumin from bovine milk and κ-casein from bovine milk were obtained from Sigma (Zwijndrecht, the Netherlands). Papain from Carica papaya (10 mg/mL) and TFA were purchased from Merck (Hohenbrunn, Germany). The protein standard solutions were prepared by dissolving 1 mg of the respective protein in 1 mL demineralized water (Millipore, Amsterdam, the Netherlands) in individual Eppendorf tubes. The standard solutions were pre-incubated with 30 µL of papain solution (1 mg/mL) at 65 °C for 5 min, corresponding to an enzyme-to-protein ratio of 1:33 (30 µg papain per 1 mg protein). The samples were then incubated overnight (16 h) at 37 °C. The enzyme was subsequently deactivated by heating the solutions to 95 °C for 5 min.

Peptide standard solutions

Ten milligrams of each of the five synthesized peptide mixes was accurately weighed into individual 15-mL Falcon tubes. The standards were dissolved in 1.0 mL of a solution of 10% acetonitrile, 1% trifluoroacetic acid and 89% Millipore water (v/v). The expected concentration per peptide in the stock solution was approximately 0.185 g/L or 370 µM (10 mg/54 peptides per mL, assuming an average MW of 500 Da). The stock solutions were further diluted prior to analysis.
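The per-peptide concentration estimate can be reproduced with a few lines of Python. The sketch below uses only the figures stated above and, like the text, assumes that all 54 peptides of a tetrapeptide mix are present in equal mass fractions with an average molecular weight of 500 Da.

```python
# Figures taken from the text; equal mass fractions are an assumption.
mass_mg = 10.0       # weighed amount of peptide mix
volume_ml = 1.0      # volume of solvent
n_peptides = 54      # unique peptides per tetrapeptide mix
avg_mw = 500.0       # assumed average molecular weight in g/mol

# mg/mL is numerically equal to g/L.
per_peptide_g_per_l = mass_mg / n_peptides / volume_ml
# mol/L converted to micromol/L.
per_peptide_um = per_peptide_g_per_l / avg_mw * 1e6

print(f"{per_peptide_g_per_l:.3f} g/L, {per_peptide_um:.0f} µM")
```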

Reversed-phase liquid chromatography–UV absorption–high-resolution mass spectrometry–ddMS2

All analyses were performed on an UltiMate 3000 RS chromatography system equipped with a UV detector, connected to a Q-Exactive Plus Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). An XSelect Peptide CSH C18 130 Å column (particle size 2.5 µm, dimensions 150 × 2.1 mm; Waters, Etten-Leur, the Netherlands) with the associated VanGuard pre-column was used for the separations. The analytes were eluted at a flow rate of 0.35 mL/min using a linear gradient of water (solvent A) and acetonitrile (solvent B), both fortified with 0.1% formic acid. The gradient was programmed as follows: 1 min at 1.0% B, in 29 min to 40% B, in 3 min to 100% B, 3 min at 100% B, in 1 min back to 1.0% B and finally re-equilibration for 11 min at 1.0% B. The total run time was 50 min. The column temperature was maintained at 40 °C and the auto-sampler at 4 °C. The injection volume was 5 µL. UV absorption was measured at 214 and 280 nm with a bandwidth of 4 nm at a frequency of 5 Hz.
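For readers who want to relate a retention time to the mobile-phase composition, the gradient programme can be written as a piecewise-linear function. The Python sketch below is illustrative only: it takes the segment durations exactly as listed above, ignores the system dwell volume, and uses a hypothetical function name `percent_b`.

```python
# Gradient segments as listed in the text:
# (duration in min, %B reached at the end of the segment).
SEGMENTS = [(1, 1.0), (29, 40.0), (3, 100.0), (3, 100.0), (1, 1.0), (11, 1.0)]
START_B = 1.0  # %B at t = 0


def percent_b(t):
    """Programmed %B at time t (min), by linear interpolation per segment."""
    start_t, start_b = 0.0, START_B
    for duration, end_b in SEGMENTS:
        end_t = start_t + duration
        if t <= end_t:
            fraction = (t - start_t) / duration
            return start_b + fraction * (end_b - start_b)
        start_t, start_b = end_t, end_b
    return start_b  # after the last listed segment


for t in (0.5, 15.5, 30, 35):
    print(t, percent_b(t))
```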

The heated electrospray ion source was operated in positive mode at a capillary temperature of 300 °C and a heater temperature of 413 °C. The sheath gas was set to 48 arbitrary units, the auxiliary gas to 11 arbitrary units and the spray voltage to 3.7 kV. The mass spectrometer was operated in full-scan MS data-dependent MS2 (ddMS2) mode. Full-scan spectra were acquired at a resolution of 70,000 in the m/z range 80–1200 using a maximum ion injection time of 100 ms, unless stated otherwise. For the five most abundant ions, ddMS2 scans were acquired at a resolution of 17,500 with an isolation window of 2 m/z and a dynamic exclusion time of 10 s. The maximum injection time was set to 150 ms, and stepped normalized collision energies of 15, 30 and 45 arbitrary units were applied.

Peptide identification

The peptides in the digests were identified by de novo peptide sequencing and database searching using Peaks 8 software (Bioinformatics Solutions Inc., Waterloo, Ontario, Canada) [20]. The search parameters were as follows: enzyme, none; precursor ion mass error tolerance, 10 ppm; fragment ion error tolerance, 0.5 Da; dynamic modifications, none; peptide charge states, 1+ to 3+; and monoisotopic precursor mass. To reduce the number of false-positive peptide identifications, the hit threshold (−10logP) was set to ≥ 15, and the de novo score (average local confidence) threshold was set to 70.
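The precursor tolerance illustrates why mass alone cannot separate homologous peptides. The Python sketch below is independent of the Peaks software: it computes monoisotopic m/z values from standard residue masses and applies a 10 ppm window. The helper names are hypothetical, and the observed m/z values in the example are made up for illustration.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free termini
PROTON = 1.007276   # mass of a proton


def mz(sequence, charge=1):
    """Theoretical m/z of the protonated peptide [M + zH]z+."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=10.0):
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Two sequence isomers have the same precursor m/z.
print(mz("WQPG"), mz("WGPQ"))
# Hypothetical observed values tested against the 10 ppm window.
print(within_tolerance(487.2330, mz("WQPG")))
print(within_tolerance(487.2400, mz("WQPG")))
```

Because sequence isomers give identical precursor masses, their assignment has to rely on fragment spectra and, as proposed here, on retention time.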

The reported amino acid sequences of tentatively identified short peptides with low confidence scores were manually compared with the amino acid sequences of the digested protein standards. This included manual validation of the MS/MS spectra and the proposed peptide sequences. Only small peptides whose sequence occurred in the protein were included in the final peptide dataset. This approach resulted in approximately 450 identified peptides from the digested proteins with a sequence length of five or fewer amino acids, together with their experimental retention times. A list of all identified peptide sequences is provided in Table S1 of the supporting materials.
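The sequence check described above amounts to a substring search against the parent proteins. The Python sketch below shows one way to automate that part of the validation; it does not replace the manual inspection of MS/MS spectra. The protein sequences in the example are short made-up strings, not the real sequences of the protein standards, and the function names are illustrative.

```python
def occurs_in_proteins(peptide, proteins):
    """Return the names of all proteins that contain the peptide sequence."""
    return [name for name, seq in proteins.items() if peptide in seq]


def filter_candidates(candidates, proteins, max_length=5):
    """Keep short candidates whose sequence occurs in at least one protein."""
    kept = {}
    for peptide in candidates:
        if len(peptide) > max_length:
            continue  # only peptides of five or fewer residues are retained
        hits = occurs_in_proteins(peptide, proteins)
        if hits:
            kept[peptide] = hits
    return kept


# Hypothetical toy data for illustration only.
toy_proteins = {"protein_A": "MKWVTFGAPLSQ", "protein_B": "GSIGAASMEFCF"}
toy_candidates = ["WVT", "GAP", "QQQ", "SIGAAS", "FC"]

print(filter_candidates(toy_candidates, toy_proteins))
```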

Data processing

Peptide retention times were modelled using peptide descriptors as explanatory variables. The LC-MS dataset (Table S1) used for building the prediction models was randomly split into a training set and a test set using an 85:15 ratio. All training data were centred and scaled to unit variance prior to training of the model, and a tenfold bootstrap cross-validation was applied. Support vector regression (SVR) [21] was used to create the model and was implemented in RStudio (version 2021.09.0 build 351) using the svmRadial method of the R package caret (version 6.0–92) [22]. The method has two hyperparameters: cost and sigma. The hyperparameters were optimized by a grid search, and the final optimized model had a cost of 14 and a sigma of 0.01. The model file and an R script to perform retention time predictions can be found on GitHub.
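The data preparation steps can be sketched independently of the R implementation. The Python example below performs a random 85:15 split and standardizes each descriptor column using statistics estimated on the training set only. Applying the training-set mean and standard deviation to the test set is an assumption here, as is the use of the sample standard deviation; the descriptor values are synthetic and the function names are illustrative. The SVR fit itself is not reproduced.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.15, seed=0):
    """Randomly split rows into a training and a test set."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    test_idx = set(indices[:round(len(rows) * test_fraction)])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


def fit_scaler(train):
    """Column-wise mean and sample standard deviation of the training set."""
    columns = list(zip(*train))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return means, sds


def apply_scaler(rows, means, sds):
    """Centre and scale rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Synthetic descriptor matrix: 100 peptides, 2 descriptors.
rows = [[float(i), float(i * i)] for i in range(100)]
train, test = split_train_test(rows)
means, sds = fit_scaler(train)
train_scaled = apply_scaler(train, means, sds)
test_scaled = apply_scaler(test, means, sds)  # assumption: training statistics

print(len(train), len(test))
```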

Descriptors

Two types of peptide descriptors were used as explanatory variables in the model (Fig. 1): non-sequence-specific descriptors and amino acid index derived sequence-specific peptide (ASP) descriptors.

Fig. 1 Schematic representation of the peptide descriptor information used to build and evaluate retention models for short peptides. For each peptide, a standard set of non-sequence-specific descriptors was calculated next to a set of sequence-specific descriptors utilizing the amino acid index database. All models were evaluated on their performance.

Non-sequence-specific peptide descriptors

The non-sequence-specific descriptors included in the model are the molecular weight, the peptide length, the average calculated hydrophobicity according to the Abraham-Leo scale [23], the isoelectric point calculated using the EMBOSS scale [24] and eight principal component score vectors of the hydrophobic, steric and electronic properties [25]. These descriptors were calculated for each peptide sequence using the R package Peptides (version 2.4.4) [26].

Amino acid index derived sequence-specific peptide (ASP) descriptors

In total, 514 amino acid indices reported in the AAindex database [27] were evaluated as relevant descriptors (indices with missing values were excluded). For each amino acid index in this database, four sequence-specific descriptors were calculated for each peptide sequence in the dataset. The first descriptor is the index value of the amino acid at the N-terminus, the second is the index value of the amino acid at the C-terminus, the third is the average index value of the amino acid(s) between the N- and C-terminus, and the fourth is the average index value of all amino acids in the sequence.
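The four ASP descriptors can be illustrated with a single amino acid index. The Python sketch below uses the Kyte-Doolittle hydropathy scale as an example index; it is chosen for illustration and is not necessarily one of the indices selected in the final model. The handling of peptides without internal residues (returning `None`) is an assumption, since the text does not specify it, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here as one example index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def asp_descriptors(sequence, index):
    """Four sequence-specific descriptors for one peptide and one index."""
    values = [index[aa] for aa in sequence]
    inner = values[1:-1]  # residues between the N- and C-terminus
    return {
        "n_term": values[0],
        "c_term": values[-1],
        # Assumption: no value when there are no internal residues.
        "middle_mean": sum(inner) / len(inner) if inner else None,
        "overall_mean": sum(values) / len(values),
    }


# Two sequence isomers: same overall mean, different positional descriptors.
print(asp_descriptors("WQPG", KYTE_DOOLITTLE))
print(asp_descriptors("WGPQ", KYTE_DOOLITTLE))
```

The overall mean is identical for the two isomers, whereas the terminal and middle descriptors differ; this positional information is what a composition-only descriptor cannot provide.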

Step-wise descriptor selection

A prediction model was generated for each combination of non-sequence-specific descriptors and ASP descriptors. A goodness-of-fit value for the test set (Q2) [28] was calculated to evaluate whether the added ASP descriptors contributed positively to the retention time predictability. The best-performing ASP descriptors were added to the model in a step-wise manner, ranked by the improvement they gave to the model. A maximum of four ASP descriptors were added to the non-sequence-specific descriptors.
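The step-wise procedure corresponds to a greedy forward selection. The Python sketch below shows the general pattern with a caller-supplied scoring function standing in for "train a model and compute Q2 on the test set". Stopping early when no candidate improves the score is an assumption, and the toy scoring function and descriptor names are hypothetical.

```python
def forward_select(candidates, base, score, max_added=4):
    """Greedily add the candidate that improves the score the most."""
    selected = []
    best = score(base)
    remaining = list(candidates)
    while remaining and len(selected) < max_added:
        # Score every remaining candidate on top of the current selection.
        scored = [(score(base + selected + [c]), c) for c in remaining]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # assumption: stop when no candidate improves the model
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# Toy stand-in for model training: each descriptor adds a fixed gain.
TOY_GAIN = {"base": 0.5, "a": 0.2, "b": 0.1, "c": 0.05, "d": 0.02,
            "e": 0.01, "f": -0.1}


def toy_score(descriptors):
    return sum(TOY_GAIN[d] for d in descriptors)


print(forward_select(["f", "c", "a", "e", "b", "d"], ["base"], toy_score))
```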

Evaluating model performance for homologous peptides

All prediction models were evaluated on their goodness-of-fit value (Q2) as well as on their ability to predict the retention time difference between homologous peptide structures synthesized in house using the swapped-sequence method. The experimental retention time differences (ΔRTexp) and predicted retention time differences (ΔRTpred) between homologous pairs were calculated for all sets of homologous peptides present in our dataset. For each set, the retention time of the first eluting peptide was subtracted from that of the last eluting peptide. Using the same elution order, ΔRTpred was calculated from the predicted retention times obtained from the model. The goodness-of-fit value Q2ΔRT, which compares ΔRTpred with ΔRTexp for the homologous peptide structures, was calculated to evaluate the performance of the ASP descriptors specifically for homologous pairs.
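The two evaluation quantities can be sketched as follows. The Python example assumes the common definition Q2 = 1 − PRESS/TSS, with the total sum of squares taken around the mean of the observed values; the exact formula of reference [28] may differ in detail. The retention times in the example are made up, and the function names are illustrative.

```python
def q2(observed, predicted):
    """Goodness of fit, assumed here as 1 - PRESS / TSS."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / tss


def delta_rt(exp_rt, pred_rt, homolog_set):
    """Return (dRT_exp, dRT_pred) for one set of homologous peptides."""
    # The elution order is fixed by the experimental retention times.
    ordered = sorted(homolog_set, key=lambda pep: exp_rt[pep])
    first, last = ordered[0], ordered[-1]
    return exp_rt[last] - exp_rt[first], pred_rt[last] - pred_rt[first]


# Hypothetical retention times (min) for one homologous pair.
exp_rt = {"WQPG": 10.0, "WGPQ": 12.5}
pred_rt = {"WQPG": 10.4, "WGPQ": 12.1}

print(delta_rt(exp_rt, pred_rt, ["WQPG", "WGPQ"]))
print(q2([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

Because the predicted difference is taken in the experimental elution order, a model that predicts the wrong order for a pair yields a negative ΔRTpred, which lowers Q2ΔRT.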
