Impact of stromal tumor-infiltrating lymphocytes (sTILs) on response to neoadjuvant chemotherapy in triple-negative early breast cancer in the WSG-ADAPT TN trial

Association of baseline sTILs with clinical/pathological parameters

Baseline sTIL (sTIL-0) measurements were available in 323 (96.1%) patients (mean 29.5%, SD 24.4%, median 20%). Associations between sTIL-0 measurements and clinical/pathological categories are summarized in Table 1. Node-positive status and central grade 3 were associated with higher sTIL-0 levels, while age and tumor size showed no significant association.

Table 1 Associations between clinical/pathological characteristics and sTIL-0 in patients with available sTIL-0 values (n = 323)

Discrete coding of baseline and 3-week sTILs

As explained above, and in view of prior evidence, sTIL-0 was also coded as a binary variable, "baseline sTIL status": "TIL+" (sTIL-0 ≥ 60%, known as "lymphocyte-predominant breast cancer") versus "TIL−" (sTIL-0 < 60%). The same cutoff was used for sTIL-3, but missing values required an additional coding step. Among the 110 samples with missing sTIL-3 measurements, 63 (57.3%) were missing because of "low cellularity." For clinical interpretation, it is important to distinguish missingness due to low cellularity from missingness for other reasons. Low cellularity is characterized by tumor necrosis and a consequent lack of invasive tumor cells, and it is thought to result from extensive response to neoadjuvant therapy within the first 3 weeks. Because low cellularity represents a biological feature of response to therapy, it is likely to be informative for pCR and survival, whereas missingness for other reasons is presumably unrelated to therapy and less likely to be informative. Therefore, a 3-week sTIL status variable was defined with three categories: in addition to "3wTIL+" (sTIL-3 ≥ 60%) and "3wTIL−" (sTIL-3 < 60%), a third category, "3wLC," was coded for low cellularity in the 3-week biopsy. With this coding, 286 patients were available for analysis.
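
The coding rule for the two status variables can be sketched as follows. This is a minimal illustration, not the trial's analysis code: the function names, the use of `None` for a missing value, and the string "low cellularity" as the recorded missingness reason are hypothetical assumptions; only the 60% cutoff and the three 3-week categories come from the text. An ASCII "-" stands in for the minus sign in the category labels.

```python
from typing import Optional

CUTOFF = 60.0  # sTIL cutoff (%) for lymphocyte-predominant status, as in the text


def baseline_status(stil0: Optional[float]) -> Optional[str]:
    """Binary baseline status: 'TIL+' if sTIL-0 >= 60%, else 'TIL-'; None if missing."""
    if stil0 is None:
        return None
    return "TIL+" if stil0 >= CUTOFF else "TIL-"


def week3_status(stil3: Optional[float], missing_reason: Optional[str]) -> Optional[str]:
    """Three-category 3-week status; None means the patient is excluded from analysis."""
    if stil3 is not None:
        return "3wTIL+" if stil3 >= CUTOFF else "3wTIL-"
    if missing_reason == "low cellularity":  # hypothetical label for the missingness reason
        return "3wLC"  # informative missingness: kept as its own category
    return None  # missing for other reasons: treated as uninformative and excluded


print(baseline_status(70), week3_status(None, "low cellularity"), week3_status(None, "sample lost"))
```

The design choice mirrors the argument above: only the informative form of missingness (low cellularity) becomes a category, while other missing values drop out.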

Dynamics of sTILs under NACT

sTIL measurements at 3 weeks (sTIL-3) were available in 226 (67.3%) patients (mean 38.4%, SD 27.9%, median 30%). Paired sTIL-0 and sTIL-3 measurements were available for all 226 of these patients. The Pearson correlation between sTIL-3 and sTIL-0 in paired measurements was r = 0.655 [0.572–0.723] (p < .001). From baseline to week 3, sTIL levels increased by a mean of 9.20 percentage points [6.32–12.09] (p < .001, paired t-test).
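
The reported interval for r is close to what the standard Fisher z-transformation gives for r = 0.655 and n = 226 paired measurements. The sketch below assumes that method; the section does not state which interval method was used, so small differences in the third decimal are possible. The function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def pearson_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation (Fisher z method)."""
    z = math.atanh(r)                  # Fisher z-transform: approximately normal
    se = 1.0 / math.sqrt(n - 3)        # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    # Build the interval on the z scale, then back-transform to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = pearson_ci(0.655, 226)
print(f"r = 0.655, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With these inputs, the computed limits agree with the reported ones to within a few thousandths.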

Table 2 summarizes the transitions in sTIL status from baseline to week 3 (including low cellularity) under NACT by trial arm. In the trial as a whole, lymphocyte-predominant status (TIL+ vs. TIL−) at baseline was not predictive of low cellularity at 3 weeks: the percentages of low-cellularity cases were 19.6% (TIL+) versus 21.2% (TIL−) (p = .79). Because low cellularity, the most frequent reason for missing sTIL-3 values, was not associated with baseline status, the estimated mean increase in sTIL levels on therapy (see above) may be relatively unbiased despite the substantial proportion of missing values. The percentage of low-cellularity cases among patients with TIL+ at baseline was higher in the NP/C arm (p = .008), whereas among patients with TIL− at baseline the percentages were about the same in both arms. Thus, after 3 weeks of therapy, the pool of patients with favorable baseline sTIL levels appears to have been more strongly "depleted" (by low cellularity) in NP/C than in NP/G.

Table 2 Dynamics of sTILs from baseline to 3 weeks by arm (n = 282)

Associations of sTIL-0 and sTIL-3 with pCR

For analysis of the association of sTIL-0 and sTIL-3 measurements with pCR, 311 and 223 patients were available, respectively. In patients with pCR (n = 110), mean sTIL-0 levels (36.0%) were more than 10 percentage points higher than in patients without pCR (n = 201, mean 25.7%) (p < .001). Similarly, sTIL-3 levels were more than 13 percentage points higher in patients with pCR (n = 63, mean 47.9%) than in those without pCR (n = 160, mean 34.8%) (p = .002). There was no significant association between changes in sTIL levels and pCR. The dynamics of sTILs and their association with pCR are illustrated in Additional file 2: Fig. S2 as a scatter plot of 3-week versus baseline measurements, marked by pCR status.

The significant association between sTIL-0 and pCR persisted in separate analyses by treatment arm. In NP/G, mean sTIL-0 values were 40.1% versus 28.1% in patients with (n = 49) versus without (n = 124) pCR, respectively (p = .004). In NP/C, mean sTIL-0 values were 32.7% versus 21.5% in patients with (n = 61) versus without (n = 77) pCR, respectively (p = .008).

Regarding 3-week levels (sTIL-3), the trial arms behaved rather differently. In NP/G, sTIL-3 levels were on average more than 20 percentage points higher in patients with pCR (n = 29, mean 58.7%) than in those without pCR (n = 102, mean 37.6%) (p < .001), whereas in NP/C the difference was not significant (p = .12): mean sTIL-3 was 38.8% in patients with pCR (n = 34) compared with 29.9% in those without pCR (n = 58).

Among patients with pCR, the percentage with 3wTIL+ was 35.3% in NP/G versus 17.9% in NP/C (p = .03). Among patients without pCR, the percentage with 3wTIL+ was 29.1% in NP/G versus 12.7% in NP/C (p = .006).

ROC analysis was carried out to explore the overall performance of both sTIL measurements as predictors of pCR versus non-pCR (Additional file 3: Fig. S3A/B). The AUC was 0.600 [0.531–0.668] (p = .004) for sTIL-0 and 0.628 [0.545–0.712] (p = .003) for sTIL-3. For both measurements, overall performance is thus significantly, but only moderately, better than chance (AUC = 0.5).
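
As a reminder of what the AUC measures: it equals the probability that a randomly chosen patient with pCR has a higher sTIL value than a randomly chosen patient without pCR, with ties counted as one half (the Mann–Whitney formulation). A minimal sketch follows; the function name and the input values are hypothetical toy data, not trial measurements, and the p-values and confidence intervals reported above require additional variance estimates (for example, DeLong's method) that are not shown here.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(score of a random pCR case > score of a random non-pCR case); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:          # every (pCR, non-pCR) pair is compared once
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical toy sTIL values (%), not trial data
pcr = [60, 30, 20, 10]
non_pcr = [40, 20, 10, 5, 5]
print(auc(pcr, non_pcr))
```

An AUC of 0.5 corresponds to no discrimination, which is the null value against which the reported p-values are computed.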

Analyzed separately by trial arm, sTIL-0 performed comparably in NP/G (AUC = 0.609 [0.509–0.710], p = .025) and NP/C (AUC = 0.617 [0.521–0.713], p = .018). In contrast, sTIL-3 appeared to perform considerably better in the NP/G arm (AUC = 0.711 [0.605–0.816], p = .001) than in the NP/C arm (AUC = 0.584 [0.460–0.709], p = .178), where it was not even significantly better than chance.

Figure 1A, B shows sensitivity, specificity, and positive predictive value (PPV), as well as the percentage of patients classified as "high" (≥ cut point), as functions of the cut point for sTIL-0 and sTIL-3, respectively. Because these figures present a considerable density of data, we highlight the key performance statistics for sTIL-0 (obtainable from the graph) at selected cut points proposed in the literature. For the cutoff sTIL-0 ≥ 60% (lymphocyte-predominant breast cancer), 13.0% of patients were in the high group (solid curve). The PPV (here, the estimated probability of pCR given sTIL-0 ≥ 60%) was 59.3% (dotted curve), and the specificity was 88.1% (short-dashed curve); however, the sensitivity was only 31.8% (long-dashed curve), reflecting the small proportion of patients in the high group defined by this cut point. For the cutoff sTIL-0 ≥ 30% used by Loi et al. [11], 44.0% of patients were in the high group. In this group, the PPV was 42.6%, i.e., the majority of patients with sTIL-0 at or above the cut point still did not achieve pCR. The sensitivity was 52.7%, and the specificity was 61.2%. Finally, a cutoff of sTIL-0 < 15% would place 33.8% of patients in the "low" group. Here, it makes sense to consider the prediction of non-pCR: the predictive value (probability of non-pCR given sTIL-0 < 15%) was 69.5%, while the sensitivity was 36.3% and the specificity was 70.9%. The performance of sTIL-3 at these or other cut points can be read analogously from Fig. 1B.
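
Each set of statistics quoted for a cut point follows from a 2 × 2 classification of patients by "high" (≥ cut point) versus "low" and by pCR status. The sketch below computes them for a single cut point; the function name and the toy data are hypothetical. For the prediction of non-pCR in the low group, the roles of the groups and outcomes are simply reversed in the same 2 × 2 table.

```python
from typing import Dict, Sequence


def cutpoint_stats(values: Sequence[float], pcr: Sequence[bool], cut: float) -> Dict[str, float]:
    """Performance (%) of the rule 'predict pCR if value >= cut'."""
    high = [v >= cut for v in values]
    tp = sum(h and y for h, y in zip(high, pcr))                # high, pCR
    fp = sum(h and not y for h, y in zip(high, pcr))            # high, no pCR
    fn = sum((not h) and y for h, y in zip(high, pcr))          # low, pCR
    tn = sum((not h) and (not y) for h, y in zip(high, pcr))    # low, no pCR
    return {
        "pct_high": 100 * (tp + fp) / len(values),   # share of patients addressed
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp) if tp + fp else float("nan"),
    }


# Hypothetical toy data (sTIL %, pCR yes/no), not trial data
values = [70, 60, 30, 20, 10, 5]
pcr = [True, False, True, False, False, True]
stats = cutpoint_stats(values, pcr, 60)
print(stats)
```

Sweeping `cut` over all observed values traces out curves of the kind shown in Fig. 1.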

Fig. 1

Sensitivity (long-dashed curves), specificity (short-dashed curves), PPV (dotted curves), and % addressed (solid curves) as a function of cut point in ROC analysis for A sTIL-0 and B sTIL-3 (lower panel). Vertical lines indicate the cut points discussed in the text; the rightmost cut point corresponds to "lymphocyte-predominant" status.

For comparison with previous work, we note that the odds ratio for pCR associated with each 10-percentage-point increase in sTIL-0 was 1.19 [1.08–1.31] (p < .001); the corresponding odds ratio for sTIL-3 (among patients with measurements) was 1.18 [1.06–1.31] (p = .002).
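
Assuming that sTIL enters the logistic model linearly on the log-odds scale, an odds ratio per 10-point increase corresponds to exp(10β), where β is the coefficient per 1 percentage point. The hypothetical helper below rescales a reported odds ratio to another increment, which can ease comparison with studies that report per-1% or per-20% estimates; confidence limits can be rescaled in the same way.

```python
import math


def rescale_or(or_value: float, from_units: float, to_units: float) -> float:
    """Rescale an odds ratio from a per-`from_units` to a per-`to_units` increase.
    Valid only for a term that is linear on the log-odds scale."""
    beta_per_unit = math.log(or_value) / from_units   # log-odds change per 1 point
    return math.exp(beta_per_unit * to_units)


# Reported OR per 10-point increase in sTIL-0, re-expressed per 1 and per 20 points
print(round(rescale_or(1.19, 10, 1), 4))
print(round(rescale_or(1.19, 10, 20), 4))
```

For example, a per-20-point odds ratio is simply the per-10-point odds ratio squared.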

Discrete variables and pCR

For both sTIL-0 and sTIL-3, the defined categories were strongly associated with pCR (Table 3, upper and lower panel, respectively). The odds ratio for TIL+ versus TIL− was 4.56 in NP/G and 2.59 in NP/C, but this apparent difference between trial arms was not significant (p = .35). The 3wLC category appeared favorable for pCR even relative to 3wTIL+ (p = .07, Hosmer–Lemeshow test). As shown below, however, this relative advantage does not carry over to iDFS.
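
The section does not state how the difference between arms was tested; a common approach is an arm × TIL-status interaction term in a logistic model. For two independent subgroups, a closely related check is a Wald z-test on the difference of the log odds ratios, sketched below. The function names and 2 × 2 counts are hypothetical, and the Woolf standard error is one standard choice, not necessarily the one used in the trial.

```python
import math
from statistics import NormalDist
from typing import Tuple

Table = Tuple[int, int, int, int]


def log_or_and_se(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Log odds ratio and Woolf standard error from a 2x2 table:
    a = TIL+ with pCR, b = TIL+ without pCR, c = TIL- with pCR, d = TIL- without pCR."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def compare_subgroups(t1: Table, t2: Table) -> Tuple[float, float]:
    """Wald z-test for equal odds ratios in two independent subgroups (e.g., trial arms)."""
    l1, s1 = log_or_and_se(*t1)
    l2, s2 = log_or_and_se(*t2)
    z = (l1 - l2) / math.sqrt(s1 ** 2 + s2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p


# Hypothetical 2x2 counts for two arms (not trial data): OR = 4 versus OR = 2
z, p = compare_subgroups((10, 10, 20, 80), (12, 12, 30, 60))
print(f"z = {z:.2f}, p = {p:.3f}")
```

As this toy example illustrates, a twofold difference in odds ratios can easily be non-significant when subgroup sizes are modest.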

Table 3 Associations between pCR and both sTIL-0 and sTIL-3 (upper panel: pCR by baseline sTIL categories; lower panel: pCR by 3-week categories)

Association of sTIL-0 and sTIL-3 with iDFS

Median follow-up among surviving patients was 36 months. At this point, the substantial pCR advantage of the NP/C arm (OR = 2.11) was not reflected in a significant iDFS advantage. However, in line with previous reports [11], higher baseline sTIL levels were associated with favorable iDFS in univariate Cox analysis.

With groups defined by dichotomized baseline sTILs, the lymphocyte-predominant group (TIL+) had an estimated 3y-iDFS of 86.0% (95% CI 76.2% to 95.8%) in all patients, compared with 76.8% (95% CI 71.1% to 82.4%) in the TIL− group (p = .11, Kaplan–Meier, Fig. 2A). The corresponding iDFS curves in the subsets of patients without pCR (residual disease) and with pCR are shown in Fig. 2B and C, respectively. Despite the visual impression of superior iDFS for TIL+ overall and in the non-pCR subset, the differences were not significant at a median follow-up of 36 months.
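
The 3y-iDFS figures are Kaplan–Meier estimates of the iDFS function at 36 months. A minimal product-limit sketch follows (function name and toy data hypothetical; times in months). The confidence intervals and log-rank tests reported here need additional steps, such as Greenwood's variance formula, that are not shown.

```python
from typing import Sequence


def km_survival_at(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of S(t); events[i] = 1 for an iDFS event, 0 if censored."""
    data = sorted(zip(times, events))
    s = 1.0
    at_risk = len(data)
    i = 0
    while i < len(data) and data[i][0] <= t:
        time = data[i][0]
        d = 0  # events at this time
        c = 0  # all observations (events and censorings) at this time
        while i < len(data) and data[i][0] == time:
            d += data[i][1]
            c += 1
            i += 1
        if d:
            s *= 1 - d / at_risk  # patients censored at this time still count as at risk
        at_risk -= c
    return s


# Hypothetical toy follow-up data (months), not trial data
times = [6, 12, 12, 20, 30, 40, 48]
events = [1, 0, 1, 0, 1, 0, 0]
print(round(km_survival_at(times, events, 36), 3))  # 3-year estimate
```

Censored patients reduce the risk set without lowering the curve, which is why Kaplan–Meier estimates remain usable at a 36-month horizon when many patients have shorter follow-up.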

Fig. 2

iDFS in Kaplan–Meier analysis for TIL+ (baseline lymphocyte-predominant status) versus TIL− in A all patients, B patients with non-pCR and C patients with pCR

For sTIL-3 (Fig. 3A), groups defined by the nominal composite variable described above showed a significant iDFS advantage for 3wTIL+ versus 3wTIL− in all patients, with about 17% higher 3y-iDFS; the 3wLC (low cellularity) group had iDFS between those of the other two groups. This advantage remained significant and similar in magnitude in the non-pCR subset (Fig. 3B), but not in the pCR subset (Fig. 3C).

Fig. 3

iDFS in Kaplan–Meier analysis for 3-week measurements, i.e., 3wTIL+ versus 3wTIL− versus 3wLC (low cellularity), in A all patients, B patients with non-pCR and C patients with pCR and D iDFS in Kaplan–Meier analysis for subgroups defined (see text) by sTIL transitions from baseline to 3 weeks

The six possible transitions from baseline to 3-week sTIL categories were coded as a nominal variable, as explained above, and analyzed for iDFS. The curves, 3y-iDFS estimates, and significant pairwise comparisons are shown in Fig. 3D. The most favorable combination was the transition from TIL+ to 3wTIL+, with an estimated 3y-iDFS of 96.6%. Its superiority over the transition from TIL− to 3wTIL− is unsurprising in view of the preceding iDFS comparisons. However, although the 3wLC category was consistently more favorable than 3wTIL+ for pCR (Table 3), for iDFS the transition from TIL+ to 3wTIL+ was far superior (25% higher 3y-iDFS) to the transition from TIL+ to 3wLC. Although based on small absolute numbers, this difference suggests that the reliability of pCR after neoadjuvant therapy as a surrogate for survival may vary among subgroups in TNBC. There was no significant association between the change in sTIL levels and iDFS.
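
The nominal transition variable combines the two status codes; with two baseline and three 3-week categories, there are 2 × 3 = 6 levels. A sketch follows, with hypothetical label strings (an ASCII "-" for the minus sign and "->" as separator); patients lacking either status receive no transition category.

```python
from itertools import product
from typing import List, Optional

BASELINE = ("TIL+", "TIL-")            # baseline sTIL status
WEEK3 = ("3wTIL+", "3wTIL-", "3wLC")    # 3-week status, including low cellularity

# All combinations of baseline and 3-week status: 2 x 3 = 6 nominal categories
TRANSITIONS: List[str] = [f"{b} -> {w}" for b, w in product(BASELINE, WEEK3)]


def transition(baseline: Optional[str], week3: Optional[str]) -> Optional[str]:
    """Nominal transition label, or None if either status is unavailable."""
    if baseline not in BASELINE or week3 not in WEEK3:
        return None
    return f"{baseline} -> {week3}"


print(TRANSITIONS)
```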

Mediation analysis

As explained above, mediation analysis according to the methodology of Baron and Kenny was performed to quantify the degree to which sTILs are independent predictors of iDFS in this trial, beyond their influence through pCR. Mediation analysis can help reveal the relative importance of distinct biological mechanisms through which the immune response affects survival under NACT in TNBC.

The first step of the mediation analysis, demonstrating an impact of sTILs on pCR (favorable, e.g., OR 3.44 for TIL+ versus TIL− and OR 2.19 for 3wTIL+ versus 3wTIL−), was presented above for both baseline and 3-week measurements (Table 3). The next step is to show that pCR has a significant univariable impact on iDFS (HR = 0.27 [0.14–0.53], p < .001). Note, however, that this association does not necessarily imply causation.

The remaining mediation analysis consists of univariate Cox regression of iDFS on each TIL variable, followed by multiple Cox regression including pCR (with an interaction test). The results are summarized in Additional file 4: Table S1 for baseline and 3-week sTILs as continuous variables and in Additional file 4: Table S2 for baseline and 3-week sTIL categories. The adjusted HRs (including pCR) of the sTIL markers, continuous (Additional file 4: Table S1) or categorical (Additional file 4: Table S2), are then compared with their corresponding unadjusted HRs. Mediation is suggested if the adjusted and unadjusted HRs differ substantially.

We first consider the continuous models (Additional file 4: Table S1). For both sTIL-0 (baseline) and sTIL-3, there is no evidence of a substantial difference between the adjusted and unadjusted HRs (noting that sTIL-0 is not significant in the model including pCR). We therefore find no evidence of mediation in this continuous analysis.

The analysis of sTIL categories (Additional file 4: Table S2) is slightly more complicated but has the advantage of including low cellularity. For sTIL-0 (baseline), the unadjusted HR (uHR) of 0.56 for TIL+ versus TIL− would suggest a stronger impact on iDFS than the adjusted HR (aHR) of 0.77, but neither HR is significant, consistent with the log-rank tests in Fig. 2A–C. Had the uHR been significant, the difference would have suggested that a favorable effect of high baseline sTILs was at least partially mediated by pCR.

For low cellularity at 3 weeks (3wLC), which was a stronger predictor of pCR than 3wTIL+, neither the unadjusted nor the adjusted HR for iDFS was significant. Again, any impact of low cellularity on iDFS was apparently either confounded with or mediated by pCR. Notably, the HR for pCR remained about the same in the unadjusted analysis (see above) and in all models including TIL measurements (Additional file 4: Tables S1, S2).

In the 3-week analysis of 3wTIL+ versus 3wTIL−, the unadjusted estimate (uHR = 0.42) was only slightly more favorable than the adjusted estimate (aHR = 0.48), and both were significant. Interaction analysis confirmed that there was no significant difference. Hence, the data suggest that 3wTIL+ is a significant prognostic factor for iDFS independent of pCR, whose impact is at most partially mediated by pCR.
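
One rough way to put a number on "slightly more favorable" is the fraction of the unadjusted log-HR that disappears after adjustment for pCR, 1 − log(aHR)/log(uHR). This "difference method" summary is not part of the Baron–Kenny procedure described above and is offered only as an illustrative, hypothetical heuristic; Cox HRs are also non-collapsible, so adjusted and unadjusted HRs can differ somewhat even without mediation.

```python
import math


def proportion_change_log_hr(u_hr: float, a_hr: float) -> float:
    """Fraction of the unadjusted log-HR removed by adjusting for the mediator (pCR):
    1 - log(aHR) / log(uHR). A heuristic only; HRs from Cox models are non-collapsible."""
    return 1 - math.log(a_hr) / math.log(u_hr)


# Unadjusted and adjusted HRs as reported in the text
print(round(proportion_change_log_hr(0.42, 0.48), 3))  # 3wTIL+ vs 3wTIL- (both significant)
print(round(proportion_change_log_hr(0.56, 0.77), 3))  # TIL+ vs TIL- (neither significant)
```

For 3wTIL+, roughly 15% of the log-HR is absorbed by adjustment for pCR, which is in keeping with an impact that is "at most partially mediated." The larger fraction for baseline TIL+ is not interpretable here because neither of those HRs was significant.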
