Abstract
Objectives To systematically review and meta-analyse the evidence for effect modification by refractory status and number of treatment lines in relapsed/refractory multiple myeloma (RRMM); and to assess whether effect modification is likely to invalidate network meta-analyses (NMA) that assume negligible modification.
Design Systematic review, meta-analysis and simulation.
Data sources We systematically searched the literature (eg, OVID Medline) to identify eligible publications in February 2020 and regularly updated the search until January 2022. We also contacted project stakeholders (including industry).
Eligibility criteria Phase 2 and 3 randomised controlled trials reporting stratified estimates for comparisons with at least one of a prespecified set of treatments relevant for use in Norwegian RRMM patients.
Outcomes We used meta-analysis to estimate relative HRs (RHRs) for overall survival (OS) and progression-free survival (PFS) with respect to refractory status and number of treatment lines. We used the estimated RHRs in simulations to estimate the percentage of NMA results expected to differ significantly in the presence versus absence of effect modification.
Results Among the 42 included publications, stratified estimates were published by and extracted from up to 18 (43%) publications and on as many as 8364 patients. Within-study evidence for effect modification is very weak (p>0.05 for 47 of 49 sets of stratified estimates). The largest RHR estimated was 1.32 (95% CI 1.18 to 1.49) for the modifying effect of refractory status on HR for PFS. Simulations suggest that, in the worst case, this would result in only 4.48% (95% CI 4.42% to 4.54%) of NMA estimates differing statistically significantly in the presence versus absence of effect modification.
Conclusions Based on the available evidence, effect modification appears to be sufficiently small that it can be neglected in adequately performed NMAs. NMAs can probably be relied on to provide estimates of HRs for OS and PFS in RRMM, subject to caveats discussed herein.
- myeloma
- statistics & research methods
Data availability statement
Data are available in a public, open access repository. All data and software are publicly available at https://github.com/multinormal/fhi.rrmm-em.2022. The specific version used to generate the results presented herein is archived at Zenodo: https://doi.org/10.5281/zenodo.7919757.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
We systematically reviewed and meta-analysed stratified estimates from phase 2 and 3 randomised controlled trials to estimate the average magnitude of treatment effect modification in RRMM and, hence, inform network meta-analysis.
We used simulation to estimate how many network meta-analysis results are likely to be affected by effect modification under ‘worst case’ assumptions.
This study was performed in parallel with a health technology assessment based on a detailed protocol with prespecified eligibility criteria, but we did not prespecify the analyses reported herein.
There are several outcomes for which treatment effect may be modified, but this work is limited to two outcomes judged to be most important for assessing treatment benefit (overall survival and progression-free survival).
While many variables might modify treatment effect and this study focuses on the two judged to be most important (refractory status and number of treatment lines), we inspected, and briefly summarise, the evidence for effect modification for all variables for which stratified estimates were published.
Introduction
A defining characteristic of relapsed/refractory multiple myeloma (RRMM) is that patients either do not respond to specific treatments or stop responding to them (ie, are refractory).1 Refractory patients must switch to alternative treatments, if available. Multiple treatments now exist, and treatment regimens often comprise multiple drugs in combination. This naturally leads to questions about which treatments are superior. These have been addressed in several systematic reviews that have used network meta-analysis (NMA).2–7
If the assumptions underpinning an NMA model are satisfied, NMA facilitates meta-analytical estimation of treatment effects for all pairs of treatments, including treatments that have not been compared directly in a trial. One of these assumptions is transitivity,8–10 which, informally, means that a treatment effect for one comparison can be calculated by adding or subtracting treatment effects for other comparisons in the network. This allows treatment effects to be estimated for pairs of treatments that have not been directly compared by the trials included in the network (ie, indirect comparisons).
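As a minimal illustration of the transitivity idea, the sketch below computes an adjusted indirect comparison (the Bucher method) for two trials that share a common comparator A. It assumes independent trials and log HRs on a common scale; the labels d_ab (B vs A) and d_ac (C vs A), the function name and the input numbers are hypothetical. The NMAs discussed in this article use full network (and, in our simulations, component NMA) models rather than this two-trial calculation.

```python
import math


def indirect_log_hr(d_ab, se_ab, d_ac, se_ac):
    """Adjusted indirect comparison of C versus B through common comparator A.

    d_ab: log HR of B versus A; d_ac: log HR of C versus A (hypothetical labels).
    Transitivity lets us subtract: d_bc = d_ac - d_ab. The variances add because
    the two direct estimates are assumed to come from independent trials.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc


# Hypothetical inputs: HR 0.70 for B vs A and HR 0.56 for C vs A
d_bc, se_bc = indirect_log_hr(math.log(0.70), 0.10, math.log(0.56), 0.12)
print(math.exp(d_bc), se_bc)  # indirect HR for C vs B is about 0.80
```

If effect modifiers were distributed differently in the A-vs-B and A-vs-C trials, d_ab and d_ac would target different estimands, and the subtraction would no longer have a clear interpretation.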
NMAs should assess and report on the validity of the transitivity assumption. This requires comparing distributions of effect modifiers across trials.8 An effect modifier is a variable that causes a difference in treatment effect but is not itself a treatment or an outcome.11 12 In plain English, effect modification is about stratification: when effect modification occurs, treatment effect is different for different subgroups of patients. It is important to distinguish between a variable that is associated with treatment effect (a comparison between treatments) and a variable that is only associated with outcome (eg, overall survival (OS) for a particular patient). The former is an effect modifier, but the latter is a risk factor. Number of lines of treatment (LOT) is presumably a risk factor for OS, if for no other reason than that patients who have received many LOT will tend to be older. However, that does not mean it is also an effect modifier.
Unfortunately, non-statistical articles on NMA often conflate risk factors and effect modifiers when considering the transitivity assumption. Risk factors are not a concern for NMAs of randomised controlled trials (RCTs) because, in expectation, randomisation excludes the possibility that they account for observed treatment effects. This is one reason RCTs are so useful. However, if a fixed-effects NMA is applied to estimates from trials with different distributions of effect modifiers, the estimates have different interpretations, so the transitivity assumption, and with it the validity of the NMA, is threatened. That said, the nature and extent to which an NMA may be invalidated by effect modification depend on the magnitudes and directions of the modifications. If modification is small compared with the uncertainty of the trial estimates, NMA estimates may still be consistent with the true treatment effects (eg, confidence intervals may contain the target parameter values). Random-effects NMAs are designed specifically to address heterogeneity in trial-level treatment effects.
The use of NMA in RRMM has been criticised13 on the basis that variables such as refractory status and LOT are effect modifiers, with the implication that NMAs that do not account for effect modification may be untrustworthy. This article was motivated by a health technology assessment (HTA) we conducted on treatments for RRMM that was commissioned via Norway’s National System for Managed Introduction of New Health Technologies within the Specialist Health Service (‘Nye Metoder’).14 One of our clinical advisors highlighted concerns about effect modification with respect to refractory status and LOT. While these concerns have been raised in previous work,13 we could not find definitive quantitative research on effect modification in RRMM that could inform our HTA. We, therefore, performed a systematic review and meta-analysis of stratified estimates reported by the trials included in our HTA. We then used the meta-analysis results in a simulation study to assess the degree to which NMA estimates are likely to be affected by effect modification.
Methods
This meta-analysis was not prespecified or registered because it was performed in response to comments on a draft of an HTA. Online supplemental table 1 lists the included treatments and their abbreviations. Online supplemental tables 1 and 2 present completed Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklists.15 Further methodological details are available in online supplemental materials.
Literature search strategy
The search was first performed in February 2020 and was regularly updated until January 2022 (ongoing trials until June 2021). We limited the search to RCTs, used the search term Multiple Myeloma, and used MeSH terms and text words. Halfway through the project, we further restricted the search to include the terms relapse or refractory. The full strategy is presented in online supplemental materials. We also contacted project stakeholders, including industry, to solicit suggestions for potentially relevant publications. We did not search systematically beyond the searches performed for our HTA because we were primarily interested in effect modification within the trials included in the HTA. Via manual searching, we found nine articles reporting stratified estimates for the included trials,16–24 but we used stratified estimates from the main trial publications because they are more likely to have been prespecified.
Inclusion and exclusion criteria
From the identified publications, we included those that provide estimates of HRs for OS or progression-free survival (PFS) that could be included in NMAs (ie, those that report point estimates and a statement of precision such as a CI or p value). We excluded trials comparing doses or schedules of the same treatment.
We excluded publications from meta-analysis if they did not report stratified estimates of HR for all strata of at least one of the two potential effect modifiers (eg, we would have excluded a study that reported an estimate for lenalidomide-refractory patients but not for patients not refractory to lenalidomide). We excluded publications that did not report numerical statements of uncertainty on stratified estimates (eg, we excluded one study that reported point estimates numerically but only provided a graphical presentation of the CIs).
Statistical analysis
We extracted estimates of HR for OS and PFS, stratified by LOT and refractory status or previous use of immunomodulatory drugs (see online supplemental methods). We first performed pairwise random-effects meta-analyses of stratified HRs, grouped by trial, for refractory status and LOT. This facilitates testing for evidence of effect modification within each trial. Because these analyses yielded very weak evidence for effect modification, but there nevertheless appear to be strong opinions that effect modification does occur and is a problem for NMAs in RRMM, we then performed pairwise random-effects meta-analyses of relative HRs (RHRs; described below) for refractory status and LOT. This facilitates estimation of the relative magnitudes of effect modification and allows us to test for effect modification by pooling all evidence of effect modification across trials and treatment comparisons.
RHRs were computed for each trial as follows (online supplemental materials provide a plain language introduction to RHR as well as a formal definition; see also online supplemental figures 1 and 2). First, the trial’s strata were sorted so that the order of strata has a similar interpretation across trials and is, therefore, amenable to meta-analysis. For example, LOT strata were sorted from fewest to most LOT, and previous lenalidomide use was nominated as the first (ie, reference) level of the refractory status factor variable. Then, we computed the ratio between the HR for each stratum and the HR for its preceding stratum (except for the first stratum, which is the reference). Finally, we ‘inverted’ (ie, took the reciprocal of) any of these ratios with a point estimate less than one, so that point estimates for all RHRs are greater than or equal to one. This inversion step is necessary to prevent ratios less than one from cancelling ratios greater than one in the meta-analyses and thereby obscuring any evidence of effect modification (see below). SEs on RHRs were computed as described in online supplemental materials. We excluded reference strata from meta-analysis because, as references, they are not defined with respect to another stratum.
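The RHR construction described above can be sketched in Python as follows. This is a minimal illustration, not the code in the study repository: it assumes each stratified HR is reported with a 95% CI that is symmetric on the log scale, and it computes the SE of each log RHR as if the strata were independent subgroups, which may differ from the method given in the online supplemental materials. The function names and the example trial are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def to_log_scale(hr, lower, upper):
    """Return (log HR, SE) from an HR and its 95% CI (CI assumed symmetric on the log scale)."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def relative_hrs(strata):
    """Compute RHRs between consecutive strata of one trial.

    `strata` is a list of (hr, lower, upper) tuples, already sorted so that the
    first entry is the reference stratum (eg, fewest LOT). Returns one
    (rhr, se_log_rhr) tuple per non-reference stratum.
    """
    logs = [to_log_scale(*s) for s in strata]
    result = []
    for (prev_log, prev_se), (cur_log, cur_se) in zip(logs, logs[1:]):
        # Ratio of this stratum's HR to the preceding stratum's HR, on the log scale
        log_rhr = cur_log - prev_log
        # 'Invert' ratios below one (flip the sign of the log) so that every RHR >= 1
        log_rhr = abs(log_rhr)
        # Assumption: strata are disjoint patient subgroups, so their estimates are independent
        se = math.sqrt(prev_se ** 2 + cur_se ** 2)
        result.append((math.exp(log_rhr), se))
    return result


# Hypothetical two-stratum trial (not data from any included study)
print(relative_hrs([(0.50, 0.35, 0.71), (0.65, 0.48, 0.88)]))
```

Because inversion only flips the sign of the log ratio, it leaves the SE unchanged; the same pair of strata in reverse order yields the same RHR.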
The RHR scale removes heterogeneity in the direction of effect modification within and between trials and facilitates meta-analysis across all trials so that evidence of effect modification becomes statistically detectable; it, therefore, strongly favours the effect modification hypothesis. An RHR tells us how many times larger a stratified estimate is compared with the estimate for its preceding stratum (or vice versa). If the meta-analytical estimate of mean RHR differs statistically from RHR=1, then we can reject the null hypothesis of no effect modification.
All meta-analyses were performed on the logarithmic scale. We used random-effects models throughout because there are important differences in the definitions of refractory status and LOT used across the trials, which would be expected to manifest as heterogeneity, and which must be accounted for statistically. We present results using forest plots, subgrouped by publication, to report estimates of mean HRs or mean RHRs, 95% CIs, and I2 and p values throughout. We used the conventional p<0.05 criterion for statistical significance. Statistical analyses were performed using Stata 16 (StataCorp LLC, College Station, Texas). Data and code are freely available (see Data availability statement).25 We assessed risk of bias and certainty of evidence for all included studies as part of our HTA and published this information in that report.
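A random-effects meta-analysis on the log scale, of the kind described above, can be sketched as follows. For brevity, this sketch estimates the between-study variance with the closed-form DerSimonian-Laird method; the figure legends indicate that the study itself used restricted maximum likelihood (REML) in Stata, so this sketch would not reproduce the published estimates exactly. It returns the pooled ratio (eg, mean RHR), its 95% CI, a two-sided p value against a ratio of 1, I2 and tau2; the function name and example inputs are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def random_effects_meta(log_effects, ses):
    """Pool log-scale estimates (eg, log RHRs) with a random-effects model.

    Between-study variance (tau2) uses the DerSimonian-Laird estimator.
    """
    k = len(log_effects)
    w = [1 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    mean = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = mean / se
    return {
        "ratio": math.exp(mean),
        "ci": (math.exp(mean - Z95 * se), math.exp(mean + Z95 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided p value against ratio = 1
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical log RHRs and SEs from three trials
print(random_effects_meta([math.log(1.35), math.log(1.28), math.log(1.30)], [0.10, 0.12, 0.09]))
```

Applied to log RHRs, a pooled ratio whose CI excludes 1 corresponds to rejecting the null hypothesis of no effect modification.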
Simulation studies
To help understand the degree to which effect modification may affect NMA results, we performed two simulation studies (plus various sensitivity analyses; see Discussion). The purpose of the simulations was to estimate the percentage of NMA estimates that would be expected to differ statistically significantly in the presence versus absence of effect modification due to refractory status and LOT. Figure 1 shows a cartoon that illustrates the design of these studies.
Figure 1 Cartoon of the simulation study. Each panel shows 1 of the 1000 pairs of simulated networks. Within a pair, the estimates of one network (the topmost in the cartoon) were only subject to simulated heterogeneity, while the other was subject to heterogeneity and effect modification. Each network in the cartoon has five treatments (A, B, …, E), but the simulations used 35 treatments. The magnitudes of direct estimates of effect are indicated by the lengths of the links between treatments (heterogeneity and effect modification affect the magnitudes of the estimates, and in extreme cases, their directions). Direct estimates that are particularly modified are shown as red links. NMA results are indicated by the matrices. Diagonal elements are not considered further (shaded) because there is no treatment effect between a treatment and itself. Lower triangles are not considered further (shaded) because they are identical to the upper triangles except for direction (sign). Corresponding estimates within a pair are tested for equality, and those that differ significantly are counted. Effect modification is quite severe in the first and final simulations illustrated by the cartoon, with seven and eight estimates differing. No estimates are statistically different in the second simulation. These numbers are merely illustrative. By performing many simulations, it is possible to estimate the proportion of NMA estimates that would be expected to be affected by the degree of effect modification observed in the literature. NMA, network meta-analysis.
Each simulation used 1000 pairs of synthetic networks of evidence, generated to be similar in distribution to the real network for PFS (the outcome for which RHRs were estimated to be largest; see Results). Networks within a pair were identical except that one network was subjected to simulated effect modification and the other was not, such that any differences in NMA estimates between the two networks could only be attributed to the impact of effect modification. All networks had the same topology as the network for PFS. Simulated effect sizes (log HRs) and their standard errors were drawn from distributions that matched those for the PFS data.
We used estimates of RHR for PFS because they were larger than for OS (ie, we assumed worst-case scenarios), simulating effect modification by sampling from normal distributions parameterised by mean RHRs and their SEs to account for uncertainty on the estimates of RHR. We fitted random-effects component-NMA models26 to each pair of simulated networks and tested null hypotheses of no differences between corresponding estimates. Testing was performed using two-sided Z-tests based on the estimated log HRs and their SEs. Corresponding estimates were deemed to differ if p<0.05. We summarised the results of each simulation as the percentage of estimates expected to be statistically significantly different under effect modification compared with no effect modification. We then repeated these simulations to plot how the percentage of NMA estimates expected to differ varies with RHR (ie, how smaller or larger effect modification may affect NMAs). Simulations were performed using R (V.3.5.2),27 with component NMAs performed using the netmeta package28 (V.1.3-0). Further details are available in online supplemental methods.
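The testing step, and one possible way of injecting sampled effect modification into a direct estimate, can be sketched as follows. This is hypothetical: the exact way the study perturbed direct estimates is described in the online supplemental methods, and the NMA fitting itself (netmeta in R) is not reproduced here. The sketch assumes that log RHRs are drawn from a normal distribution whose SE is derived from the reported 95% CI, that modification compounds over `steps` strata, that its `direction` is fixed within a comparison, and that corresponding estimates from the two networks are tested as if independent. All function names are illustrative.

```python
import math
import random

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def sample_log_rhr(rhr, lower, upper, rng):
    """Draw a log RHR from a normal distribution centred on log(rhr).

    The SE is recovered from the 95% CI, to propagate uncertainty on the RHR.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return rng.gauss(math.log(rhr), se)


def apply_modification(log_hr, steps, direction, log_rhr):
    """Hypothetical: shift a direct log HR by `steps` compounding modifications.

    `direction` (+1 or -1) is assumed to be fixed within a treatment comparison.
    """
    return log_hr + direction * steps * log_rhr


def differs(est_a, est_b, alpha=0.05):
    """Two-sided Z-test of equality of two (log HR, SE) estimates, treated as independent."""
    (a, se_a), (b, se_b) = est_a, est_b
    z = (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2)) < alpha


def percent_differing(network_a, network_b):
    """Percentage of corresponding NMA estimates that differ significantly."""
    flags = [differs(a, b) for a, b in zip(network_a, network_b)]
    return 100 * sum(flags) / len(flags)


# Usage with the worst-case PFS estimate for refractory status, RHR 1.32 (1.18 to 1.49)
rng = random.Random(2022)  # arbitrary seed
log_rhr = sample_log_rhr(1.32, 1.18, 1.49, rng)
modified = apply_modification(math.log(0.70), steps=1, direction=+1, log_rhr=log_rhr)
```

In a full simulation, `percent_differing` would be applied to the NMA estimates from each pair of networks and averaged over the 1000 pairs.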
Patient and public involvement
Patient interests were formally represented on our board of external advisors, as described in the protocol for our HTA;14 however, this study was developed from the involvement of a clinical advisor.
Results
Systematic literature searching identified 810 references, of which 42 publications were included (see table 1 and online supplemental figure 3). Table 1 shows which trials could have reported stratified estimates (because they included patients who differ with respect to refractory status or LOT) and did so; trials and publications that could have reported stratified estimates but did not; and trials that could not report stratified estimates.
Table 1 Overview of included publications
Effect modification of HR for progression-free survival
Almost all main trial publications could have reported stratified estimates, but only 17 (40%) publications, representing 8364 patients, reported estimates stratified by refractory status (table 1). Similarly, 18 (43%) publications, representing 7503 patients, reported estimates stratified by LOT (table 1). Within-trial evidence for effect modification of HR for PFS by refractory status and LOT is weak (figures 2 and 3). Only one test for equality of stratified HRs was statistically significant with respect to refractory status (p<0.01 for the comparison of Kd vs Vd),29 and only one with respect to LOT (p=0.01 for the comparison of DVd vs Vd).30
Figure 2 Hazard ratios for PFS stratified by refractory status. Statistically significant stratified estimates of HR indicate likely treatment effect in specific patient subgroups. Effect modification would be demonstrated by unequal stratified HRs within trial. Only 1 of the 17 within-trial tests for equality of stratified HRs is statistically significant at the 5% significance level. Note the lack of a consistent pattern in the estimates across trials that would lend face validity to the effect modification hypothesis. PFS, progression-free survival.
Figure 3 Hazard ratios for PFS stratified by number of lines of treatment. Statistically significant stratified estimates of HR indicate likely treatment effect in specific patient subgroups. Effect modification would be demonstrated by unequal stratified HRs within trial. Only 1 of the 18 within-trial tests for equality of stratified HRs is statistically significant at the 5% significance level. Note the lack of a consistent pattern in the estimates across trials that would lend face validity to the effect modification hypothesis. PFS, progression-free survival.
Mean RHR was estimated to be 1.32 (95% CI 1.18 to 1.49; p<0.005; I2=0%) for refractory status and 1.19 (95% CI 1.09 to 1.30; p<0.01; I2=0%) for LOT (figure 4). No statistical heterogeneity in RHR was observed.
Figure 4 Estimates of ratios of HRs (RHRs) for PFS. The panels show estimates of RHRs constructed under conditions that favour the effect modification hypothesis. The top panel shows RHRs for refractory status and the bottom panel shows RHRs for number of lines of treatment. RHR=1 corresponds to no effect modification. PFS, progression-free survival; REML, restricted maximum likelihood.
Effect modification of HR for OS
Almost all main trial publications could have reported stratified estimates for OS. Only six publications (14%), representing 3471 patients, reported estimates stratified by refractory status (table 1). Similarly, only seven (17%) publications, representing 4072 patients, reported estimates stratified by LOT (table 1). Within-trial evidence for effect modification of HR for OS by refractory status and LOT is very weak, with no tests for equality of stratified HRs demonstrating statistical significance (online supplemental figures 4 and 5).
Mean RHR was estimated to be 1.16 (95% CI 1.01 to 1.32; p=0.03; I2=0%) for refractory status and 1.09 (95% CI 0.98 to 1.20; p=0.12; I2=0%) for LOT (online supplemental figure 6). No statistical heterogeneity in RHR was observed, suggesting that effect modification may be relatively consistent across trials and comparisons, and that our broad definitions of refractory status and LOT did not introduce undue heterogeneity.
Simulation study
We estimate that only 0.41% (95% CI 0.39% to 0.42%) of NMA estimates would be expected to differ statistically significantly in the presence versus absence of worst-case effect modification due to refractory status. That is, among the 595 possible comparisons of the 35 treatments in the included trials on PFS, no more than about 2–3 comparisons would differ statistically significantly due to effect modification. We estimate that 4.48% (95% CI 4.42% to 4.54%) of NMA estimates would be expected to differ statistically significantly in the presence versus absence of worst-case effect modification due to LOT. That is, among the 595 possible comparisons, no more than about 30 comparisons would be expected to differ statistically significantly due to effect modification. While the RHR estimated for refractory status is larger than that for LOT (see above), the impact of LOT is larger than that of refractory status because the simulation assumed four categories of LOT (ie, patients included in trials could have had zero, one, two or three previous LOT) and that effect modification compounds over increasing number of LOT (see the Methods).
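The arithmetic behind these counts can be checked directly. The number of distinct comparisons among 35 treatments is 35 choose 2, and the expected counts follow from the estimated percentages. The final lines rest on our reading of ‘compounds’: we assume that k steps between LOT categories multiply HR by RHR to the power k, so that, with four categories, the most extreme pair of categories is three steps apart, whereas refractory status is assumed here to contribute a single step.

```python
from math import comb

n_comparisons = comb(35, 2)  # distinct treatment pairs among 35 treatments
print(n_comparisons)  # 595

# Expected numbers of NMA estimates differing, from the estimated percentages
print(0.41 / 100 * n_comparisons)  # about 2.4 (refractory status)
print(4.48 / 100 * n_comparisons)  # about 26.7 (LOT)

# Assumed compounding: k steps between LOT categories multiply HR by RHR**k.
# With four categories (zero to three previous LOT), the extremes are three steps apart.
print(1.19 ** 3)  # about 1.69, compared with a single step of 1.32 for refractory status
```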
Online supplemental figure 7 explores how the percentage of NMA estimates expected to differ varies with RHR. Random-effects NMA appears to be quite robust even to very large effect modification due to refractory status. NMA is less robust to modification due to LOT. Consider an extreme example in which HR is modified by LOT, trials may include patients at up to four levels of this variable, effect modification acts in a consistent direction, and mean RHR=2 (an implausible value, about four times larger on the log scale than the published evidence suggests). In this case, we would expect 40% of NMA estimates to be affected.
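The ‘four times larger (on the log scale)’ comparison appears to be relative to the LOT estimate for PFS (RHR=1.19), as this quick check suggests. The last line applies the same compounding assumption as above (HR multiplied by RHR to the power k over k steps), which is our reading rather than a detail confirmed here.

```python
import math

# Log-scale size of the extreme scenario (RHR = 2) relative to the LOT estimate for PFS
print(math.log(2) / math.log(1.19))  # about 3.98, ie roughly four times larger
# Relative to the refractory-status estimate for PFS, the multiple is smaller
print(math.log(2) / math.log(1.32))  # about 2.50
# Under the assumed compounding, three steps at RHR = 2 multiply HR by 8
print(2 ** 3)  # 8
```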
Discussion
Principal findings
For RRMM, within-trial evidence for effect modification of HR for OS and PFS by refractory status and LOT is weak. Only 2 of 49 tests of heterogeneity were statistically significant (ie, almost exactly the number of type I errors expected at the 5% significance level under the null hypothesis of no effect modification). The largest (ie, worst-case) mean RHR estimated was 1.32 (95% CI 1.18 to 1.49) for HR for PFS with respect to refractory status. We then used simulations to estimate percentages of NMA estimates that may be affected by effect modification. For refractory status, these suggest that even if effect modification is as large as the worst-case estimate, substantially fewer than 1% of NMA estimates are likely to be statistically different from what they would be if effect modification did not occur. For LOT, the simulations suggest that fewer than 5% of NMA estimates are likely to be statistically different in the presence of effect modification and heterogeneity. This is higher than for refractory status but, to put this in perspective, 5% is the same as our typical tolerance for type I errors. Absence of evidence is not evidence of absence, and we may simply not have sufficient data to detect the impact of effect modification. Still, if effect modification does occur, we would expect to see consistent patterns supporting effect modification, which we do not. In some cases, estimates increase with refractory status or LOT; in others, it is the opposite; but in most cases, the estimates are practically the same.
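As a rough check on the statement that 2 significant tests out of 49 is close to the number of type I errors expected under the null hypothesis, the sketch below computes the expected count and the binomial probability of observing two or fewer significant results. It treats the 49 tests as independent, which is only approximately true because the PFS and OS estimates come from overlapping sets of trials.

```python
from math import comb

n_tests, alpha, observed = 49, 0.05, 2

# Expected number of significant tests if there were no effect modification at all
expected = n_tests * alpha
print(round(expected, 2))  # 2.45

# Binomial probability of `observed` or fewer significant tests under the null
p_at_most = sum(
    comb(n_tests, k) * alpha ** k * (1 - alpha) ** (n_tests - k)
    for k in range(observed + 1)
)
print(round(p_at_most, 2))  # 0.55
```

Under this simplified model, observing two or fewer significant tests is entirely unremarkable when no effect modification exists.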
Strengths and weaknesses
To our knowledge, this is the first systematic review, meta-analysis and simulation study of effect modification in RRMM. However, it was not prespecified. While stratified estimates were reported in up to 18 (43%) of included publications, most publications did not report stratified estimates. We did not perform a separate literature search but focused on stratified analyses presented in main trial reports because they are more likely to have been prespecified than exploratory. It is, therefore, possible that we did not include all available data on effect modification. However, it is probably unreasonable to expect small trials (eg, phase 1) to report stratified estimates, as they would likely be very imprecise and essentially uninformative. Among the phase 2 and 3 trials we included, exploratory logistic regressions suggest no association between trial sample size and reporting of stratified estimates (p>0.05 for all combinations of the potential effect modifiers and outcomes studied). Furthermore, stratified estimates were published for about half as many analyses of OS compared with PFS, despite there being about the same number of publications providing estimates of HR for the two outcomes. Because stratified estimates are not reported in the main trial reports for so many comparisons, it is possible that effect modification is larger than we estimate, particularly for OS.
We systematically reviewed evidence of effect modification with respect to refractory status and number of LOT but did not systematically review other variables. However, we did look at all stratified estimates and did not notice any variables that appeared to consistently demonstrate convincing evidence of effect modification.
Because there was heterogeneity in trial reporting, we were not able to use definitions of refractory status and number of LOT that measured exactly what we were interested in, because doing so would have resulted in almost no synthesisable evidence. We, therefore, used pragmatic and inclusive definitions, particularly for refractory status (see the section Methods). We expected this to introduce heterogeneity, but this was not the case (I2=0% in all analyses).
Because the within-trial evidence of effect modification is so weak, but there are nevertheless concerns in the RRMM research community about effect modification and NMA, we constructed RHR and designed the simulations to strongly favour the effect modification hypothesis. This likely resulted in somewhat exaggerated conclusions about whether effect modification occurs and the extent to which it is problematic.
Quantities such as RHRs, as used in meta-research,31 are likely challenging to interpret, and we suspect that few will have an intuitive understanding of what constitutes a ‘large’ or ‘important’ RHR with respect to effect modification in RRMM. A major strength of this work is that having estimated RHRs, we then used simulations to investigate how many NMA results would be expected to be statistically significantly different under the estimated degree of effect modification. We hope this helps readers understand the likely impact of any effect modification on NMA estimates. However, we remind readers that we used random-effects NMAs32 in the simulations, which are designed to account for heterogeneity in trial estimates. Our results do not necessarily translate to fixed-effects NMAs, as used in some systematic reviews on treatments for RRMM.2 4 It is important to note that fixed-effects and random-effects NMAs make fundamentally different assumptions about transitivity. Fixed-effects NMAs assume that ‘trial-level’ treatment effects can be added and subtracted to make indirect estimates. Random-effects NMAs assume that treatment effect means can be added and subtracted. Random-effects NMA explicitly accounts for differences between trials, including different distributions of effect modifiers.
Finally, we also performed sensitivity analyses to investigate the implications of the assumptions we made in the simulations. For example, because RHR discards direction of modification, we assumed that direction of modification is consistent within treatment comparison but may vary between comparisons. This may not be true, so we performed a sensitivity analysis in which direction is assumed consistent within and between comparisons. The result of this analysis suggests that about half as many estimates would differ statistically compared with the main analysis (ie, that the main result reflects the worst case).
Strengths and weaknesses relative to other studies
Cope et al qualitatively assessed 12 NMAs or unanchored indirect comparisons in RRMM and, based on expert opinions on variables that may be effect modifiers, concluded that NMA estimates may have been compromised by differences in distributions of effect modifiers.13 Our work is quantitative and does not depend on qualitative assessment or opinion. We are aware of one other attempt at quantifying effect modification through NMA of subgroups, for example, patients with one previous LOT versus two or more previous LOT.6 In that work, which was limited to immunomodulatory-containing regimens for RRMM, Dimopoulos et al reported that subgroup analyses yielded results consistent with their main findings (ie, no apparent effect modification).
Implications for research
Explanations of assumptions underpinning NMA are often simplified in articles aimed at non-statisticians. For example, articles tend to use arguments about ‘similarity’ of patients8 rather than more precise language about effect modifiers. Given this oversimplification, it is unsurprising that there are concerns about using NMA in RRMM. The transitivity assumption that standard NMA methods rely on concerns neither patient similarity nor whether treatment effect estimates can be added or subtracted; it concerns whether estimands (estimation targets) can be linearly combined. Patient similarity is a good place to start thinking about NMAs, but a terrible place to stop. Modern statistical methods should be communicated more carefully and received more studiously.
Understanding effect modification is important for making decisions based on individual trials, and for assessing the assumptions and validity of NMAs. We, therefore, suggest that RRMM trialists develop and adopt standardised definitions of potential effect modifiers that, where possible, should be used to report stratified analyses in future trials. In addition to improving the transparency and consistency of reporting of effect estimates for patient subgroups, standardisation would facilitate more specific meta-analytical study of effect modification by reducing methodological heterogeneity. Furthermore, we suggest that stratified analyses be reported for all patient-important outcomes, particularly OS, which has been dramatically under-reported compared with PFS.
Concerns that effect modification, as it may occur in RRMM, could invalidate NMAs appear to be stronger than the available evidence supports. This suggests that NMA can probably be relied on to estimate direct and indirect treatment effects, subject to some important caveats. First, evidence on effect modification is available for at most ~40% of comparisons, so modification may be more severe in the remaining ~60%. That said, it would be concerning if large modification occurred but had systematically gone unreported in the majority of phase 2 and 3 trials. Second, more evidence on modification is available for PFS than for OS, so HR for OS may be subject to greater modification than the available evidence suggests. This imbalance may arise because the PFS endpoint is typically reached earlier than OS. However, again, it would be concerning if large modification were going unreported for what is arguably the most important outcome of cancer trials. Third, we are not suggesting that a particular NMA estimate can be applied to patients in the clinical setting who are refractory to one or both treatments involved in a given comparison: such estimates would be subject to an obvious, if somewhat absurd, form of effect modification (see the section Implications for clinical practice). A method for ranking treatments for patients who are refractory to specific treatments or components is presented in the online supplemental appendix. Fourth, and crucially, our simulations used random-effects (rather than fixed-effects) NMA, which accounts for heterogeneity in treatment effects, such as that arising from effect modification. Our findings are unlikely to generalise to fixed-effects NMAs.
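The fourth caveat turns on how a random-effects model absorbs heterogeneity. As a simplified pairwise analogue (not the NMA model used in our simulations), the sketch below pools hypothetical log HRs from several trials of one comparison using DerSimonian-Laird random-effects meta-analysis. The estimated between-study variance, tau², is added to each trial's within-trial variance, which widens the standard error of the pooled estimate relative to a fixed-effect analysis; a fixed-effect model ignores that extra variation. The function name and all inputs are hypothetical.

```python
import math


def pool_log_hrs(log_hrs, ses):
    """Hypothetical helper: fixed-effect and DerSimonian-Laird random-effects pooling.

    Requires at least two studies. Returns (fe_est, fe_se, re_est, re_se, tau2),
    all on the log-HR scale.
    """
    # Inverse-variance weights for the fixed-effect (common-effect) model.
    w = [1 / s ** 2 for s in ses]
    sum_w = sum(w)
    fe_est = sum(wi * y for wi, y in zip(w, log_hrs)) / sum_w
    # Cochran's Q: weighted dispersion of study estimates around the fixed-effect mean.
    q = sum(wi * (y - fe_est) ** 2 for wi, y in zip(w, log_hrs))
    df = len(log_hrs) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's within-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    re_est = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    return fe_est, math.sqrt(1 / sum_w), re_est, math.sqrt(1 / sum(w_re)), tau2


# Hypothetical, deliberately heterogeneous HRs from four trials of the same comparison.
log_hrs = [math.log(hr) for hr in (0.50, 0.70, 0.90, 0.60)]
ses = [0.10, 0.10, 0.10, 0.10]
fe_est, fe_se, re_est, re_se, tau2 = pool_log_hrs(log_hrs, ses)
print(f"tau^2 = {tau2:.3f}; SE fixed = {fe_se:.3f}; SE random = {re_se:.3f}")
```

When the trials agree, tau² is estimated as zero and the two models coincide; when they disagree, only the random-effects interval reflects that disagreement.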
Finally, NMA should be usable for indirect comparisons if effect modification is negligible for all direct comparisons (as the evidence suggests) and there is no good reason to believe that non-negligible modification would occur for treatment comparisons that have not been made directly. However, direct evidence would be preferable.
Implications for clinical practice
In general, an RCT comparing a pair of treatments should not recruit patients who are refractory to one or both treatments being compared. Excluding patients who are refractory to the two comparators ensures that the treatment effect estimate is conditional on patients not being refractory to either treatment. If no substantial effect modification occurs (eg, due to being refractory to other treatments, or due to the number of lines of previous treatment), then estimates from NMAs based on such RCTs are also conditional on patients not being refractory to any of the treatments included in the NMA.
Weak evidence of effect modification should not be misunderstood to mean that refractory status, for example, is unimportant for making a treatment decision about a specific patient in the clinic. If a specific patient is refractory to a given treatment, then treatment effect estimates that are conditioned on the patient not being refractory to the treatment cannot be used to support a treatment decision about that patient (this includes estimates from individual RCTs and NMAs). However, effect estimates comparing treatments to which the specific patient is not refractory remain valid. The online supplemental materials describe how NMA results can be used to rank treatments for refractory populations.
Conclusions
There is very weak within-trial evidence for effect modification with respect to refractory status and number of previous LOT. It is plausible that effect modification does not occur with respect to these variables or is so small as to be statistically undetectable, even in phase 3 trials. If this is true, then differences in the distributions of these variables across trials are unlikely to be a problem in NMAs. We were only able to detect effect modification by performing meta-analyses across trials under assumptions that strongly favour the modification hypothesis. These assumptions may not hold, so our estimates of the magnitude of effect modification may be exaggerated, as may our estimates of the percentages of NMA estimates that would be expected to be affected.
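Within-trial evidence of this kind can be assessed by comparing the HRs reported for two subgroups: the RHR is the ratio of the subgroup HRs, its log is the difference of the log HRs, and, assuming independent subgroups, the variances add. The sketch below shows this standard comparison of two independent estimates, assuming each subgroup HR is reported with a 95% CI that is symmetric on the log scale. It is an illustration only, not necessarily the exact procedure used in this study, and the function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """SE of a log HR recovered from its 95% CI (assumes symmetry on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def relative_hr(hr1, ci1, hr2, ci2):
    """Hypothetical helper: RHR = HR in subgroup 1 / HR in subgroup 2.

    Returns (rhr, lower, upper, p) with a 95% CI and a two-sided z-test p-value,
    assuming the two subgroup estimates are independent.
    """
    log_rhr = math.log(hr1) - math.log(hr2)
    se = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(log_rhr) / se))
    return (math.exp(log_rhr),
            math.exp(log_rhr - Z95 * se),
            math.exp(log_rhr + Z95 * se),
            p)


# Hypothetical PFS subgroup HRs: refractory (0.70, 95% CI 0.55 to 0.89)
# versus non-refractory (0.60, 95% CI 0.48 to 0.75) patients.
rhr, lo, hi, p = relative_hr(0.70, (0.55, 0.89), 0.60, (0.48, 0.75))
print(f"RHR {rhr:.2f} (95% CI {lo:.2f} to {hi:.2f}), p = {p:.2f}")
```

Because the SE of a difference between two subgroup estimates is larger than either subgroup SE, even phase 3 trials typically have little power to detect a modest RHR, which is consistent with the weak within-trial evidence described above.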
Adequately performed random-effects NMAs can probably be relied on to provide estimates of mean HRs for OS and PFS, subject to the caveats discussed above.
Data availability statement
Data are available in a public, open access repository. All data and software are publicly available at: https://github.com/multinormal/fhi.rrmm-em.2022. The specific version used to generate the results presented herein is archived at Zenodo: https://doi.org/10.5281/zenodo.7919757.
Ethics statements
Patient consent for publication
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Contributors All authors approved the published version and agreed to be accountable for all aspects of the work. Study conception and design: CJR. Acquisition, analysis or interpretation of data: CJR, IKO, LG, GEN, AF. Drafting the manuscript: CJR, IKO, GEN, AF. Critical revision of the manuscript for important intellectual content: IKO, LG, GEN, AF. Statistical analysis: CJR. Supervision: AF. Guarantor: CJR.
Funding The authors conducted this research under the employ of the Norwegian Institute of Public Health (Folkehelseinstituttet). The work was funded via Norway’s National System for Managed Introduction of New Health Technologies within the Specialist Health Service (Nye Metoder). The funder had no role in the conduct of this work.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.