Article Text
Abstract
Objective Our study aimed to systematically review the methodological characteristics of studies that identified prognostic factors or developed or validated models for predicting mortality among patients with acute aortic dissection (AAD), in order to inform future work.
Design/setting A methodological review of published studies.
Methods We searched PubMed and EMBASE from inception to June 2020 for studies on prognostic factors or prediction models for mortality among patients with AAD. Two reviewers independently collected information on methodological characteristics. We also documented the reported performance of the prognostic factors or prediction models.
Results Thirty-two studies were included, of which 18 evaluated the performance of prognostic factors and 14 developed or validated prediction models. Of the 32 studies, 23 (72%) were single-centre studies, 22 (69%) used data from electronic medical records, 19 (59%) used a retrospective cohort design, 26 (81%) did not report missing predictor data and the 5 (16%) that reported missing predictor data used complete-case analysis. Among the 14 prediction model studies, only 3 (21%) had more than 20 events per variable, and only 5 (36%) reported both discrimination and calibration statistics. Among model development studies, 3 (27%) did not report statistical methods, 3 (27%) relied solely on statistical significance thresholds for selecting predictors and 7 (64%) did not report the methods used to handle continuous predictors. Most prediction models were considered at high risk of bias. The discrimination of prognostic factors varied (AUC 0.58 to 0.95), and the performance of prediction models also varied substantially (AUC 0.49 to 0.91). Only six studies reported a calibration statistic.
Conclusions The methods used in prognostic studies of mortality among patients with AAD, whether prognostic factor studies or prediction model studies, were suboptimal, and model performance varied widely. Substantial efforts are warranted to improve the use of appropriate methods in this population.
- cardiology
- epidemiology
- cardiac epidemiology
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
- This systematic review is the first to identify methodological gaps and to assess the performance of prognostic factors and prediction models across all studies that addressed individual prognostic factors or developed or validated prediction models for mortality among patients with acute aortic dissection (AAD).
- We designed a comprehensive questionnaire that included items from both the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) checklist, and used it to assess methodological gaps across all included studies.
- This review shows that the methodological quality of models intended to support medical decision-making for patients with AAD is suboptimal; substantial efforts are warranted to strengthen the use of rigorous methods so that future research yields accurate and reliable performance estimates.
- The small number of prediction models limits recommendations for clinical practice; a model combining the International Registry of Acute Aortic Dissection (IRAD) score and C-reactive protein showed better discrimination than the IRAD score alone, and future studies may consider updating the IRAD model with other relevant biomarkers to further improve prognostic performance.
- Our review of methodological characteristics was based primarily on reporting; in some cases, researchers may have considered the methodological issues but did not report them clearly.
Introduction
Acute aortic dissection (AAD) is a life-threatening cardiovascular disease with high mortality, characterised by acute onset and rapid progression. The mortality of untreated AAD is approximately 1%–2% per hour in the early hours after symptom onset, and the overall in-hospital mortality is approximately 27%.1 2 Treatment options for AAD include medical intervention, surgery or endovascular repair, and the choice mainly depends on the complications and prognosis of the individual patient.3 A better understanding of prognosis, and ideally the ability to predict the risk of serious outcomes (of which mortality has the highest priority), is highly desirable for medical decision-making and patient communication.
Several published systematic reviews have assessed the association of inflammatory biomarkers (eg, C-reactive protein (CRP)) and a marker of cardiac injury (ie, troponin) with increased mortality in patients with AAD.4–6 A few studies have also developed or validated prediction models for mortality in AAD,7–9 in which a combination of biomarkers and demographic and clinical characteristics was included.8 10–14 As a result, these factors and models have seen increasing use in clinical practice.
However, limited efforts have been made to systematically examine the performance of these prognostic factors or prediction models. In particular, a comprehensive assessment is needed to determine whether the published studies, whether individual prognostic factor studies or prediction model studies, meet the methodological rigour desirable for clinical use, since suboptimal methods can compromise the accuracy and reliability of risk estimation. This is particularly true for AAD, a condition in which the ability to predict an adverse outcome is of paramount importance. We therefore conducted a systematic review to identify methodological gaps across all studies that addressed individual prognostic factors or developed or validated prediction models for mortality among patients with AAD.
Methods
We conducted this systematic review according to a prespecified protocol, which was not published.
Eligibility criteria
We developed the eligibility criteria under the Population, Index prognostic factor, Comparator prognostic factors, Outcome, Timing and Setting (PICOTS) guidance.15 A study was eligible for inclusion if it included patients diagnosed with AAD and aimed to identify or assess any prognostic factor for mortality, or to develop or validate a prognostic model for mortality, in patients with AAD. We excluded a study if it addressed a prediction model for AAD diagnosis only, or if the report was a review, comment, letter, editorial, case report, protocol or conference abstract.
Predictors measured at any time point in the course of AAD were eligible. No restriction on study setting was applied; patients with AAD who visited any healthcare facility were eligible. We defined a prognostic prediction model as a multivariable model that predicts the risk of a specific future outcome from selected predictors.16
Literature search and screening
We searched PubMed and EMBASE from inception to June 2020 for relevant reports published in English. We conducted the search using Medical Subject Headings (MeSH) terms and free-text terms to identify reports on AAD, including ‘aortic dissecting aneurysm’, ‘aortic aneurysm’, ‘aortic dissection*’ and ‘aortic dissecting hematoma’. We applied a validated search strategy for prediction models, which has been shown to have high sensitivity and specificity.17 The full search strategy is presented in online supplemental appendix A. Two investigators (YR and SH) independently screened all retrieved records and resolved any disagreements through discussion with a third investigator (CL). We also manually searched the reference lists of all selected articles for additional studies.
Data extraction
We collected the following general information from each eligible study: first author, year of publication, study aim, region of study, type of aortic dissection, age and sex ratio. We also collected information on the performance of the identified prognostic factors or prediction models, including their names and results for discrimination, calibration, sensitivity and specificity. Discrimination and calibration are the two key measures for evaluating the predictive performance of prognostic factors or prediction models.18
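To make these two measures concrete, the sketch below (illustrative only; the variable names and simulated data are ours and are not taken from any included study) computes discrimination as the AUC and calibration as a Hosmer-Lemeshow statistic in Python.

```python
# Illustrative sketch (not from any included study): discrimination via the
# AUC and calibration via a Hosmer-Lemeshow test on risk-ordered groups.
import numpy as np
from scipy.stats import chi2
from sklearn.metrics import roc_auc_score

def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic and p value."""
    order = np.argsort(y_prob)
    groups = np.array_split(order, n_groups)      # ~equal-sized risk groups
    stat = 0.0
    for g in groups:
        obs = y_true[g].sum()                     # observed events in the group
        exp = y_prob[g].sum()                     # expected events in the group
        n_g = len(g)
        pi_bar = exp / n_g
        stat += (obs - exp) ** 2 / (n_g * pi_bar * (1 - pi_bar))
    return stat, chi2.sf(stat, df=n_groups - 2)

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.6, size=300)         # simulated predicted risks
y_true = rng.binomial(1, y_prob)                  # simulated outcomes
print("AUC:", roc_auc_score(y_true, y_prob))      # discrimination
print("Hosmer-Lemeshow (stat, p):", hosmer_lemeshow(y_true, y_prob))
```

A high AUC indicates good separation of patients who died from those who survived, while a non-significant Hosmer-Lemeshow p value (conventionally p>0.05) indicates no evidence of miscalibration.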
To examine the methods used in these prognostic studies, a team of experienced methodologists with expertise in prognostic studies and prediction models convened to develop a questionnaire through a consensus process. They first consulted items from published statements and tools for prognostic studies (eg, PROBAST and the CHARMS checklist)19 20 and brainstormed additional items. They then discussed the relevance of the identified items to study methods and dropped items deemed irrelevant. Finally, they reached consensus on the items through group discussion and agreement.
The questionnaire consists of five domains: (1) study design (number of centres, sample size, number of events, data sources, epidemiological design); (2) participants (definition and selection of participants); (3) predictors (definition and measurement of predictors); (4) outcome (definition and measurement of outcomes) and (5) analysis (whether all enrolled participants were included in the analysis, the number of events per variable (EPV), statistical methods for selecting and handling predictors, handling of missing data, model structure and the performance measures evaluated for prognostic factors or prediction models). The questionnaire is presented in online supplemental appendix B.
Additionally, we assessed the risk of bias of prediction modelling studies using a tool adapted from PROBAST.15 20 The detailed tool and assessment criteria are presented in online supplemental appendix C.
Statistical analysis
Categorical variables were expressed as frequencies and proportions. Quantitative variables were summarised as mean and SD or as median with IQR, according to normality tests.
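As an illustration of this approach (the sketch below is ours, not the analysis code of this review), a quantitative variable can be summarised as mean (SD) or median (IQR) depending on a Shapiro-Wilk normality test.

```python
# Illustrative sketch: summarise a quantitative variable as mean (SD) if
# approximately normal, otherwise as median (IQR), using a Shapiro-Wilk test.
import numpy as np
from scipy.stats import shapiro

def summarise(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    stat, p = shapiro(x)
    if p > alpha:                              # no evidence against normality
        return f"mean {x.mean():.1f} (SD {x.std(ddof=1):.1f})"
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return f"median {med:.1f} (IQR {q1:.1f}-{q3:.1f})"

print(summarise(np.random.default_rng(1).normal(60, 12, 200)))  # age-like data
```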
Results
In total, 13 555 records were identified, among which 155 were selected for full-text screening, and 32 studies were eligible and included in the final analysis (figure 1).
Flow chart of study selection.
General characteristics of included studies
The 32 eligible studies were published between 2002 and 2019 (online supplemental appendix table 1). Five (15%) were multinational studies, and 21 (66%) were conducted in the USA, China and Europe. Most studies included patients with Type-A dissection (n=21, 66%), followed by a mixture of Type-A and Type-B (n=8, 25%). In-hospital mortality was the most frequently used outcome (n=24, 75%, table 1).
General characteristics about design and conduct of studies
Eighteen (56%) studies aimed to evaluate the performance of prognostic factors. The most commonly investigated prognostic factors were D-dimer (DD, n=8), neutrophil lymphocyte ratio (NLR, n=4) and CRP (n=3). Fourteen (44%) studies aimed to develop or validate a prediction model, of which nine developed a new prediction model without any validation, two developed a new prediction model with internal validation and three conducted external validation with or without updating a prediction model (table 1).
Model performance
The performance of prognostic factors ranged from poor to strong discrimination (AUC 0.58 to 0.95). The AUC of single prognostic factors ranged from 0.58 to 0.92, and that of combined prognostic factors ranged from 0.77 to 0.95 (DD and CRP: 0.95; NT-proBNP and aortic diameter: 0.83; Tenascin-C (TNC) and DD: 0.95; TNC and CRP: 0.91; cystatin C and high-sensitivity C-reactive protein: 0.88; UA, DD and age: 0.77; table 2).
Reported discrimination and calibration of prognostic factors or prediction models for AAD
The developed or validated models from 11 studies showed poor to strong discrimination (AUC 0.49 to 0.91); only 6 studies reported calibration, of which 5 reported good calibration (p>0.05). Rampoldi et al developed a prediction model and reported moderate discrimination (AUC 0.76), but on external validation the scoring system developed by Rampoldi et al showed poor discrimination (30-day mortality: AUC 0.56; operative mortality: AUC 0.62). Mehta et al developed a prediction model using multinational data from the International Registry of Acute Aortic Dissection (IRAD) and reported good calibration (p value for the Hosmer-Lemeshow (H-L) test=0.75). On external validation, the IRAD score showed moderate discrimination (AUC 0.74), and the addition of CRP to the IRAD score notably improved discrimination (AUC 0.89; table 2).
Methodological characteristics
Among the 32 studies, most were single-centre studies (n=23, 72%). The sample size ranged from 35 to 1034 (median 165, IQR 103–348), and the median number of events was 35 (IQR 23–72). Thirteen (41%) studies used a prospective cohort design and the remaining 19 (59%) used a retrospective cohort design; 22 (69%) used data from electronic medical records, 5 (16%) from cohort studies and 5 (16%) from registries (table 3).
Methodological characteristics of included studies
Thirty-one (97%) studies clearly described inclusion and exclusion criteria for participants. The criteria used to define and measure predictors in the study population were consistent across all included studies. The criteria for outcome definition and measurement were consistent in all but one study13 (table 3).
Twenty-two (69%) studies included all enrolled participants in the analysis. Regarding missing data, 30 (94%) studies reported no missing outcome data; 26 (81%) did not report on missing predictor data, and the 5 (16%) that reported missing predictor data handled them with complete-case analysis (table 3).
Among the 18 prognostic factor studies, 9 (50%) had an EPV above 20, 8 (44%) between 10 and 20 and 1 (6%) below 10; 15 (83%) reported discrimination, sensitivity and specificity, while the other 3 (17%) reported only discrimination or only sensitivity and specificity; 11 (61%) used logistic regression, 5 (28%) used Cox regression and 2 (11%) used only receiver operating characteristic (ROC) analysis (table 3).
Among the 14 prediction model studies, only 3 (21%) had an EPV above 20, 8 (57%) between 10 and 20 and 3 (21%) below 10; 10 (71%) used logistic regression, while the other four studies used Cox regression, support vector machines, neural networks and ROC analysis, respectively. Performance measures were poorly reported: only 5 (36%) reported both discrimination and calibration statistics. Eleven (64%) studies reported discrimination, measured as the AUC of the receiver operating characteristic curve, and 6 (43%) reported calibration, measured as the p value of the H-L test. Among model development studies, 3 (27%) did not report any statistical methods and 3 (27%) relied solely on statistical significance for selecting predictors; 7 (64%) did not report how continuous predictors were handled, and 4 (36%) reported that continuous predictors were transformed into categories (table 3).
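For clarity, the EPV is simply the number of outcome events divided by the number of candidate predictor parameters; the toy calculation below uses invented numbers and is not taken from any included study.

```python
# Hypothetical illustration of the events-per-variable (EPV) calculation;
# the numbers are invented, not taken from any included study.
n_events = 35                  # eg, in-hospital deaths in a development cohort
n_candidate_predictors = 12    # candidate predictor parameters considered
epv = n_events / n_candidate_predictors
print(f"EPV = {epv:.1f}")      # ~2.9, well below the conventional minimum of 10
```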
Risk of bias assessment
The risk of bias of the 14 prediction models in the domains of participants, predictors and outcome was low for most studies, while the risk of bias in the domains of sample size and missing data, and of statistical analysis, was generally high (table 4). Studies were rated at high or unclear risk of bias in the domain of sample size and missing data owing to a low number of events per variable (EPV <10) or a lack of information on how missing data were handled. The main reasons for high or unclear risk of bias in the domain of statistical analysis were as follows: predictors were selected on the basis of univariable analysis prior to multivariable modelling; there was no information on whether continuous predictors were examined for non-linearity or on how categorical predictor groups were defined; and either calibration or discrimination was not reported.
Risk of bias of included prediction model studies
Discussion
Summary study findings
In this systematic review, we identified 32 studies addressing prognostic factors or prediction models for mortality among patients with AAD. The performance of prognostic factors or prediction models was most commonly evaluated using the AUC and the H-L test. Most assessments of prognostic factors demonstrated moderate discrimination. The combinations of TNC and DD, and of DD and CRP, showed strong discrimination (AUC 0.95). The prediction models showed poor to strong discrimination (AUC 0.49 to 0.91). The European System for Cardiac Operative Risk Evaluation (EuroSCORE II) showed poor discriminative ability (AUC 0.49) and poor calibration (p value for the H-L test <0.001). One explanation may be that EuroSCORE II was developed to estimate the risk of death after cardiac surgery and is not specific to the prognosis of patients with AAD, as not all patients with aortic dissection undergo surgical treatment; some undergo endovascular treatment. Mehta et al's7 model, developed using multinational IRAD data and reported to have good calibration, showed better discrimination (AUC 0.74) than EuroSCORE II. On external validation, the IRAD score showed moderate discrimination (AUC 0.74), and the addition of CRP to the IRAD score notably improved discrimination (AUC 0.89). Hence, prediction models for mortality in AAD should consider including biomarkers as predictors to improve discrimination.
In this systematic review, we found that most studies had small sample sizes and few events, were conducted in a single centre, and that a relatively large proportion used retrospective data. Most studies neither described missing data nor used appropriate statistical methods to handle them.
For developing or validating prediction models, we found that most studies were considered at high risk of bias; the EPV in most studies was relatively small, which may bias the estimated predictive performance of the models21 22; and most studies did not evaluate both discrimination and calibration. Almost all studies reported the discriminative ability of prediction models, while only six studies reported calibration. For developing prediction models, we found that some studies selected variables on the basis of statistical significance, which may lead to suboptimal models, and most studies did not report how continuous variables were handled, for which an assumption of linearity may be inappropriate.
Implications for future study
Although some studies showed good discrimination and calibration, our findings highlight important methodological limitations among these studies, so their results may not be accurate or reliable. Future studies of prognostic factors or prediction models for mortality in AAD should enrol large patient populations from multicentre settings, and should consider prospective cohort designs and the imputation of missing data. Multiple imputation techniques for handling missing data are important when evaluating model performance, as excluding cases with missing data may lead to biased results.23
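As a hedged sketch of what such imputation might look like in practice (this is not the approach of any included study, and pooling of estimates across imputed datasets, eg, by Rubin's rules, is omitted for brevity), scikit-learn's IterativeImputer can be run several times with posterior sampling to generate multiple completed datasets.

```python
# Sketch of multiple imputation of missing predictor values (illustrative):
# running IterativeImputer with posterior sampling under several seeds gives
# multiple completed datasets, mimicking multiple imputation.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # simulated predictor matrix
X[rng.random(X.shape) < 0.1] = np.nan         # introduce ~10% missing values

imputed_datasets = []
for seed in range(5):                         # 5 imputations as an example
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    imputed_datasets.append(imputer.fit_transform(X))
# Each completed dataset would then be analysed and the estimates pooled.
```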
Studies developing prediction models for mortality in AAD should use appropriate methods for selecting variables and for handling continuous variables, and should evaluate both discrimination and calibration. The number of participants and events should be planned, and the EPV should be at least 10. If the number of events is low relative to the number of predictors, penalised regression may perform better than standard regression. Stability selection and subsampling have been shown to yield more stable models based on consistent variable selection, and should be used in future prediction model studies.24 Discrimination should not be reported in isolation, because a poorly calibrated model can show the same discriminative capacity as a perfectly calibrated one.25 Reporting both discrimination and calibration is therefore strongly recommended for evaluating model performance. Validation of prediction models should also be considered, as both model development and validation are essential for establishing a useful prediction model.26
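A minimal sketch of penalised logistic regression is given below, assuming a small simulated development dataset with few truly predictive variables; the choice of an L1 penalty with cross-validated strength is illustrative rather than prescriptive.

```python
# Sketch: L1-penalised logistic regression with cross-validated penalty
# strength, which shrinks or removes weak predictors when events are few.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 12))                 # 12 candidate predictors
logit = -2.0 + X[:, 0] + 0.5 * X[:, 1]         # only two truly predictive
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # events rarer than non-events

model = LogisticRegressionCV(
    penalty="l1", solver="liblinear", Cs=10, cv=5, scoring="roc_auc"
).fit(X, y)
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
print("apparent AUC:", roc_auc_score(y, model.predict_proba(X)[:, 1]))
```

Note that the apparent AUC computed on the development data is optimistic; internal validation (eg, bootstrapping or cross-validation) would be needed to obtain an unbiased estimate.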
A prediction model suitable for clinical practice should include a relatively small number of variables, be easy to interpret and have good statistical performance. Apart from the well-established IRAD model, our review found that the model combining the IRAD score and CRP used fewer variables and showed better discrimination than the IRAD score alone. These characteristics may support the use of the combined model in daily practice. Moreover, future studies may consider updating the IRAD model with other relevant biomarkers, which may further improve prognostic performance in clinical practice.
Strengths and limitations
To our knowledge, no published systematic review has examined the methodological characteristics and performance of prognostic factors or prediction models for mortality in AAD. Whether the existing prognostic factors or prediction models can be used to guide or improve clinical practice remains underexplored. Should we seek better prognostic factors or prediction models, or continue using and validating the existing ones? Commentators generally agree that better prognostic factors and prediction models should be sought. Substantial efforts are warranted to strengthen the use of rigorous methods so that future research yields accurate and reliable performance estimates.
A limitation of the present study is that our review of methodological characteristics was based primarily on reporting. In some cases, researchers may have considered the methodological issues but did not report them clearly. This also underscores the importance of complete reporting.
Conclusions
In conclusion, DD, NLR and CRP were the most commonly studied biomarkers; the discrimination of prognostic factors ranged from poor to strong; the performance of prediction models varied substantially; and only six studies reported calibration, of which five reported good calibration. Many of these prognostic factors and prediction models are methodologically weak, and several issues need to be addressed to strengthen mortality prediction in AAD, including adequate sample size, appropriate methods for handling missing data, appropriate statistical analysis and the reporting of both calibration and discrimination for prediction models. Substantial efforts are warranted to improve the use of these methods for better care of this population.
References
Footnotes
Contributors Study concept and design was provided by YR. Screening the articles was performed by YR and SH. Acquisition of data was performed by YR, SH and CL. Analysis of data was done by YR and SH. Drafting of the manuscript was by YR. Writing, review and editing were performed by QL, LL, JT, KZ and XS. Study supervision was done by XS.
Funding This study was supported by the National Key R&D Program of China (Grant No. 2017YFC1700406 and 2019YFC1709804) and the 1·3·5 project for disciplines of excellence, West China Hospital, Sichuan University (Grant No. ZYYC08003).
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval The current study is a secondary analysis of the research data. No ethical approval was required for our study.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information. The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.