Abstract
Introduction Interstitial lung disease (ILD) patients may develop a progressive phenotype usually characterised by progressive pulmonary fibrosis. While this condition is life-limiting, wide variations in its clinical course have made it difficult to predict the rate of disease progression, onset of acute exacerbations and mortality. New approaches are needed to predict the clinical course of ILD, to enable treatment planning, evaluation and clinical trial design. Advances in digital health technologies have facilitated the ability to collect ‘real-time’ data to monitor diseases. These data, including physiological measures, activity indices and patient-reported outcomes, may be useful as components of new outcome predictors. The objective of this study is to first deploy comprehensive data collection enabling deep profiling of patients with ILD and to use these data to develop better predictors of outcome. Finally, these predictions will be evaluated based on real observed outcomes for individual patients.
Methods and analysis This study is a prospective cohort study with 50 participants. Inclusion criteria: Age 18 years or older with a diagnosis of ILD and the ability to provide written informed consent. Exclusion criteria: Age under 18 years or unwilling to wear a smartwatch for the duration of the study. Participants will be provided with a smartwatch to passively collect biometric data. These data will be combined with clinical history and course, in addition to a set of patient-reported outcome measures. Participants will be followed for 3 years to assess the rate of disease progression, occurrence of acute exacerbations and mortality. Initial data will be used to develop clinical prediction models. These models will be further evaluated for accuracy using regular follow-up data.
Ethics and dissemination This study was approved by the St. Vincent’s University Hospital Research Ethics Committee, Dublin, Ireland (reference no: RS23-023). Results will be presented at medical conferences and disseminated via peer-reviewed journals.
- Interstitial lung disease
- Observational Study
- Patient Reported Outcome Measures
- Wearable Electronic Devices
- Machine Learning
- Digital Technology
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
Real-time passively collected data will give deeper insights into patient physiological measures, while not overburdening the participant to provide active measures.
Study devices, training on device use and ongoing support will be provided to overcome potential sources of bias and barriers to participation relating to access to technology and technological literacy.
Interstitial lung disease is a rare disease; therefore, our sample size is small at 50 participants, which may limit the robustness of prediction modelling. However, we will collect a large volume of data points and anticipate a high event rate in this population over the 3-year follow-up period.
Using the rule of thumb of 10 events per variable, we expect to have sufficient information to develop the model, although we acknowledge that a larger dataset and patient population will likely be required for further validation of the predictor.
Interstitial lung disease outpatient clinic attendees in Ireland are not an ethnically diverse population, which may limit the generalisability of the results.
Introduction
Interstitial lung disease (ILD) is a heterogeneous group (over 200 disorders) of non-infectious, predominantly diffuse and usually chronic respiratory disorders.1 The disease affects the interstitium as well as the alveolar and airway architecture.2 Some ILDs are characterised by progressive pulmonary fibrosis, such as in idiopathic pulmonary fibrosis (IPF) and progressive pulmonary fibrosis (PPF).3 IPF is a chronic, fibrosing interstitial pneumonia of unknown cause, associated with radiological and histological features of usual interstitial pneumonia.3
In 2017, the incidence of ILD in Ireland was 7.66 per 100 000 population for men and 4.2 per 100 000 for women.4 Mortality rates were 5.18 per 100 000 and 2.73 per 100 000, respectively;4 this is in line with the median incidence in Europe in 2017.4 However, these figures could be underestimated, as the British Lung Foundation has reported 6000 people diagnosed with IPF annually.5 IPF occurs primarily in the older adult population, with onset of disease in the sixth or seventh decade.6 More men have been reported with IPF than women, and most patients have a history of cigarette smoking.7 Specialist care for ILD patients is delivered by respiratory multidisciplinary care teams at eight clinical centres across Ireland.8
Clinical manifestations include chronic exertional dyspnoea and cough. The disease is characterised by progressive worsening of dyspnoea and lung function, progressive fibrosis on high-resolution computed tomography and acute respiratory decline, with a median survival of 3–5 years.2 3 Patients with IPF are susceptible to abrupt declines in lung function (acute exacerbations), which typically develop in less than 1 month, are accompanied by new radiological abnormalities on high-resolution computed tomography9 and are a major indicator of morbidity and mortality, with acute exacerbation preceding approximately 40 per cent of IPF deaths. Median survival following an acute exacerbation is approximately 3–4 months.9 A significant challenge is that IPF and other ILDs often follow an unpredictable clinical course, which impedes physicians’ ability to predict the rate of disease progression, acute exacerbations and survival.10 This affects new therapeutic development, as the absence of clear signals of progression complicates trial design. Given that ILDs are rare diseases, clinical investigations require additional precision.
Numerous efforts have been made to develop reliable prediction models for ILD patients.11 The upgraded CRP (Clinical, Radiological and Physiological)12 scoring system incorporates several parameters: age, smoking history; clubbing; extent of profusion of interstitial opacities, and presence or absence of pulmonary hypertension on the chest radiograph; per cent predicted total lung capacity; and the partial pressure of oxygen in arterial blood at the end of maximal exercise. However, it failed to take into account the significance of gender in predicting survival in patients with IPF. Wells et al13 proposed the composite physiological index (CPI). This index only included pulmonary function test results, overlooking radiological findings in predicting prognosis. Du Bois et al14 and Richards et al,15 respectively, developed predictive systems based on IPF diagnostic criteria and biomarker predictive models. However, these models have been difficult to use, lack validation, focus predominantly on IPF and have not been widely adopted. The GAP model suggested by Ley et al,16 which uses four variables: gender, age and two pulmonary physiological parameters, has proved more straightforward to use. However, a validation study for prognoses of patients with IPF for each GAP score suggested a need to refine the model in terms of groups included in stage 1.11 Another limitation is its overestimation of risk in lower-risk groups.16 A modified ILD-GAP index added an ILD subtype variable to the prediction model. Both GAP and ILD-GAP have been widely used in the clinical setting to help predict mortality. More recent studies to take account of comorbidities suggest combining ILD-GAP with the Charlson Comorbidity Index score to better predict ILD-related events.17 GAP6 suggests adding the 6-minute walk test to include functional capacity to the model.18 These suggested additions to the model have not yet been externally validated.
Recent attention has focused on how monitoring using digital health technologies can be leveraged to understand the course and progression of the disease, with evidence that home monitoring trials have shown good adherence.19–27 Wearable devices can passively collect continuous biometric data, including heart rate,28 blood oxygen levels and objective measures such as activity levels, which may provide insights into global cardiopulmonary status and longitudinal trends. Pairing this with mobile patient-reported symptoms and other clinical measures may provide opportunities to better understand and predict the course of IPF.
Study objectives
The overarching objective of this study is to develop a rich discovery dataset (deep phenotype) to identify and explore prediction models built on passively collected physiological data and actively collected patient-reported outcome data which can predict short-term and long-term clinically meaningful outcomes for patients living with IPF. Wearables will be used to collect physiological and activity measures, and remote data approaches will be used to collect electronic patient-reported outcomes. These real-time data will be combined with clinic data to provide a rich dataset (deep phenotype) for each patient. The combined data will be used to predict outcomes for ILD patients using both traditional prediction modelling approaches and more recent machine learning-based strategies. Patients’ actual outcomes will be recorded and compared with predicted outcomes to determine the utility of the prediction models.
Primary objective
The primary objective of this study is to develop prediction models for clinical outcomes by looking at patterns in data types and the significance of those patterns for individual participants. It is envisaged that we will produce three prediction models for the following clinical outcomes for ILD patients:
Disease progression prediction model.
Acute exacerbation prediction model.
Mortality prediction model.
Secondary objective
To produce a prediction model dataset for each participant which could be used as a synthetic arm for an n=1 clinical trial.
Methods and analysis
Design
This will be a prospective, cohort study in patients with ILD using wearable devices (smartwatch) and electronic patient-reported outcome measures (phone app) combined with patient healthcare record data to predict clinical outcomes.
Study setting
Patients diagnosed with IPF or PPF who have been referred to St. Vincent’s University Hospital Interstitial Lung Disease Clinic will be invited to join the study by the specialist team. The specialist team at the study site is made up of respiratory physicians, nurse specialists, respiratory physiotherapists, rheumatologists, radiologists and pathologists.
Inclusion and exclusion criteria
Patients will be considered eligible for enrolment to this study if they fulfil the inclusion criteria and none of the exclusion criteria, as defined below.
Patient inclusion criteria:
Age 18 years or older.
Diagnosis of IPF or PPF.
Ability to provide written informed consent.
Patient exclusion criteria:
Age under 18 years.
Patients who are unwilling to wear a smart watch for the duration of the study.
Cognitive impairment or inability to understand and follow instructions which would limit the patient’s understanding of the project or the measurement.
Enrolment
Patients will be recruited from ILD outpatient clinics at St. Vincent’s University Hospital. Participants will be identified as suitable candidates by the specialist team and will be offered information about participation in the study. They will be given an information leaflet and will be directed to the study website (https://prodigy-ild.ie) for further explanation of the study and what will be expected of them. Once participants have had time to consider whether they would like to take part in the study, they will be asked to sign the informed consent form (online supplemental material) and will be enrolled in the study. It is anticipated that recruitment to this study will take place over a 6-month period concluding in Quarter 4, 2024. Participants will be followed for 3 years.
Sample size
The primary objective of this study is to develop prediction models for clinical outcomes (disease progression, acute exacerbations and mortality) by looking at patterns in data types and the significance of those patterns for individual participants. The development of a prediction model first requires a development dataset containing the likely predictor values, which are used to build the model. The sample size for the development dataset must be sufficiently large to enable a model to be developed which can subsequently be tested. In this study, we are using the 10 events per variable rule of thumb to estimate the sample size for the development dataset.29 30 Using this approach, we estimate that 50 participants will provide sufficient events to enable the development of the prediction models. Given the progressive nature of this disease and the proposed 3-year follow-up, we anticipate that all patients will progress, with exacerbations, hospitalisations and death occurring in most participants within the study time frame. Depending on the predictor variable ultimately used, a larger sample will be enrolled for the validation set. The sample size was determined based on the need to achieve reliable and generalisable results for this objective. Sampling for rare diseases is inherently challenging due to limited patient populations. Given that ILD is a rare disease, a sample size of 50 is feasible within the available time frame and resources.
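The arithmetic behind the events per variable rule of thumb can be illustrated with a short Python sketch. This is not the study's sample size calculation: the function names are hypothetical, the event count in the usage note is an invented example, and the sketch assumes the common convention that, for a binary outcome, the limiting count is the less frequent of events and non-events.

```python
def limiting_event_count(n_participants: int, n_events: int) -> int:
    """Return the smaller of the event and non-event counts.

    Assumption: for a binary outcome, the rule of thumb is applied to
    the less frequent outcome class.
    """
    if not 0 <= n_events <= n_participants:
        raise ValueError("n_events must lie between 0 and n_participants")
    return min(n_events, n_participants - n_events)


def max_candidate_predictors(limiting_events: int, events_per_variable: int = 10) -> int:
    """Largest number of candidate predictors the rule of thumb supports."""
    if limiting_events < 0 or events_per_variable <= 0:
        raise ValueError("counts must be non-negative and the ratio positive")
    return limiting_events // events_per_variable
```

For example, if 30 of 50 participants (a purely illustrative figure) experienced an event, `limiting_event_count(50, 30)` gives 20, and `max_candidate_predictors(20)` gives 2 candidate predictors for that outcome.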
Study procedures
Baseline visit
Once a participant has been onboarded to the study, a research record will be created using demographics, clinical history, diagnostic and disease relevant information. At baseline, participants will complete a questionnaire to collect outstanding data on demographic information, medical history, medication and smoking status. Diagnostic radiology, pulmonary function and lab results will be collected directly from the patient record.
Digital health technology
Study data will be collected and managed using REDCap (Research Electronic Data Capture) hosted at UCD Clinical Research Centre.31 32 REDCap is a secure, web-based software platform designed to support data capture for research studies. As part of enrolment, patients will be assigned a unique study identification number in the REDCap study database. The unique study identification number will be used to name the participants’ wearable device and to anonymously track their data throughout the study. The study team will provide each participant with a wearable device (Apple Watch Series 6 or above) and an iPhone 8 or above if required. Training and onboarding will take place at the baseline visit. In addition, MyCap, a companion app to REDCap research software, will be used to collect patient-reported outcome measure (PROM) data electronically, to enable comprehensive remote PROM collection over time.33 The MyCap app will be downloaded to the participants’ phone, and participants will be enrolled to the app by scanning a QR code they are presented with on their study app. Participants will receive a prompt ‘You have a MyCap activity due today’ at 08:00 on the day a questionnaire is scheduled to be completed. In the case of a non-response to questionnaires, participants will be sent a message to respond to the questionnaire. A MyCap on-screen prompt will flash up to say, ‘You have a secure message waiting’. When participants open the MyCap app, they can check their messages and complete the relevant questionnaire which is due. The questionnaire will not expire; therefore, follow-up messages for outstanding questionnaires can be sent to the participant. If there is still no response, a phone call will be made to ensure participants are aware of when to engage with the study questionnaires. 
A further escalation, if there is still no response to engage with study questionnaires, will be to call on the participant to troubleshoot issues they are having with the study devices.
Patient-reported outcome measures
Participants will be required to complete the following survey instruments at enrolment and at regular intervals, as per table 1.
Table 1 Schedule of patient questionnaires
Breathlessness Questionnaire: the modified Medical Research Council (mMRC) dyspnoea scale,34 35 a self-rating tool used to assess the degree of baseline functional disability due to breathlessness on a scale from 0 to 4.
Cough Severity Questionnaire: VAS,36 a Visual Analogue Scale for cough severity on which the patient indicates the severity of their cough over the last 2 weeks on a scale of 0 to 100 mm. A reduction of ≥30 mm in the cough VAS is considered a clinically meaningful change threshold for clinical trials in chronic cough.35
Cough Quality of Life Questionnaire: the Leicester Cough Questionnaire (LCQ),37 a 19-item cough questionnaire comprising three health domains (physical, psychological and social) to assess the impact of cough in the previous 2 weeks. It takes less than 5 minutes to complete.
Fatigue Questionnaire: the Fatigue Assessment Scale (FAS),38–40 a 10-item self-report scale (each item scored 1–5) evaluating symptoms of physical and mental fatigue. A total FAS score <22 indicates no fatigue; a score ≥22 indicates fatigue.
King’s Brief Interstitial Lung Disease (KBILD)41 42 is a self-completed 15-item validated ILD-specific measure of health-related quality of life, consisting of three domains: psychological (items 3, 5, 6, 8, 10, 12, 14), breathlessness and activities (items 1, 4, 11, 13) and chest symptoms (items 2, 7, 9). The KBILD domain and total score ranges are 0–100; 100 represents best health status.
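The FAS and cough VAS thresholds described above can be expressed as simple checks. The following Python sketch is an illustration only, not the study's scoring code: the function names are hypothetical, and it assumes that any reverse-scored FAS items have already been recoded before the items are summed.

```python
def fas_total(item_scores: list[int]) -> int:
    """Sum the 10 FAS items, each scored 1 to 5.

    Assumption: reverse-scored items were recoded before this call.
    """
    if len(item_scores) != 10:
        raise ValueError("the FAS has exactly 10 items")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("each FAS item must be scored 1 to 5")
    return sum(item_scores)


def fas_indicates_fatigue(total: int) -> bool:
    """A total score of 22 or more indicates fatigue."""
    return total >= 22


def cough_vas_meaningful_reduction(baseline_mm: float, follow_up_mm: float,
                                   threshold_mm: float = 30.0) -> bool:
    """True if the cough VAS fell by at least the threshold (30 mm by default)."""
    return (baseline_mm - follow_up_mm) >= threshold_mm
```

A participant scoring 3 on every FAS item has a total of 30 and is classed as fatigued; a fall in cough VAS from 70 mm to 40 mm meets the 30 mm threshold, whereas a fall to 45 mm does not.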
Wearable devices
Study participants will be provided with an Apple Watch (Series 6 or later) wearable device and charging cable. The Apple Watch Series 6 pairs with the iPhone 8 or above, and collected data can be viewed in the iPhone Health app. The watch will passively collect continuous real-time data on heart rate, blood oxygen levels, activity levels and 6-minute walk distance (online supplemental table 1). All data collected by the smartwatch synchronise automatically to the Health app on the participant’s iPhone when the participant has an internet connection. The Health app has a function to export all health data collected. This anonymous file will be exported from the participant’s Health app while the participant is attending clinic, or during a home visit if the participant does not attend a clinic appointment within 6 months of their baseline date. The export file will be transferred to a research MacBook using Apple AirDrop secure file transfer. From there, it will be uploaded to the secure UCD research Google Drive, where it will be stored for later analysis.
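To illustrate how an exported file might be read for analysis, the Python sketch below streams records of one type from a Health app export. It is a sketch under stated assumptions rather than the study's pipeline: it assumes the export contains an XML file of `Record` elements carrying `type`, `startDate` and `value` attributes, with dates formatted as in the sample, and the function name `read_records` and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime

# HealthKit type identifier for heart rate samples.
HEART_RATE = "HKQuantityTypeIdentifierHeartRate"

# Invented sample that mimics the assumed layout of the export file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData>
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
         startDate="2024-01-01 08:00:00 +0000" value="72"/>
 <Record type="HKQuantityTypeIdentifierOxygenSaturation" unit="%"
         startDate="2024-01-01 08:05:00 +0000" value="0.96"/>
 <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
         startDate="2024-01-01 08:10:00 +0000" value="80"/>
</HealthData>"""


def read_records(source, record_type):
    """Yield (start_time, value) for each Record element of the given type.

    `source` is a file path or a binary file object. Elements are cleared
    after use so that large exports do not have to be held in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == record_type:
            start = datetime.strptime(elem.get("startDate"), "%Y-%m-%d %H:%M:%S %z")
            yield start, float(elem.get("value"))
        elem.clear()


heart_rates = list(read_records(io.BytesIO(SAMPLE), HEART_RATE))
```

Streaming with `iterparse` is chosen here because continuous wearable data accumulated over months can produce very large export files.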
Participants will be required to wear the study wearable device for at least 20 hours per day and will be responsible for keeping the device charged to allow this. Training materials will be provided in hard copy or can be emailed to the participant in electronic format. Participants will be shown how the watch must be worn snug to the wrist, to ensure correct readings are recorded while in use.
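Adherence to the 20 hours per day wear requirement could be approximated from the timestamps of passively collected samples. The Python sketch below is one possible approach, not a method defined in this protocol: it assumes that a clock hour containing at least one sample counts as an hour worn, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hours_worn_per_day(sample_times):
    """Count, per calendar date, the distinct clock hours with at least one sample."""
    hours_by_day = defaultdict(set)
    for timestamp in sample_times:
        hours_by_day[timestamp.date()].add(timestamp.hour)
    return {day: len(hours) for day, hours in hours_by_day.items()}


def adherent_days(sample_times, min_hours=20):
    """Return the sorted dates on which the estimated wear time met the minimum."""
    worn = hours_worn_per_day(sample_times)
    return sorted(day for day, n_hours in worn.items() if n_hours >= min_hours)
```

A stricter definition (for example, a minimum number of samples per hour) could be substituted without changing the overall structure.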
Study procedures
Follow-up visit
Participants’ follow-up visits will coincide with clinic visits every 4 months. It is therefore anticipated that there will be three patient visits per year. If a participant is unable to attend a clinic appointment, a home visit will be scheduled to collect the participant’s export file. Patients will be followed for 3 years, giving a potential total of nine visits per participant. Routine clinic visits include vital signs, spirometry and a 6-minute walk test, checking for desaturation. Additional periodic tests will include diffusion capacity of the lungs for carbon monoxide (DLCO), high-resolution CT (HRCT) and blood samples for biomarker studies (table 2).
Table 2 Study schedule of events and ILD outpatient clinic
All patient record data collected at clinic appointments will be captured in the REDCap database for integration with MyCap PROMS and wearable datasets in our PRODIGY-ILD data warehouse (figure 1).
Figure 1 Data warehouse of data collected from wearable, app and patient record. PROMs, patient-reported outcome measures; PFT, pulmonary function tests; PRODIGY-ILD, Predicting outcomes using digital technology in patients with Interstitial Lung Disease.
Clinical outcomes (disease progression, onset of acute exacerbations and mortality) will be assessed at clinic appointments, through changes in imaging (CT of the chest), pulmonary function tests, 6-minute walk test distance and patient symptoms. Follow-up data will be captured in the patient’s research record in REDCap.
Data analysis
The overall objective of this study is to leverage the data collected to generate predictive models of outcome. Specifically, we will seek to develop individual patient-level models of the following categorical outcomes:
Acute exacerbation.
Hospitalisation.
Mortality.
Models will be developed using a number of different approaches, with all models evaluated to determine best performance. These may include:
Logistic regression, modelling the relationship between independent variables (eg, biometric or PROM data) and each of the binary outcomes: mortality, acute exacerbation and hospitalisation.
Proportional hazards regression, to predict time to event (mortality) and to estimate the HR for input variables.
Linear regression, for continuous outcomes (exacerbations) where a linear relationship between variables (eg, cough questionnaire) is anticipated.
Survival analysis, for time-to-event outcomes, where the outcome (eg, mortality) will be explored in the context of data collected from patients.
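As an illustration of the time-to-event approaches listed above, the Python sketch below computes a Kaplan-Meier survival curve. The protocol does not name a specific estimator, so Kaplan-Meier is used here only as one common choice; the function name and the data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each participant (eg, months).
    events: True if the event was observed at that time, False if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone with an event or censored at t leaves the risk set.
        at_risk -= n_leaving
    return curve
```

With invented follow-up times of 6, 12, 12, 18 and 24 months, where the second participant at 12 months and the participant at 24 months are censored, the estimated survival falls to 0.8, 0.6 and 0.3 at months 6, 12 and 18.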
To complement these approaches, we will also use machine learning approaches to explore more complex models, including
Random forest—to classify outcomes based on biometric or PROM variable data, as both single and multiple predictors.
Support vector machines—classifying patients as progressors or not based on multiple variables including PROM and biometric data, as well as clinical phenotype.
K-nearest neighbours—to make predictions about individual patients based on similarities in the dataset to other patients with known outcomes.
Pattern discovery—unsupervised machine learning methods will be used to uncover patterns in the data, applying methods such as clustering to identify, for example, high symptom burden patients from PROMs data.
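The k-nearest neighbours idea listed above can be shown in a few lines of Python. This is a minimal sketch rather than the study's model: the function name, feature values and labels are hypothetical, Euclidean distance and majority voting are assumed, and in practice features would need to be scaled so that no single variable dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours."""
    if len(train_features) != len(train_labels):
        raise ValueError("features and labels must have the same length")
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training cases")
    # Pair each training case's distance to the query with its label, nearest first.
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _distance, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

Choosing an odd `k` for a two-class outcome avoids tied votes.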
All models will be evaluated and compared to determine performance. Key evaluation measures will include
Accuracy: based on the assessment of predicted versus observed outcomes, as participants continue in the cohort.
Sensitivity and specificity: analysis of true predictors, based on comparison with actual participants’ outcome.
C-statistics: to measure the performance of models in discriminating between outcomes.
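The evaluation measures listed above can be computed directly from observed outcomes and model outputs. The Python sketch below is illustrative only; the function names are hypothetical, and the C-statistic is implemented as the proportion of event and non-event pairs in which the event case received the higher predicted score, with ties counted as one half.

```python
def classification_metrics(observed, predicted):
    """Return sensitivity, specificity and accuracy for binary outcomes."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both events and non-events are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def c_statistic(observed, scores):
    """Proportion of event/non-event pairs ranked correctly by the scores."""
    event_scores = [s for o, s in zip(observed, scores) if o]
    non_event_scores = [s for o, s in zip(observed, scores) if not o]
    if not event_scores or not non_event_scores:
        raise ValueError("both events and non-events are required")
    concordant = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(non_event_scores))
```

A C-statistic of 0.5 corresponds to discrimination no better than chance, and 1.0 to perfect discrimination.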
Data from the first 6 months will be collected and analysed. Predicted outcomes will be compared with actual outcomes recorded at month 12, allowing refinement, enhancement and validation of the developed models. Data analysis will proceed as follows:
Step 1: Data quality control and validation
Individual patient-level data will be reviewed for completeness. Given the lack of previous studies integrating clinical, physiological, activity and patient-reported data in this population, we will employ a conservative approach to data completeness, requiring at least 70% completeness for key variables. Missing data will be summarised, and cases will be deleted where the 70% threshold is not reached. Rational substitution will be employed where possible. For data missing at random, arithmetic imputation methods will be employed, including worst-case imputation for dropouts and interpolation/extrapolation where prior and subsequent data are available.
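The completeness check and interpolation described in this step can be sketched as follows. This Python example is a simplified illustration, not the study's data-cleaning code: the function names are hypothetical, missing values are represented as `None`, and only linear interpolation between observed neighbours is shown (extrapolation and worst-case imputation are not covered).

```python
def completeness(values):
    """Fraction of entries that are not missing (None)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is not None for v in values) / len(values)


def meets_threshold(values, threshold=0.70):
    """True if the variable reaches the required completeness."""
    return completeness(values) >= threshold


def interpolate_gaps(values):
    """Linearly fill None entries that have observed values on both sides.

    Leading and trailing gaps are left as None because there is no
    prior or subsequent observation to interpolate from.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = values[right] - values[left]
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left) / (right - left)
    return result
```

For example, `interpolate_gaps([60, None, None, 66, None])` fills the two interior gaps with 62.0 and 64.0 and leaves the trailing gap missing.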
Step 2: Descriptive analysis
Data from activity and physiological measurements will be summarised as per table 3. Data summarisation will involve providing a concise overview of key characteristics of the dataset including central tendency values, variance to give insights into the spread of the data and frequency distributions, assessing for normality, while highlighting any notable patterns or peaks in the dataset.
Table 3 Baseline characteristics from wearable
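A descriptive summary of a single wearable measure could be produced along the following lines. This is a minimal Python sketch using the standard library; the function name and the choice of statistics are illustrative assumptions, and formal normality testing is not shown.

```python
import statistics


def summarise(values):
    """Return central tendency and spread for a list of numeric readings.

    At least two values are required for the sample standard deviation.
    """
    if len(values) < 2:
        raise ValueError("at least two values are required")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }
```

Comparing the mean with the median gives a quick first indication of skew before any formal assessment of normality.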
Step 3: Feature selection
Potential predictors of progression, exacerbation and mortality will be reviewed by specialist respiratory clinicians. The rationale behind the feature selection process will be clearly documented, and the chosen predictors will be rigorously validated to ensure clinical relevance to the target outcomes. In parallel, activity and physiological measurements (individual and composite, single point and time-rolling average trends, raw values and individual-normalised) will be analysed to identify potential features, predictive of clinical outcome.
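Two of the feature types mentioned above, time-rolling averages and individual-normalised values, can be sketched in Python as follows. The function names are hypothetical, a trailing window is assumed for the rolling average, and normalisation is assumed to mean a z-score against the participant's own mean and SD.

```python
import statistics


def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window:i + 1]) / window)
    return result


def normalise_to_individual(values):
    """Express each reading as a z-score relative to the participant's own data."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("readings are constant; z-scores are undefined")
    return [(v - mean) / sd for v in values]
```

Individual normalisation allows a change such as a rising resting heart rate to be judged against each participant's own baseline rather than a population norm.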
Step 4: Predicted outcomes
The prediction model will be simplified by the elimination of unnecessary variables. Methods such as correlation analysis will be performed to identify potential redundancies. Statistical tests will be used to rank features and their association with the target outcomes.
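The use of correlation analysis to flag potentially redundant predictors can be illustrated with the Python sketch below. It is not the study's selection procedure: the function names are hypothetical, Pearson correlation is assumed, and the 0.9 cut-off is an arbitrary illustrative value.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.9):
    """Return pairs of feature names whose absolute correlation meets the threshold."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(features, 2)
        if abs(pearson(features[name_a], features[name_b])) >= threshold
    ]
```

Given hypothetical features in which resting and mean heart rate move together, `redundant_pairs` would flag that pair, and one of the two could then be dropped after clinical review.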
Step 5: Validation
We will evaluate the predictive model’s effectiveness by comparing its performance to actual events per predictor both within individuals and across individuals. By doing this, we will gain insights into the practical relevance of individual predictors, and we will refine the model accordingly.
Patient and public involvement
This study has been designed together with input from patients and patient advocates. Specifically, the Irish Lung Fibrosis Patient and Public Involvement (PPI) Research Advisory Group and their volunteers have provided input into the relevance of wearables and research for this population, as well as recruitment and onboarding procedures to ensure these are designed with the end users in mind. In addition, we have co-developed the patient-facing resources (including the website) to ensure their relevance, comprehensibility and accessibility for this patient population.
Discussion
The primary objective of this study is to develop robust prediction models for clinical outcomes in ILD using a combination of clinical, physiological, activity and patient-reported data fields. Through the comprehensive collection and analysis of these datasets, we aim to identify better predictors of disease progression, acute exacerbations and mortality. Predictions will be compared with actual outcomes to validate prediction models. These predictions are expected to offer meaningful value for clinical management by providing clinicians with an improved tool for prognostication.
By developing a clinical prediction model of ILD outcomes, we anticipate that such predictors will be useful as patient level controls for clinical trials, where predicted outcomes for individual patients can be compared with actual recorded outcomes following an intervention. In this manner, individual patients will act as their own control, enabling additional investigation of new therapeutic strategies.
A strength of this study lies in the ability of the wearable to passively collect a large quantity of biometric data without burdening the patient to record active measures. However, the patient-reported outcome measures may represent a burden to patients.
Ethics and dissemination
Full ethics approval was granted for this research study by the St. Vincent’s University Hospital Research Ethics Committee, Dublin, Ireland (reference no: RS23-023). Explicit consent will be sought from participants for the collection and processing of their data. Data Processing Agreements will be in place to ensure that personal data will be processed as is necessary to achieve the objective of the health research and to ensure that data shall not be processed in such a way that might cause damage or distress to the data subject. St. Vincent’s University Hospital and University College Dublin will be joint data controllers for the study. This study will comply with the General Data Protection Regulation. Study results will be presented at medical conferences and will be disseminated via peer-reviewed journals.
Ethics statements
Patient consent for publication
Acknowledgments
We thank the Health Research Board – Trials Methodology Research Network (HRB-TMRN), Galway, Ireland and the UCD Clinical Research Centre for their support of this study; Lindsay Brown, Advanced Nurse Practitioner, for assistance with PROMs selection; Dylan Keagan for assistance with the PIL/ICF; Nicola Cassidy, Paula Jenkins and Robert Hurley (Irish Lung Fibrosis Association PPI) for their input to the design of the website; and Patrick McKay, Advantage Point Creative Consultancy, for website design.
References
Footnotes
Contributors Conception and design of the study: PD, MK, CM, ANF, SH, EG. Drafting the manuscript: SH, EG. Revising the manuscript for important intellectual content: PD, MK, CM, ANF, SH, EG. Approval of the version of the manuscript to be published: PD, CM. PD is the guarantor.
Funding The grant support for this research study was provided by the Health Research Board – Trials Methodology Research Network (HRB-TMRN). The funding body had no role in the design of this study protocol and will not be involved in the collection, analysis and interpretation of data or manuscript preparation.
Competing interests None declared.
Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.