Original research
Further development and validation of the Multimorbidity Treatment Burden Questionnaire (MTBQ)
  1. Polly Duncan1,
  2. Lauren J Scott2,3,
  3. Shoba Dawson1,
  4. Muzrif Munas1,
  5. Yvette Pyne1,
  6. Katherine Chaplin1,
  7. Daisy Gaunt3,
  8. Line Guenette4,
  9. Chris Salisbury1
  1. Centre for Academic Primary Care, Bristol Medical School, University of Bristol, Bristol, UK
  2. National Institute for Health Research Applied Research Collaboration West, University Hospitals Bristol and Weston NHS Foundation Trust, Bristol, UK
  3. Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
  4. Faculty of Pharmacy and CHU de Québec Research Center, Université Laval, Quebec City, Quebec, Canada
  Correspondence to Dr Polly Duncan; polly.duncan{at}bristol.ac.uk

Abstract

Objectives To undertake further psychometric testing of the Multimorbidity Treatment Burden Questionnaire (MTBQ) and examine whether reversing the scale reduced floor effects.

Design Survey.

Setting UK primary care.

Participants Adults (≥18 years) with three or more long-term conditions randomly selected from four general practices and invited by post.

Measures Baseline survey: sociodemographics, MTBQ (original or version with scale reversed), Treatment Burden Questionnaire (TBQ), four questions (from QQ-10) on ease of completing the questionnaires. Follow-up survey (1–4 weeks after baseline): MTBQ, TBQ and QQ-10. Anonymous data collected from electronic GP records: consultations (preceding 12 months) and long-term conditions. The proportion of missing data and distribution of responses were examined for the original and reversed versions of the MTBQ and the TBQ. Intraclass correlation coefficient (ICC) and Spearman’s rank correlation (Rs) assessed test–retest reliability and construct validity, respectively. Ease of completing the MTBQ and TBQ was compared. Interpretability was assessed by grouping global MTBQ scores into a no-burden category (score 0) and tertiles of scores greater than 0.

Results 244 adults completed the baseline survey (consent rate 31%, mean age 70 years) and 225 completed the follow-up survey. Reversing the scale did not reduce floor effects or data skewness. The global MTBQ scores had good test–retest reliability (ICC for agreement at baseline and follow-up 0.765, 95% CI 0.702 to 0.816). Global MTBQ score was correlated with global TBQ score (Rs 0.77, p<0.001), weakly correlated with number of consultations (Rs 0.17, p=0.010), and number of different general practitioners consulted (Rs 0.23, p<0.001), but not correlated with number of long-term conditions (Rs −0.063, p=0.330). Most participants agreed that both the MTBQ and TBQ were easy to complete and included aspects they were concerned about.

Conclusion This study demonstrates test–retest reliability and ease of completion of the MTBQ and builds on a previous study demonstrating good content validity, construct validity and internal consistency reliability of the questionnaire.

  • Primary Care
  • Surveys and Questionnaires
  • Patient Reported Outcome Measures
  • Quality in health care
  • Patient-Centered Care
  • Chronic Disease

Data availability statement

No data are available. The participants of this study did not give written consent for their individual anonymised data to be shared publicly, so the research supporting data are not available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


STRENGTHS AND LIMITATIONS OF THIS STUDY

  • Further psychometric testing was carried out on the Multimorbidity Treatment Burden Questionnaire, including test–retest reliability, construct validity and assessing ease of completing the questionnaire.

  • Study participants had three or more long-term conditions and an average age of 70 years.

  • Postal recruitment enabled people without a smartphone, computer, access to the internet or good information technology literacy skills to take part.

  • The study was designed to assess the primary outcome of test–retest reliability but was not necessarily large enough to detect multiple associations between treatment burden and some patient characteristics.

Introduction

Having a good measure of treatment burden for patients with multimorbidity is important given the ageing population and the associated increase in multimorbidity.1 Interventions designed to reduce treatment burden require a measure of treatment burden to assess their effectiveness. Treatment burden is defined as the ‘effort of looking after one’s health and the impact that this has on everyday life’.2 This includes ordering and collecting medications, taking complex medication regimens, coordinating and attending healthcare appointments, monitoring one’s health conditions and making lifestyle changes.

The Multimorbidity Treatment Burden Questionnaire (MTBQ) is a simply worded 10-question measure of treatment burden, developed and validated as part of the three-dimensional (3D) study, a multicentre cluster-randomised controlled trial in the UK that aimed to improve the management of patients with multimorbidity within primary care.3 4 There are three additional optional questions, which had a high proportion of ‘does not apply’ responses in the original study but may be relevant to other populations. Study investigators can choose to use the 10-question or 13-question version of the MTBQ. The original study included 1546 adult participants with multimorbidity (≥3 long-term conditions) and a mean age of 77 years.3 The MTBQ was developed using a framework of treatment burden derived from qualitative research in the USA2 and demonstrated good content validity, construct validity and internal consistency reliability, and preliminary evidence of responsiveness.4 The MTBQ is widely used internationally and has been translated, culturally adapted and validated in several languages, including Danish,5 German,6 French-Canadian7 and Chinese.8

We are aware of four other existing measures of treatment burden for patients with multimorbidity, but all have limitations.9–13 The Treatment Burden Questionnaire (TBQ) is a 13-question measure originally developed in France10 and subsequently translated, adapted and validated in English.9 A limitation of the TBQ is that the English version was developed and validated in a relatively young (mean age 51 years) and highly educated (78% with a college education) population recruited from an online platform.9 Some of the wording is quite complex, requiring high literacy levels (eg, question 1: ‘How would you rate the problems related to the taste, shape or size of your tablets and/or the annoyances caused by your injections (eg, pain, bleeding, bruising or scars)?’). The Patient Experience with Treatment and Self-management (PETS) questionnaire is a comprehensive 48-question measure of treatment burden developed in the USA.11 For some study investigators, the length of the PETS questionnaire will be considered a strength, as it is able to capture a detailed picture of the different aspects of treatment burden. However, others may consider its length a limitation, being too time-consuming and burdensome for participants to complete. A shorter version, the ‘Brief PETS’ questionnaire, has been developed.14 Its length (32 questions) may still be considered too long by some study investigators, particularly clinical trialists for whom treatment burden is one of several secondary outcomes included among a battery of other measures. The Multimorbidity Illness Perceptions Scale questionnaire was developed and validated in older people (mean age 70 years) in the UK and includes a six-question treatment burden subscale.13 This subscale excludes some important aspects of treatment burden, such as arranging appointments with healthcare professionals. The Healthcare Task Difficulty questionnaire, developed in the USA, was designed to measure only one aspect of treatment burden (difficulty with health-related tasks, such as obtaining and taking medications) and not other aspects (eg, seeing different healthcare professionals).12

The MTBQ, and the original validation study,4 have four important limitations. First, the data were positively skewed and there was a high floor effect, with 22% of participants scoring a global MTBQ score of 0 (no treatment burden). As it is not possible to improve from a score of 0, this can make it difficult to detect change. Similar floor effects have been reported for other existing treatment burden questionnaires for patients with multimorbidity.9–12 Second, within the context of a trial, it was not possible to assess test–retest reliability. Third, in the original validation study, we were only able to test construct validity using indirect measures which we expected to correlate with high and low treatment burden scores, such as health-related quality of life score, rather than direct measures of treatment burden, such as number of healthcare appointments. Fourth, as the MTBQ was developed and validated as part of a trial, it may not be generalisable to non-trial populations. In addition, the ease of completing the MTBQ was assessed in the original study as part of the cognitive interviews (n=8) but has not been assessed in a larger sample of participants.

The purpose of this study was: (1) to examine whether reversing the scale of the questionnaire improved the floor effects and the skewness of the data; (2) to assess test–retest reliability; (3) to compare responses, construct validity and ease of completion of the MTBQ and a comparator questionnaire, the TBQ9 10 and (4) to assess interpretability of the MTBQ in a non-trial population.

Methods

Study population, eligibility criteria and recruitment

Participants were recruited from four General Practices serving a range of deprived, mid-deprived and affluent populations, from August 2018 to August 2019. Patients were eligible if they were aged ≥18 years and had ≥3 long-term conditions from 17 major long-term conditions included in the 2014 National Health Service Quality and Outcomes Framework (a UK programme which incentivises General Practices to deliver high quality healthcare).15 Conditions were grouped into 12 types of condition with similar management considerations; for example, asthma and chronic obstructive pulmonary disease (COPD) within the same individual were counted as one condition. Patients who had taken part in the 3D study3 or who were deemed unsuitable to take part by a clinician from the practice (eg, due to a recent bereavement, cognitive impairment or poor level of English to read and complete the questionnaire) were excluded.

Potentially eligible participants were identified via a standardised search of the electronic general practitioner (GP) records, which was used in the original validation study.3 4 16 Similar conditions, such as asthma and COPD, within the same individual were counted as one condition (table 1). A random sample of potentially eligible participants was selected from each practice and was reviewed by a clinician in the practice to check whether it was appropriate to invite them. Eligible participants were sent an invitation letter, a participant information sheet, and a questionnaire booklet (with original MTBQ or version with the scale reversed; see below). Completion of the questionnaire implied consent, as stated in the participant information sheet. Those who responded were sent a follow-up questionnaire 1–4 weeks after returning the baseline questionnaire, including the same version of the MTBQ completed at baseline. Participants were sent a £5 Love2Shop voucher each time they returned a completed questionnaire.

Table 1

Participant characteristics

Survey content

The questionnaire booklet included demographic information (age, gender, age left full-time education, employment status, ethnic group); the MTBQ (original or reversed version; see below)4; the TBQ comparator questionnaire9 10 and four questions from the QQ-10 questionnaire to assess the ease of completion of the MTBQ and the TBQ.17 Four different versions of the questionnaire booklet were created: original MTBQ followed by TBQ; reversed MTBQ followed by TBQ; TBQ followed by original MTBQ; TBQ followed by reversed MTBQ. Each booklet was colour coded and participants were sent the same version of the questionnaire at follow-up to assess test–retest reliability.

Participants could actively decline participation by ticking a box on the front page of the questionnaire booklet saying they did not wish to participate and returning the booklet in the FREEPOST envelope. For non-responders, a reminder letter was sent 10–14 days after the baseline questionnaire.

Data from electronic GP records

The following non-identifiable information was collected from the electronic GP records: Townsend Deprivation Index scores; long-term conditions; all consultations recorded in the preceding 12 months (including face-to-face, telephone, video and home visits), type of professional who performed the consultation (eg, GP, nurse) and a GP identifier if it was a GP consultation. The Townsend scores were used to calculate quintiles of deprivation based on the 2011 census data.18

Consultations were coded as in a previous study.19 20 The number of consultations in the preceding 12 months was calculated by adding up all consultation entries where participants were seen by a GP, nurse or primary care paramedic (employed by the general practice as part of the clinical team). The number of different GPs seen in the preceding 12 months was calculated, using the GP identifier codes assigned to these consultations, for each participant who had at least one GP appointment. For 19% of GP appointments, no GP identifier code was assigned; participants who had one or more GP consultations without a GP identifier were excluded from this analysis.
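The counting rules above can be sketched in Python. This is a minimal illustration, not the study's code: the `Consultation` record, the professional codes ("GP", "nurse", "paramedic") and the assumption that records have already been restricted to the preceding 12 months are all hypothetical stand-ins for however the GP record extract is actually coded.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical codes for the professionals whose consultations are counted.
COUNTED_PROFESSIONALS = {"GP", "nurse", "paramedic"}


@dataclass
class Consultation:
    professional: str             # hypothetical code, eg "GP", "nurse", "paramedic"
    gp_id: Optional[str] = None   # GP identifier; None when not recorded


def count_consultations(records: List[Consultation]) -> int:
    """Consultations with a GP, nurse or primary care paramedic.

    Assumes `records` already covers only the preceding 12 months.
    """
    return sum(1 for r in records if r.professional in COUNTED_PROFESSIONALS)


def count_distinct_gps(records: List[Consultation]) -> Optional[int]:
    """Number of different GPs seen, or None if not calculable.

    None is returned when there was no GP consultation, or when any GP
    consultation lacks a GP identifier (such participants are excluded).
    """
    gp_visits = [r for r in records if r.professional == "GP"]
    if not gp_visits or any(r.gp_id is None for r in gp_visits):
        return None
    return len({r.gp_id for r in gp_visits})
```

For example, three GP consultations with GPs "A", "B" and "A", plus one nurse consultation, count as four consultations with two different GPs.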

Patient involvement

Four members of the Patient Involvement in Primary Care Research group were involved in the study design. We worked closely with them to develop simply worded, concise and easy to read invitation letters, information sheets and questionnaire booklets, making the study more accessible to patients.

The original MTBQ

The original MTBQ comprises 10 questions including the following aspects of treatment burden: taking and collecting medications, monitoring health conditions, arranging and attending healthcare appointments with different healthcare professionals, making recommended lifestyle changes and having to rely on help from family and friends.4 There are three additional optional questions about paying for medicines and equipment, accessing healthcare in the evenings and weekends and getting help from community services (eg, physiotherapy, community nurses). In this study, the 13-question version of the MTBQ was used, including the three optional questions.

Participants score each of the questions on a five-point Likert scale ranging from 0 (not difficult) through 1 (a little difficult), 2 (quite difficult) and 3 (very difficult) to 4 (extremely difficult). There is also a ‘does not apply’ option (scored 0). A global MTBQ score can be computed by calculating the mean score of the questions answered and multiplying it by 25 to give a score from 0 to 100.4 A global score cannot be calculated if more than 50% of responses are missing. The global score based on the 10 core questions was used in most of our analyses; for some analyses, we have also calculated and presented the global score based on all 13 questions.
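The scoring rule can be expressed as a short function. This is a sketch assuming a hypothetical encoding of responses (integers 0 to 4 for the response options, the string "does not apply" and `None` for a missing answer); it is not an official scoring script.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical encoding for this sketch: integers 0-4 for the five response
# options, the string "does not apply", and None for a missing answer.
Response = Union[int, str, None]
DOES_NOT_APPLY = "does not apply"


def global_mtbq_score(responses: Sequence[Response]) -> Optional[float]:
    """Global MTBQ score (0-100), or None if more than 50% of items are missing."""
    answered: List[int] = []
    for r in responses:
        if r is None:
            continue                  # missing answers are left out of the mean
        if r == DOES_NOT_APPLY:
            answered.append(0)        # 'does not apply' is scored 0
        elif r in (0, 1, 2, 3, 4):
            answered.append(r)        # 0 = not difficult ... 4 = extremely difficult
        else:
            raise ValueError(f"unexpected response: {r!r}")
    n_missing = len(responses) - len(answered)
    if not answered or n_missing > len(responses) / 2:
        return None
    # Mean of the answered items (0-4) rescaled to 0-100 by multiplying by 25.
    return 25 * sum(answered) / len(answered)
```

For example, `global_mtbq_score([1, 2, 0, "does not apply", None, 1, 0, 0, 2, 1])` averages the nine answered items (sum 7) and returns 25 × 7/9 ≈ 19.4.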

Reversing the scale

A new version of the MTBQ was developed in which the order of responses was reversed, that is, the response option of ‘extremely difficult’ was listed first and ‘not difficult’ was listed last (online supplemental file 1). We hypothesised that this might present difficulties as something to be expected and so reduce floor effects on the questionnaire. Participants were randomly sent either the original MTBQ or reversed MTBQ.

Data and statistical analysis

We used means and SDs, and medians and IQRs to summarise normally distributed and skewed data respectively. Categorical data were summarised using counts and percentages.

Objective 1: examine whether reversing the scale improved the floor effects and the skewness of the data

To assess the effect of reversing the scale, the count and percentage of each response to each question, as well as the floor effects for the global score (the proportion of participants with a global score of 0), were compared between the original and reversed versions of the MTBQ. The distribution of the MTBQ on each scale was presented as medians and IQRs. A χ2 test was used to compare the floor effect between the original and reversed MTBQ. The analyses for objective 1 included data from the baseline questionnaire.
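A minimal sketch of the floor-effect comparison, using only the standard library. It assumes the χ2 test was Pearson's test on the 2×2 table of questionnaire version (original, reversed) by global score (0, >0) without a continuity correction; the text does not say which variant was used. For 1 degree of freedom the upper-tail probability equals erfc(√(χ2/2)), which avoids a dependency on a statistics package.

```python
import math
from typing import Sequence, Tuple


def floor_effect(scores: Sequence[float]) -> float:
    """Proportion of participants with a global score of exactly 0."""
    return sum(1 for s in scores if s == 0) / len(scores)


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) for a 2x2 table.

    Layout: rows = questionnaire version, columns = (score 0, score > 0)
        original: a  b
        reversed: c  d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut form of sum((observed - expected)^2 / expected) for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

With the baseline counts reported in the Results (11 of 112 participants scoring 0 on the original version, 18 of 132 on the reversed version), `chi2_2x2(11, 101, 18, 114)` gives χ2 ≈ 0.84 and p ≈ 0.36; applying a continuity correction would give a larger p value.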

Objective 2: assess test–retest reliability

To assess test–retest reliability, we calculated the intraclass correlation coefficient (ICC) for agreement (and the 95% CI) between global MTBQ score at baseline and follow-up. An ICC>0.7 was considered acceptable.21 A Bland-Altman plot was constructed, where the mean global MTBQ score from the two time points was plotted against the difference in global MTBQ score between the two time points.22 All participants were included in these analyses, including those who were sent the original and reversed versions of the MTBQ scale. A sensitivity analysis was performed, including only those who were sent the original version.
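The reliability statistics can be sketched as follows. The text specifies an ICC "for agreement" but not the model; this sketch assumes the two-way, absolute-agreement, single-measures form, ICC(A,1) in McGraw and Wong's notation, computed from ANOVA mean squares for n participants measured at k = 2 time points. The Bland-Altman summary takes the difference as follow-up minus baseline (an assumption) and uses the usual 95% limits of agreement, mean difference ± 1.96 SD of the differences.

```python
import statistics
from typing import Sequence, Tuple


def icc_agreement(x: Sequence[float], y: Sequence[float]) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measures.

    x and y hold the same participants' global scores at baseline and follow-up.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired scores for at least two participants")
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between time points
    ss_total = sum((v - grand) ** 2 for pair in rows for v in pair)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference (follow-up minus baseline) and 95% limits of agreement."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Because absolute agreement penalises systematic shifts, a constant one-point change between time points lowers the ICC even though the ranking of participants is preserved: `icc_agreement([1, 2, 3], [2, 3, 4])` returns 2/3.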

Objective 3: compare responses, construct validity and ease of completion of the MTBQ and a comparator questionnaire, the TBQ

The TBQ was chosen as the comparator questionnaire for this study because it includes all aspects of treatment burden and is relatively short (13 questions),9 10 and so was thought feasible for participants to self-complete. The proportion of missing data and not difficult/does not apply responses (floor effect) were examined for each question from the MTBQ alongside the comparable questions from the TBQ.

MTBQ construct validity was assessed using Spearman’s rank correlation coefficients (Rs), with corresponding p values for independence, for four prespecified hypotheses: a positive correlation between global MTBQ score and global score on the TBQ comparator (criterion validity)9 10; a positive association between treatment burden score and number of long-term conditions; a positive association between treatment burden score (global score) and number of primary care appointments in the prior 12-month period; and finally, a positive association between treatment burden score (global score) and the number of different GPs seen in the prior 12-month period.
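Spearman's Rs is Pearson's correlation applied to ranks, with tied values given their average rank; ties are common here because global scores and consultation counts take a limited set of values. A minimal standard-library sketch follows. It computes Rs only; a p value would additionally need, for example, a t approximation or a permutation test, which is not shown.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

For instance, `spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])` is 1.0: the relationship is perfectly monotonic even though it is not linear.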

Four statements from the QQ-10 questionnaire17 were used to compare the ease of completion of the MTBQ with the TBQ: (1) the questionnaire was easy to complete; (2) the questionnaire included all aspects of my condition that I am concerned about; (3) the questionnaire was too long and (4) the questionnaire was too complicated. For each statement, participants could strongly disagree, mostly disagree, neither agree nor disagree, mostly agree or strongly agree. The QQ-10 consists of 10 statements; however, only the four statements that appeared most relevant to assessing the ease of completion of the two treatment burden questionnaires were selected, to avoid overburdening participants. The proportions of each response to each of the four statements were examined for the MTBQ and the TBQ; responses were grouped as strongly agree/mostly agree versus neither agree nor disagree/mostly disagree/strongly disagree and formally compared using the McNemar test.23
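The paired comparison can be sketched as follows. Each participant rated both questionnaires, so only the discordant pairs (agreeing for one questionnaire but not the other) inform the test. The text does not state whether the asymptotic or exact McNemar test was used; this sketch assumes the exact (binomial) version, and the lower-case response labels are a hypothetical encoding.

```python
from math import comb
from typing import Iterable, Tuple

AGREE = {"strongly agree", "mostly agree"}  # grouped as 'agree' in the text


def agrees(response: str) -> bool:
    """Dichotomise a QQ-10 response: agree vs neither/disagree."""
    return response.strip().lower() in AGREE


def mcnemar_exact(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test.

    pairs: one (agree_mtbq, agree_tbq) tuple per participant.
    Returns (b, c, p), where b and c are the two discordant counts.
    """
    pairs = list(pairs)                          # allow generators to be reused
    b = sum(1 for m, t in pairs if m and not t)  # agreed for MTBQ only
    c = sum(1 for m, t in pairs if t and not m)  # agreed for TBQ only
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 the discordant pairs split 50:50, so min(b, c) ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

Raw answers for each participant would be converted with `(agrees(mtbq_answer), agrees(tbq_answer))`, where the two argument names are hypothetical, before calling `mcnemar_exact`.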

Construct validity analyses included data from all baseline participants (original and reversed MTBQ scale). All other analyses for objective 3 included data from baseline participants who responded to the original MTBQ scale.

Objective 4: assess interpretability of the MTBQ

To assess interpretability of the MTBQ, we categorised the global MTBQ scores greater than 0 into tertiles to generate four categories: no burden (score 0), low burden (lowest tertile), medium burden (middle tertile) and high burden (upper tertile). The tertiles were based on the MTBQ baseline data of participants who completed the original and reversed MTBQ. We summarised the participant characteristics and key outcome variables, including number of long-term conditions, by the four categories. Further, we dichotomised the burden categories into no/low burden versus medium/high burden and examined the effect of participant characteristics and key outcome variables on treatment burden using logistic regression. For these analyses, we collapsed some of the variables due to small numbers. We performed univariable analyses, in addition to adjusted analyses where each model was adjusted for age, gender, deprivation and number of comorbidities. Estimates are presented as ORs alongside 95% CIs and p values.
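A sketch of the categorisation step. The text does not specify how the tertiles were computed or which category a score lying exactly on a cut point falls into; this sketch assumes `statistics.quantiles` with its default (exclusive) method and assigns boundary scores to the lower category. The logistic regression itself is not shown.

```python
import statistics
from typing import Sequence, Tuple


def tertile_cut_points(scores: Sequence[float]) -> Tuple[float, float]:
    """Tertile cut points of the global scores that are greater than 0."""
    positive = [s for s in scores if s > 0]
    t1, t2 = statistics.quantiles(positive, n=3)  # default method='exclusive'
    return t1, t2


def burden_category(score: float, t1: float, t2: float) -> str:
    """Four-level category; scores on a cut point go to the lower category."""
    if score == 0:
        return "no burden"
    if score <= t1:
        return "low burden"
    if score <= t2:
        return "medium burden"
    return "high burden"


def medium_or_high(category: str) -> bool:
    """Dichotomy used for the logistic regression: medium/high vs no/low."""
    return category in ("medium burden", "high burden")
```

Note that the score-0 group is excluded before the tertiles are computed, so the three positive categories each hold roughly a third of the participants with some burden rather than a third of all participants.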

Sample size

Sample size calculations were performed so that the primary outcome, assessment of test–retest reliability, achieved an interval estimate with sufficient precision, rather than a specific power to test a hypothesis.24 Using a 0.7 ICC with 95% CI having a width of 0.2 (ie, 0.6 to 0.8), 101 participants were required to complete the baseline and follow-up questionnaire. Based on the response rate of the 3D study3 and the ‘TBQ’ validation study,9 the anticipated response rate was 20%.
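The figure of 101 participants can be reproduced with Bonett's (2002) approximation for the sample size needed to estimate an ICC with a confidence interval of total width w, for k measurements per participant: n ≈ 8z²(1 − ρ)²(1 + (k − 1)ρ)² / (k(k − 1)w²) + 1. With ρ = 0.7, k = 2, w = 0.2 and z = 1.96, this gives about 100.9, rounded up to 101. Whether this is the exact method used in reference 24 is an assumption.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho: float, width: float, k: int = 2, conf: float = 0.95) -> int:
    """Participants needed so the CI for an ICC has roughly the given total width.

    Bonett's (2002) approximation: rho is the planning value of the ICC,
    k the number of measurements per participant and width the full CI width.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% CI
    n = (8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
         / (k * (k - 1) * width**2)) + 1
    return math.ceil(n)
```

`icc_sample_size(0.7, 0.2)` returns 101. Because n scales with 1/w², halving the target width would roughly quadruple the number of participants needed.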

The study design was assessed against the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) checklist (online supplemental file 2).25

Results

Of the 800 adults invited, 244 completed the baseline survey (consent rate 31%; 112 and 132 completed the original and reversed scale versions, respectively) and 225 completed the follow-up survey (92% of participants who had completed the baseline survey; 105 and 120 completed the original and reversed scale versions, respectively) (figure 1). The mean age of participants was 70 years (SD 13), 53% were male and 94% were of white British ethnicity (table 1). Of these, 56% had 3 long-term conditions, 30% had 4 and 14% had 5 or more. The most common long-term conditions were cardiovascular disease (86%), diabetes (58%), COPD or asthma (49%) and depression (44%). Seventy-three per cent left school aged 16 years or under, and 59% were fully retired from work. The sociodemographic characteristics and long-term conditions of those who completed the original and reversed scale versions of the MTBQ were similar (online supplemental file 3).

Figure 1

Participant flow diagram. GP, general practitioner.

Objective 1: examine whether reversing the scale of the questionnaire improved the proportion of missing data, the floor effects and the skewness of the data

The proportion of missing data for each question was between 0% and 2% for the original version of the MTBQ, and between 0% and 3% for the reversed version (online supplemental file 4). The number of missing responses per participant was low for both versions of the questionnaire: 96% of participants had no missing responses for the original version and 93% for the reversed version. The floor effect for the individual questions (the proportion of participants responding ‘not difficult’ or ‘does not apply’) was slightly higher for the reversed version than for the original version, except for question 10. For both versions, the responses to individual questions were positively skewed, with a higher proportion of participants responding either ‘a little difficult’ or ‘quite difficult’ than ‘very difficult’ or ‘extremely difficult’. The distribution of responses to individual questions was similar for the original and reversed versions (online supplemental file 4).

The median global MTBQ score was 17.1 (IQR 7.5–35.0) for the original MTBQ and 12.5 (IQR 5.0–27.5) for the reversed scale (online supplemental file 4). There were 11 (10%) participants with a global MTBQ score of 0 for the original version and 18 (14%) for the reversed version (p=0.35). The distribution of the TBQ global scores was also skewed and similar between participants who received the original version and reversed version of the MTBQ (online supplemental file 5).

Objective 2: assess test–retest reliability

The ICC for agreement between global MTBQ scores at baseline and follow-up was 0.768 (95% CI 0.705 to 0.818) and 0.765 (95% CI 0.702 to 0.816) for the 13-question and 10-question versions, respectively (all participants). For participants who completed the original MTBQ, the ICC was 0.715 (95% CI 0.599 to 0.801) and 0.705 (95% CI 0.587 to 0.794) for the 13-question and 10-question versions, respectively. The Bland-Altman plot22 suggests that there was no systematic bias between values at the two time points, with a mean difference of only −0.5 (95% limits of agreement −24.8 to 23.8; online supplemental file 6).

Objective 3: compare responses, construct validity and ease of completion of the MTBQ and a comparator questionnaire, the TBQ

Of the participants who completed the original version of the MTBQ, the TBQ had similar floor effects for the global treatment burden score as the MTBQ, with 12% of participants scoring 0 for the TBQ compared with 10% of participants scoring 0 for the MTBQ (McNemar ratio 0.85, 95% CI 0.46 to 1.56, p=0.593; see also online supplemental files 5 and 7). The proportion of missing data for each question was between 0% and 2% for the MTBQ and 0%–5% for the TBQ. Ninety-two per cent of participants had no missing responses for the MTBQ, compared with 86% of participants for the TBQ (online supplemental file 7).

The floor effects for the individual questions (proportion of participants responding ‘does not apply’ or ‘not difficult’) ranged from 38% to 89% for the MTBQ, and from 36% to 71% for the TBQ (online supplemental file 7).

Regarding construct validity, the global MTBQ score had a strong positive correlation with the comparator TBQ score (Rs 0.77, p<0.001 for the 10-item MTBQ; Rs 0.78, p<0.001 for the 13-item MTBQ). Weak positive correlations were found between global MTBQ score and both the number of primary care appointments and the number of different GPs consulted within the preceding 12-month period (table 2). Similar weak correlations were found between global TBQ score and the same variables. No correlation was found between global MTBQ score or global TBQ score and number of long-term conditions.

Table 2

Correlations between global MTBQ score and global TBQ score, number of long-term conditions, number of primary care appointments and number of different GPs consulted in the preceding 12 months

Slightly more participants agreed that the MTBQ was easy to complete compared with the TBQ (86% vs 80%; p=0.013; table 3). For the MTBQ and TBQ, respectively, 66% and 67% agreed that the questionnaire included all aspects of their condition they were worried about (no significant difference between the two questionnaires, p=1.0; table 3). The proportion of participants who mostly or strongly agreed that the questionnaire was too long or too complicated was 12% and 13%, respectively, for the MTBQ and 13% and 12% for the TBQ (no significant difference between the questionnaires; table 3).

Table 3

Ease of completing the MTBQ and TBQ, measured using four questions from the QQ-10 questionnaire (n=112)

Objective 4: to assess interpretability of the MTBQ in a non-trial population

Grouping global MTBQ scores greater than 0 into tertiles generated four categories: no burden (score 0), low burden (score <11), medium burden (12–25) and high burden (>25). When treatment burden was dichotomised as medium to high (≥11) versus no to low (0 to 10), younger participants (adjusted OR for those aged ≥71 years vs 18–70 years 0.24, 95% CI 0.13 to 0.44, p<0.001) and those with depression (adjusted OR 3.11, 95% CI 1.71 to 5.65, p<0.001) or rheumatoid arthritis (adjusted OR 4.34, 95% CI 1.14 to 16.48, p=0.031) had higher odds of medium to high treatment burden (table 4). Treatment burden split by four categories is described in online supplemental file 8.

Table 4

Participant characteristics by category of treatment burden (original version of 10-question MTBQ, n=243)

Discussion

In this study, we examined test–retest reliability, construct validity and ease of completion of the MTBQ, and assessed whether reversing the scale (listing ‘extremely difficult’ first and ‘not difficult’ last) improved the floor effects and skewness of the data. There was good evidence for test–retest reliability and a strong positive correlation was found between global MTBQ score and global TBQ score (the comparator questionnaire). Global MTBQ score was weakly correlated with number of consultations and number of different GPs consulted but not with number of long-term conditions. Reversing the scale did not reduce the floor effects or skewness of the data. For both the MTBQ and TBQ, participants mostly agreed the questionnaires were easy to complete, included aspects of their condition they were worried about and were not too long or complicated. As the MTBQ and TBQ9 10 performed similarly in this study, the choice of which questionnaire to use will likely come down to study preference.

A strength is that the study population comprised randomly selected older adults (mean age 70 years) with multimorbidity (≥3 long-term conditions)—the population for whom the MTBQ is intended. We recruited participants by post and so were able to include those who do not have a smartphone, computer, access to the internet or good information technology literacy skills. Further strengths are that we used questions from the validated QQ-10 questionnaire17 to assess the ease of completing the MTBQ and the TBQ; and combined survey data with routinely collected data from the GP records.

The low baseline response rate of 31% is a weakness of the study since this may reduce the generalisability of the findings if those who participated differed from those who did not take part. Similar response rates have been reported by other study investigators validating measures of treatment burden internationally,9–12 and in primary care survey studies in the UK.26 The majority of study participants self-identified as white British ethnicity, which is also a limitation. We purposefully included two practices serving more ethnically diverse populations, but as our sample has a lower proportion of people from minority ethnic groups than the general UK population, this limits the generalisability of the findings and could potentially lead to selection bias. As the questionnaire was self-administered, we were unable to include those with a poor level of English literacy. A further limitation is that reading similar questions from the MTBQ and TBQ in the same questionnaire pack could have influenced participants’ responses, although we tried to mitigate this by randomising the order in which the MTBQ and TBQ questionnaires were presented. In addition, while the study was designed to assess the primary outcome of test–retest reliability, it was not necessarily large enough to detect multiple associations between treatment burden and some patient characteristics. The sample size (n=243) was ‘very good’ according to the COSMIN criteria,25 but for some subgroup analyses (eg, less common long-term conditions), sample sizes were low, precluding an adequate test of certain relationships. A final limitation is that we were unable to resolve the skewed distribution of responses. This merits further investigation, for example, through exploring other changes to response options; alternatively, it may indicate that experience of burden is inevitably skewed rather than being a problem of the measures.

Interestingly, we found that younger participants were more likely to report high treatment burden, a finding also reported in the original MTBQ study,4 a Danish population survey5 and Tran’s TBQ validation study.10 This may be because younger people have more caring and work responsibilities, and so less capacity to manage the workload of looking after their health.27 In this and several other studies, including the original MTBQ validation study, people with depression were more likely to report high treatment burden.4 12 13 In contrast to the original, much larger MTBQ study4 but in keeping with studies in the USA,14 28 we did not find an association between high treatment burden score and number of long-term conditions. We found modest associations between high treatment burden and both number of consultations and poor continuity of care. We would not necessarily expect strong associations, because the number of consultations and the number of different healthcare professionals seen reflect only one aspect of treatment burden. Furthermore, the relationships between treatment burden, number of consultations and continuity of care are complex. A high number of consultations could cause high treatment burden through having to arrange and attend multiple appointments, but it could also reflect good access to healthcare appointments and therefore lower treatment burden. Similarly, seeing a healthcare professional whom you know and trust (good continuity of care) often comes at the expense of waiting longer for an appointment, which could in turn increase treatment burden. The cut-off values for the four treatment burden groups were slightly higher in this study than in the original study.4 For studies using the MTBQ, we recommend using the original study cut-off values: no burden (score 0), low burden (score >0 and <10), medium burden (score 10–21) and high burden (score >21).
Further research, for example using anchor-based methods, is needed to determine the clinical significance of global MTBQ scores.
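As an illustration only, the recommended cut-off values could be applied to global MTBQ scores with a small helper like the one below. This is a minimal sketch, not code from the study: the function name `mtbq_burden_group` is hypothetical, and the handling of non-integer scores at the boundaries (eg, 21.5 counted as high burden because the cut-off is stated as >21) is an assumption based on a literal reading of the cut-offs.

```python
def mtbq_burden_group(score: float) -> str:
    """Assign a global MTBQ score to one of the four recommended burden groups.

    Hypothetical helper. Boundaries follow the original study cut-offs as stated:
    0 = no burden, >0 and <10 = low, 10 to 21 inclusive = medium, >21 = high.
    Treating non-integer scores such as 21.5 as 'high' is an assumption.
    """
    if score < 0:
        raise ValueError("a global MTBQ score cannot be negative")
    if score == 0:
        return "no burden"
    if score < 10:
        return "low burden"
    if score <= 21:
        return "medium burden"
    return "high burden"


# Example: classify a few illustrative (made-up) scores.
for s in [0, 4.2, 10, 21, 21.5, 48]:
    print(s, mtbq_burden_group(s))
```

Keeping the cut-offs in a single function makes it straightforward to compare groupings if alternative, study-specific cut-off values (such as the slightly higher ones observed here) are used in a sensitivity analysis.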

The MTBQ is a simply worded, concise measure of treatment burden for patients with multimorbidity. This study provides further evidence of the scale’s psychometric properties, including test–retest reliability, construct validity and ease of completion. These findings complement the original validation study, in which the MTBQ demonstrated good content validity, construct validity and internal consistency reliability, as well as preliminary evidence of responsiveness. The MTBQ was developed and validated primarily as a research tool and has been widely used in interventional and observational studies. Further work is underway to develop and validate an adapted version of the MTBQ, known as the ‘Short Treatment Burden Questionnaire’, for use in clinical settings.29

Data availability statement

No data are available. The participants of this study did not give written consent for their individual anonymised data to be shared publicly, so the research supporting data are not available.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants, and ethical approval was obtained from the Faculty of Health Sciences Research Ethics Committee (FREC), University of Bristol (18/LO/1051, IRAS 236536). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

Appreciation is extended to the General Practices and their patients for taking part in the study, and to the PiPCare members (comprising people with two or more long-term conditions) for providing valuable patient and public insights into the study design. We also thank Dr Viet-Thi Tran for permission to use the Treatment Burden Questionnaire (TBQ).

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • X @polly_duncan, @prof_tweet

  • Contributors PD led this project under the supervision of CS, and both were responsible for the concept and study design. CS provided methodological expertise in assessing the psychometric properties of the MTBQ. PD, SD, MM and KC were involved in inviting participants, transcribing survey data into the database and extracting anonymous data from the electronic GP records. LJS led the analysis with input from DG, PD, YP, LG and CS. PD and LJS drafted the manuscript under the supervision of CS. All authors critically reviewed the manuscript and approved the final version. PD is the guarantor.

  • Funding This study was part funded by the Avon Primary Care Research Collaborative (grant no: NA), an NIHR Doctoral Research Fellowship award to PD (NIHR301824) and an NIHR Senior Investigator award to CS (NIHR201314). It was also supported by the National Institute for Health Research (NIHR) Applied Research Collaboration West (ARC West) at University Hospitals Bristol and Weston NHS Foundation Trust (grant no: NA).

  • Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the NIHR, NHS or the UK Department of Health and Social Care.

  • Competing interests PD and CS developed and validated the MTBQ. CS is an NIHR Senior Investigator.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.