Original research
Exploring COVID-19 vaccine hesitancy and uptake in Nairobi’s urban informal settlements: an unsupervised machine learning analysis of a longitudinal prospective cohort study from 2021 to 2022
  1. Nandita Rajshekhar1,
  2. Jessie Pinchoff2,
  3. Christopher B Boyer3,
  4. Edwine Barasa4,
  5. Timothy Abuya5,
  6. Eva Muluve5,
  7. Daniel Mwanga5,
  8. Faith Mbushi5,
  9. Karen Austrian5
  1. Independent Consultant, Atlanta, Georgia, USA
  2. Social and Behavioral Sciences Research, Population Council, New York, New York, USA
  3. Harvard University HSPH, Boston, Massachusetts, USA
  4. Health Economics Research Unit, Centre for Geographic Medicine Research Coast, Nairobi, Kenya
  5. Population Council Kenya, Nairobi, Kenya
  1. Correspondence to Dr Jessie Pinchoff; jpinchoff{at}popcouncil.org

Abstract

Objectives To illustrate the utility of unsupervised machine learning compared with traditional methods of analysis by identifying archetypes within the population that may be more or less likely to get the COVID-19 vaccine.

Design A longitudinal prospective cohort study (n=2009 households) with recurring phone surveys from 2020 to 2022 to assess COVID-19 knowledge, attitudes and practices. Vaccine questions were added in the 2021 (n=1117) and 2022 (n=1121) rounds.

Setting Five informal settlements in Nairobi, Kenya.

Participants Individuals from 2009 households were included.

Outcome measures and analysis Respondents were asked about COVID-19 vaccine acceptance (February 2021) and vaccine uptake (March 2022). Three distinct clusters were estimated using K-Means clustering and analysed against vaccine acceptance and vaccine uptake outcomes using regression forest analysis.

Results Despite higher educational attainment and fewer concerns regarding the pandemic, young adults (cluster 3) were less likely to intend to get the vaccine than cluster 1 (41.5% vs 55.3%, respectively; p<0.01). Despite believing certain COVID-19 myths, older adults with larger households and more fears regarding the economic impacts of the pandemic (cluster 1) were more likely to ultimately get vaccinated than cluster 3 (78.0% vs 66.4%; p<0.01), potentially due to employment requirements. Middle-aged women who were married or divorced and reported a higher risk of gender-based violence in the home (cluster 2) were more likely than young adults (cluster 3) to report wanting to get the vaccine (50.5% vs 41.5%; p=0.014) but not more likely to have received it (69.3% vs 66.4%; p=0.41), indicating potential gaps in access and a broader need for social support for this group.

Conclusions Findings suggest this methodology can be a useful tool to characterise populations, with utility for improving targeted policy, programmes and behavioural messaging to promote uptake of healthy behaviours and ensure equitable distribution of prevention measures.

  • COVID-19
  • public health
  • health policy
  • statistics & research methods

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • A strength of modern statistical methods, such as K-Means clustering, is the ability to facilitate data-driven analysis, objectively revealing subgroups without the researchers’ preconceived assumptions potentially biasing the analysis.

  • A strength of this study is its longitudinal prospective design, following respondents from 2 months after the pandemic was declared through to vaccine availability.

  • Some limitations to K-Means clustering include possible changes to the clustering of the data when run multiple times due to the use of random starting points and challenges in interpreting the data when distinct subgroups are not present.

  • Limitations in the study design include potential selection bias favouring respondents who had mobile phones as well as social desirability bias, whereby respondents may have answered questions to be socially acceptable to the interviewer.

  • Relatedly, the study has high attrition due to the repeat rounds of collection.

Introduction

WHO officially declared COVID-19, a disease caused by the novel coronavirus SARS-CoV-2, a pandemic on 11 March 2020.1 The first case of COVID-19 in Kenya was reported shortly afterwards, on 13 March 2020. To curb transmission, the Kenyan Government swiftly instated lockdown policies, including restrictions on travel and large gatherings, and business and school closures. Experts were concerned that, owing to limited resources for distancing and hand washing, populations in urban informal settlements would be at high risk of transmission.2 Many studies of COVID-19 and other outbreaks, such as Ebola, have cited loss of income, food insecurity, gender-based violence, mental health and lack of access to healthcare as major downstream impacts of disease mitigation policies.3–5 In the years since the pandemic began, restrictions have eased, and with the rollout of COVID-19 vaccines to the general public in early 2021, the focus has shifted to increasing vaccination coverage. While vaccination is critically important, during the initial phases of the rollout 82% of globally available doses went to high-income and upper middle-income countries, with only 0.2% delivered to low-income countries, highlighting continued vaccine inequity and injustice.6–10 As of July 2023, 65.9% of individuals globally had received both doses of the COVID-19 vaccine.11

The Government of Kenya launched a phased rollout of COVID-19 vaccination in March 2021, starting with essential workers such as healthcare providers, followed by the elderly and those with comorbidities. In June 2022, the Kenyan Ministry of Health expanded its reach, aiming to vaccinate 27 million eligible adults and 5.8 million teenagers by the end of the year.12 Vaccination is required for certain jobs, including civil servants, teachers and some positions with private employers.13–16 Ongoing campaigns aim to increase vaccination coverage, assuage concerns about vaccine safety and promote uptake, to protect Kenyans from severe outcomes and death as well as against new and emerging variants. Vaccination is one of the most effective interventions to control the ongoing pandemic, but vaccine acceptance rates vary around the world.17–19

Vaccine hesitancy is a major ongoing global concern, as new vaccines or boosters are likely to continue to be required as the pandemic evolves. A study across 23 countries (including Kenya) found that soon after vaccines became available (June 2021), over three-quarters (75.2%) of respondents reported vaccine acceptance, meaning they would get the vaccine. Reasons for vaccine hesitancy were related to lack of trust in COVID-19 vaccine safety and science, and scepticism about vaccine efficacy.19 Other factors included misperceptions regarding individual-level risk of contracting COVID-19 and the severity of infection,19–24 and fear of side effects.25 Some respondents reported a general lack of trust in scientific institutions or health authorities, which can also increase vaccine hesitancy.19

Looking closer at COVID-19 vaccine hesitancy in Kenya, an early study in four Kenyan counties found hesitancy ranged from 10.2% to 44.6%, with Nairobi County having the highest proportion that reported they intended to get the vaccine, particularly among those who had received training from the Ministry of Health.26 A 2022 study from six Kenyan health facilities found that while 81% reported it was important to get the vaccine, 40.5% also reported concerns, mainly regarding side effects.6 This study also found that hesitancy was higher in government and faith-based health institutions compared with private ones.6 Another study conducted in February 2022 found that >45% of individuals eligible for vaccination in Kenya had not taken a single dose.19 27 28

To increase vaccine uptake, it is important to address hesitancy by identifying sources of information, the perceived trustworthiness of those sources and how messaging can be adapted to drive positive behaviour change. Studies have shown that individuals who report receiving COVID-19 information from social media, primarily Facebook, have the highest rates of vaccine hesitancy.6 26 An Africa CDC report found that 65% of those surveyed in Kenya reported having seen or heard at least some misinformation about COVID-19 on social media.29 The potential for social media to spread misinformation is concerning, as the information shared is not scientifically filtered or reviewed. Other commonly reported sources of COVID-19 information include television (TV), SMS messages from government agencies, and health providers. The same Africa CDC report found that in Kenya, 78% of those surveyed considered TV a trusted source of information.29 In Nairobi, a study found that government health messages through TV, radio and SMS were among the most common sources of information for residents of urban informal settlements at the onset of the COVID-19 pandemic.30 It is particularly important to understand how young adults receive and interpret information regarding COVID-19, as some studies suggest this age group may be highly hesitant because of a perceived low risk of severe outcomes, mistrust in authority and fear of side effects, especially regarding infertility and pregnancy outcomes.31–33 A global study found that young people were the most likely to search for COVID-19 and other health information on social media, raising concerns about exposure to misinformation.34

This study analyses data from a sample of individuals residing in urban informal settlements in Nairobi, surveyed in 2021 and 2022, before and after the distribution of the first COVID-19 vaccine. An exploratory analysis was implemented to understand how respondent characteristics relate to vaccine acceptance/hesitancy (prior to availability) and uptake (after the vaccine became available). We explored the utility of K-Means clustering to characterise participants based on demographics, knowledge, perceptions, risks and other factors, to determine whether certain archetypes or subgroups are present in the cohort and, if so, how likely each is to want the COVID-19 vaccine and ultimately to receive it. We selected K-Means analysis because it is a data-driven approach: the patterns are derived from the data themselves, making it a less biased way to characterise ‘types’ of participants. K-Means clustering has been used in previous studies to group participants in a dataset in order to predict health prevention and treatment strategies for each group.35 We compared this approach with a more basic one, to highlight the utility of K-Means clustering for understanding unmeasured characteristics of the groups. Ultimately, K-Means clustering identified three subgroups in the dataset, with implications for COVID-19 vaccination policy and messaging.

Methods

Sample and survey design

The Population Council, in collaboration with the Kenya Ministry of Health, conducted a longitudinal prospective cohort study across five informal settlements (Kibera, Mathare, Kariobangi, Huruma and Dandora) in Nairobi, Kenya, to understand knowledge, attitudes and practices around COVID-19. Participants were sampled from two previous longitudinal cohorts: Adolescent Girls Initiative-Kenya (AGI-K) (n=2565) and Nisikilize Tujengane (NISITU): engaging men and boys in girl-centred programming (n=4519). For the AGI-K and NISITU surveys, household listings were generated, and households containing at least one adolescent member were eligible and sampled; sample sizes for both cohorts were based on sample size calculations.

For the COVID-19 survey, 3465 households were randomly sampled from the AGI-K and NISITU cohorts, stratified by informal settlement. The sample is therefore somewhat representative, but every household had to have at least one adolescent member (eg, a household consisting of a single adult would not have been eligible for inclusion). We aimed for a sample size of 2000, or 400 per informal settlement.30 Of the random sample from AGI-K and NISITU (n=3465), 24% of the phone numbers were no longer in use, but refusals were low at about 1%. The resulting cohort for this COVID-19 study includes 2009 adult household members interviewed on 30 and 31 March 2020, just after the pandemic was declared. Repeated mobile phone surveys were completed in April (n=1768), May (n=1750) and June (n=1525) 2020, February 2021 (n=1117) and March 2022 (n=1121). Attrition was high given the frequent, repeated nature of the survey and the possibility of mobile phone numbers being discontinued, but given the unknowns early in the pandemic, the risk of attrition was weighed against the value of gathering critically needed information.
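As a quick check on the attrition described above, the round-level sample sizes can be expressed as a share of the 2009 round 1 respondents. The sketch below is only arithmetic on the numbers reported in this section. It assumes every later-round respondent belongs to the round 1 cohort; because the March 2022 count exceeds the February 2021 count, respondents evidently could miss a round and return, so these are per-round response shares rather than cumulative retention.

```python
# Completed interviews per round, as reported in the Methods.
COHORT_SIZE = 2009  # adults interviewed in round 1 (30-31 March 2020)

ROUND_SIZES = {
    "April 2020": 1768,
    "May 2020": 1750,
    "June 2020": 1525,
    "February 2021": 1117,
    "March 2022": 1121,
}


def response_share(completed: int, baseline: int = COHORT_SIZE) -> float:
    """Percentage of the round 1 cohort interviewed in a given later round."""
    return 100.0 * completed / baseline


# Rounded to one decimal place, as percentages are reported in the article.
shares = {name: round(response_share(n), 1) for name, n in ROUND_SIZES.items()}

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct}% of round 1 respondents")
```

This gives about 88.0%, 87.1% and 75.9% for the 2020 follow-up rounds, and 55.6% and 55.8% for the two vaccine rounds.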

Survey questions included demographics, knowledge and awareness of COVID-19 transmission and symptoms, perceived risk, socioeconomic effects of the pandemic, health and mental health indicators, gender-based violence and uptake of various protective behaviours such as masking, isolating if sick, testing and vaccination (see questionnaires in online supplemental files 1 and 2). To adhere to national physical distancing policies to prevent the spread of COVID-19, all interviews were conducted by phone by a team of 77 Kenyan surveyors. Respondents gave informed consent over the phone before commencing the survey. The same approach was used for all surveys at each time point; only the questionnaire changed, with questions added or adapted between rounds.

Measures of variables

Relevant variables were selected based on how likely they are to influence behaviour and vulnerability to the effects of COVID-19. Missing values were imputed using the mice R package. The included demographic and behavioural variables were age, gender, educational attainment, marital status, informal settlement (slum) of residence, perceived risk, knowledge of symptoms, belief in COVID-19 myths, disease prevention measures taken, symptoms experienced, social and economic impacts, household size, government assistance received and fears around COVID-19. These variables were used to construct subgroups using unsupervised machine learning; a variable description and summary statistics are included in online supplemental table 1.
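K-Means relies on Euclidean distances, so variables measured on very different scales (age in years versus 0/1 indicators) need to be put on a comparable footing before clustering. The article does not state how its variables were encoded; the sketch below assumes one common choice, z-scoring numeric variables and one-hot encoding categorical ones, applied after imputation. The column names and values are hypothetical, and a full set of indicator columns is kept because, unlike in a regression, redundant dummies cause no estimation problem for clustering.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise a numeric column to mean 0 and (population) SD 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sd for v in values]


def one_hot(values):
    """Expand a categorical column into 0/1 indicator rows, one column per level."""
    levels = sorted(set(values))
    return levels, [[1.0 if v == level else 0.0 for level in levels] for v in values]


def build_feature_matrix(numeric_cols, categorical_cols):
    """Return (column names, row-wise matrix) ready for K-Means.

    Both arguments map a column name to a list of values of equal length;
    at least one column must be supplied.
    """
    names = list(numeric_cols)
    columns = [zscore(col) for col in numeric_cols.values()]
    for name, col in categorical_cols.items():
        levels, rows = one_hot(col)
        names += [f"{name}={level}" for level in levels]
        for j in range(len(levels)):
            columns.append([row[j] for row in rows])
    n_rows = len(columns[0])
    matrix = [[col[i] for col in columns] for i in range(n_rows)]
    return names, matrix


# Hypothetical imputed data for three respondents.
example_names, example_X = build_feature_matrix(
    {"age": [20, 30, 40], "household_size": [3, 5, 7]},
    {"marital_status": ["married", "single", "married"]},
)
```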

Data analysis

The data were analysed using R V.4.1.2. To identify potentially relevant data-dependent subgroups, K-Means clustering was applied. K-Means is an unsupervised, data-driven machine learning method of exploratory analysis that partitions observations into a chosen number of ‘clusters’ within the high-dimensional space formed by a set of covariates, assigning each observation to the cluster with the nearest centre (mean). Silhouette plots (online supplemental figure 1) were used to find the appropriate number of clusters; three clusters were identified, and this solution persisted across repeated runs, suggesting distinct subgroups. Cluster means of each variable were calculated and tabulated (online supplemental table 2) to display the characteristic breakdown of each cluster.
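The procedure described above can be sketched in plain Python. This is not the authors' R code: R's `kmeans()` uses the Hartigan-Wong algorithm by default, whereas this minimal sketch uses the simpler Lloyd iteration. Each run starts from randomly chosen observations, the run with the lowest within-cluster sum of squares is kept across several random starts (the 25 starts and the fixed seed are assumptions, not values from the article), and the number of clusters is chosen by the highest mean silhouette width. The three-group data are synthetic.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen observations."""
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, wss


def kmeans(points, k, n_starts=25, seed=0):
    """Best of several random starts (lowest within-cluster sum of squares)."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    best = None
    for _ in range(n_starts):
        labels, wss = kmeans_once(points, k, rng)
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best[0]


def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher = better separated)."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i])
        others = [sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i]]
        if not own or not others:
            scores.append(0.0)  # singleton or single-cluster case: s(i) defined as 0
            continue
        a, b = sum(own) / len(own), min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates=range(2, 6)):
    """Pick the number of clusters with the highest mean silhouette width."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def make_blobs(seed=1):
    """Synthetic data: three well-separated 2-D groups of 30 points each."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centres for _ in range(30)]


if __name__ == "__main__":
    best_k, silhouettes = choose_k(make_blobs())
    print("chosen k:", best_k, silhouettes)
```

Keeping the best of many random starts under a fixed seed is one common way to reduce the run-to-run instability of K-Means that this study lists among its limitations; the silhouette criterion favours the k whose clusters are both compact and well separated.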

To assess the value of the K-Means clusters against more traditional methods, we ran likelihood ratio tests comparing the fit of a model containing a demographic covariate of interest alone with that of a model that additionally included a cluster indicator. We conducted this analysis twice: once for the outcome of vaccine hesitancy (in 2021, prior to vaccine availability) and again for the outcome of vaccine uptake (in 2022, once the vaccine was widely available). For each outcome and each covariate, a p value was calculated comparing the nested model (H0: outcome~intercept+covariate) with the complex model (H1: outcome~intercept+covariate+cluster indicator); a significant p value indicates that the complex model, which includes the cluster indicator, fits the data better. Overall, significant likelihood ratio tests across the demographic covariates suggest that the cluster variable adds information about the subgroups that is not captured by each demographic covariate alone. Separate models were fit for age, education, marital status, household size, likelihood of knowing someone who had tested positive for COVID-19, knowledge of COVID-19 symptoms, household gender-based violence risk, economic impacts (food insecurity and income loss) and respondent concerns around loss of income due to COVID-19.
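The likelihood ratio statistic is twice the difference in maximised log-likelihoods between the complex and nested models, compared against a chi-square distribution whose degrees of freedom equal the number of added parameters. Assuming the three-level cluster indicator enters as two dummy variables (the article does not report the coding), each test has 2 degrees of freedom. The sketch below computes the p value with the closed-form chi-square tail that holds for even degrees of freedom. To stay dependency-free, the demonstration drops the demographic covariate so that both models have closed-form maximum likelihood estimates (pooled versus per-cluster proportions); the counts are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for a positive even number of df.

    Uses the closed form P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_nested, loglik_complex, df):
    """Return the LR statistic 2 * (llF - llR) and its chi-square p value."""
    stat = 2.0 * (loglik_complex - loglik_nested)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


def grouped_bernoulli_loglik(groups):
    """Maximised Bernoulli log-likelihood with one success probability per group.

    `groups` is a list of (successes, trials); groups whose estimated
    probability is exactly 0 or 1 contribute zero to the log-likelihood.
    """
    ll = 0.0
    for y, n in groups:
        p = y / n
        if 0 < p < 1:
            ll += y * math.log(p) + (n - y) * math.log(1 - p)
    return ll


# Hypothetical (vaccinated, respondents) counts for three clusters.
counts = [(60, 100), (45, 100), (40, 100)]
pooled = [(sum(y for y, _ in counts), sum(n for _, n in counts))]
ll_nested = grouped_bernoulli_loglik(pooled)    # one common proportion
ll_complex = grouped_bernoulli_loglik(counts)   # adds the cluster indicator
stat, p_value = likelihood_ratio_test(ll_nested, ll_complex, df=2)
```

With the covariate included, as in the article, the log-likelihoods would come from fitted logistic (or similar) models instead, but the statistic and its reference distribution are computed the same way.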

After creating the clusters, we used the newly defined cluster variable to compare vaccine hesitancy and vaccine uptake across the three groups using regression forest analysis, a non-parametric estimation approach based on random forests, to estimate the conditional mean of each outcome of interest. The best-fit tree was found, and the results were visualised as forest plots using ggplot2 in R. P values for three-way and pairwise comparisons of the clusters for vaccine acceptance and vaccine uptake were calculated using Wald tests.
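Given per-cluster estimates and their standard errors, the Wald comparisons described above reduce to short formulas. In this sketch the estimates and standard errors are hypothetical stand-ins for the regression forest output, which is not reproduced here, and both tests assume the cluster estimates are independent. The pairwise test uses z = (est_a - est_b) / sqrt(se_a^2 + se_b^2); the three-way test of equality uses the weighted sum of squared deviations from the precision-weighted mean, referred to a chi-square distribution with 2 degrees of freedom.

```python
import math
from itertools import combinations


def wald_pairwise(est_a, se_a, est_b, se_b):
    """Two-sided Wald test of equal means for two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) for standard normal Z
    return z, p


def wald_three_way(estimates, ses):
    """Wald test that three independent estimates share a common mean (df = 2)."""
    if len(estimates) != 3 or len(ses) != 3:
        raise ValueError("expects exactly three clusters")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    stat = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return stat, math.exp(-stat / 2.0)  # chi-square(2) upper tail is exp(-x/2)


# Hypothetical cluster-level estimates (proportions) and standard errors.
EST = {"cluster 1": 0.62, "cluster 2": 0.55, "cluster 3": 0.47}
SE = {"cluster 1": 0.030, "cluster 2": 0.035, "cluster 3": 0.028}

if __name__ == "__main__":
    stat, p = wald_three_way(list(EST.values()), list(SE.values()))
    print(f"three-way: chi2 = {stat:.2f}, p = {p:.4f}")
    for a, b in combinations(EST, 2):
        z, p = wald_pairwise(EST[a], SE[a], EST[b], SE[b])
        print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4f}")
```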

Patient and public involvement

Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Results

Participants had an average age of 36.5 years (SD=11.3): 28.7% were aged 18–29 years, 59% were aged 30–49 years and 12.4% were aged 50+ years. Over half were female (62.8%) and over half were married (58.5%) (table 1). In 2021, before the vaccine was widely available, most respondents (72.1%) said they would be willing to get a vaccine, and a similar percentage (71.1%) had received the vaccine by 2022, once it was available. However, this means that over a quarter (29%) still had not received the vaccine at the time of the most recent survey.

Table 1

Cohort demographics for round 1 (n=2009) respondents from five informal settlements in Nairobi, Kenya April 2020

Based on the results of the K-Means clustering, each of the three clusters that emerged defines a slightly different ‘type’ of person. Cluster 1 contained older, married individuals who knew less about common COVID-19 symptoms, were more likely to believe common myths around COVID-19 and lived in the largest households. Members of this cluster also had the most concern about potential economic harms (fear of food shortages and loss of income) and a higher perceived risk of COVID-19 early in the pandemic. Cluster 2 primarily consisted of less educated, married or divorced, middle-aged women who were the most economically affected (skipping meals, loss of income, lacking electricity at home, lacking social support) at the beginning of the pandemic. These individuals were also the most likely of the three groups to report a perceived risk of gender-based violence from increased tensions at home due to the pandemic. Cluster 3 was the youngest group, with higher educational attainment and higher average knowledge of COVID-19 symptoms, and expressed fewer fears about the economic impacts of lockdowns early in the pandemic. The mean values of each demographic variable per cluster are presented in online supplemental table 2, and the clusters are described in online supplemental table 3. The silhouette plots in online supplemental figure 1 show that three clusters best capture the variation in the dataset.
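Descriptions such as ‘older’ or ‘largest households’ come from comparing per-cluster means, as tabulated in online supplemental table 2. The sketch below builds such a profile table on the original (unstandardised) scale and flags, for each cluster, the variables whose cluster mean exceeds the overall mean. The respondents, variable names and cluster labels are hypothetical, and comparing against the overall mean is only one simple way to summarise a profile.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, labels, variables):
    """Mean of each variable within each cluster, keyed by cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {var: mean(r[var] for r in members) for var in variables}
        for label, members in sorted(grouped.items())
    }


def above_overall_mean(profiles, rows, variables):
    """For each cluster, list the variables whose cluster mean exceeds the overall mean."""
    overall = {var: mean(r[var] for r in rows) for var in variables}
    return {
        label: [var for var in variables if profile[var] > overall[var]]
        for label, profile in profiles.items()
    }


# Hypothetical respondents (after imputation) with cluster labels from K-Means.
RESPONDENTS = [
    {"age": 48, "household_size": 7, "knows_symptoms": 0},
    {"age": 44, "household_size": 6, "knows_symptoms": 1},
    {"age": 38, "household_size": 5, "knows_symptoms": 0},
    {"age": 24, "household_size": 3, "knows_symptoms": 1},
    {"age": 22, "household_size": 4, "knows_symptoms": 1},
]
LABELS = [1, 1, 2, 3, 3]
VARIABLES = ["age", "household_size", "knows_symptoms"]

profiles = cluster_profiles(RESPONDENTS, LABELS, VARIABLES)
flags = above_overall_mean(profiles, RESPONDENTS, VARIABLES)
```

In this toy example, cluster 1 is flagged as older with larger households, and cluster 3 as having higher symptom knowledge.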

We then ran likelihood ratio tests to assess, for each variable, whether model fit was better with the variable alone (nested model) or with the addition of the cluster indicator (complex model). All of the likelihood ratio tests except that for age were significant, indicating that including the clusters defined by the K-Means algorithm improves model fit over individual characteristics alone (table 2 presents results for the outcome of vaccine hesitancy in survey round 5 and table 3 for the outcome of vaccine uptake in round 6).

Table 2

Likelihood ratio test for vaccine hesitancy (Nairobi survey round 5; February 2021, prior to vaccine rollout in Kenya), where H0: outcome~intercept+covariate and H1: outcome~intercept+covariate+cluster indicator

Table 3

Likelihood ratio test for vaccine uptake (Nairobi survey round 6, March 2022), where H0: outcome~intercept+covariate and H1: outcome~intercept+covariate+cluster indicator

After completing the likelihood ratio tests and concluding that the clusters offer more information than demographic variables alone, we used regression forest analysis to explore the association between cluster membership and the two vaccine-related outcomes. For vaccine acceptance (2021), cluster 3 was significantly less likely to say they would get the vaccine if it became available than cluster 1 (41.5% vs 55.3%; p<0.01) and cluster 2 (41.5% vs 50.5%; p=0.014) (figure 1). Once the vaccine became available and participants were asked about vaccine uptake in 2022, cluster 1 was significantly more likely to have received at least one dose than cluster 2 (78.0% vs 69.3%; p<0.01) and cluster 3 (78.0% vs 66.4%; p<0.01) (figure 2). Thus, although cluster 2 was more likely than cluster 3 to report wanting the vaccine, it was not more likely to have received it (69.3% vs 66.4%; p=0.41). Of the 29% (n=324) in round 6 who had not received the vaccine, about half were hesitant (48%) and about half said they were very likely to still get the vaccine (not shown).

Figure 1

Regression forest analysis plot of vaccine acceptance by cluster, Nairobi, Kenya, February 2021 (n=1117). *Cluster 1 and cluster 2 are significantly different from cluster 3, but not from each other. **Cluster 3 is significantly lower than cluster 1 and cluster 2 (p<0.01 and p=0.014, respectively).

Figure 2

Regression forest analysis plot of vaccine uptake by cluster, Nairobi, Kenya, March 2022 (n=1121). *Cluster 2 and cluster 3 are significantly different from cluster 1, but not from each other. **Cluster 1 is significantly higher than cluster 2 and cluster 3 (p<0.01 for both).

Discussion

Our findings suggest that survey respondents from across Nairobi informal settlements fall into three clusters or archetypes each with distinct characteristics that can provide insight into COVID-19 vaccine uptake. Kenya, and our sample specifically, achieved high vaccination coverage (almost three-quarters of respondents). This estimate is in line with a global study that suggested a maximum share of 70% of the total population could be vaccinated, without application of coercive policies or restrictions.36 Our exploratory analyses suggest the cluster indicator adds value to basic models describing characteristics associated with vaccine uptake, capturing unmeasured characteristics of participants that are associated with the outcome. The clusters may be useful to identify archetypes of individuals in informal settlements and suggest avenues to explore for communication with subgroups that have different vulnerabilities and risks. Our results suggest some variation between the three groups of respondents in vaccine uptake, information that can be used to better target or improve messaging to increase awareness and adoption of healthy behaviour.37–42

It is concerning that cluster 3, made up primarily of younger, more highly educated individuals with the highest knowledge of COVID-19 symptoms, was the least likely to have been vaccinated. Its members reported being less concerned about COVID-19 infection and its economic impacts, potentially indicating less urgency due to a lack of perceived risk, as risks to the elderly were initially highlighted. A recent study supports this link, finding that lack of perceived risk and low perceived disease severity were leading reasons for not getting vaccinated.42 Relatedly, those in cluster 3 were less likely to know someone who had tested positive for COVID-19 (17% vs 25% in cluster 2 and 27% in cluster 1), reinforcing their lower perceived risk (online supplemental table 2). Younger people may also be exposed to different information through their higher use of social media. Public health messages tailored to youth43 could highlight vaccine safety, as our participants’ main concerns were about side effects or wanting to wait and see whether the vaccine is safe. Studies in other settings show that young people may be concerned about myths that vaccine side effects affect fertility.44 Lastly, it would be useful to ensure access to vaccines for young people, potentially expanding current outreach to include mobile clinics or other options instead of requiring a visit to a health facility. Nairobi is already employing vaccine outreach strategies, including providing vaccines at social gatherings such as churches or social functions; these may increase uptake.

Respondents in cluster 1, mostly men with large households and less educational attainment, had more economic anxieties due to the pandemic and less knowledge about COVID-19 symptoms, yet were the most likely to have been vaccinated. They were also the most likely to believe common myths around COVID-19, but had the highest perceived risk of infection. This may be because individuals in this cluster reported being more likely to need to travel for work (a factor in considering themselves at high risk of infection).45 They may also hold jobs that require vaccination. Keeping employment by getting vaccinated may have been deemed worth any perceived risk of the vaccine, as this cluster also expressed economic concerns related to the pandemic and was responsible for bringing income to large households. This is supported by a recent study that found older adults, particularly those with chronic illnesses, had the highest vaccination rates, and that this group was responsive to messages to increase vaccination.46

Individuals in cluster 2, middle-aged women who were married or divorced, seem to carry the highest risk of economic hardship, food insecurity and gender-based violence due to the pandemic,37–41 so further efforts to reach, vaccinate and support this group are critical. Although this group expressed more willingness than cluster 3 to get the vaccine in February 2021, its uptake by March 2022 was not higher. This could point to issues around vaccine accessibility, especially for women who may have more familial responsibilities and fewer financial and transportation resources. Government assistance and social support interventions may provide a solution, as well as outreach through churches and other venues, to reach women who are unable to travel to facilities and who face other challenges, including food and economic insecurity and potential violence.

By defining archetypes or groups in the population, we can better inform and target policy to improve the efficacy of public health and social support interventions. These clusters can also be used to inform future modelling and predictive analysis of the data by providing insight into what characteristics and behaviours define subgroups of interest, particularly in the case of a novel disease such as COVID-19, where much is unknown and no prior information is available to inform messaging or policy. These are major strengths of this statistical approach, as it is an efficient way to let the data guide the analysis without potential bias related to the analysts’ preconceived beliefs about the population. Limitations of this approach include possible changes to the clustering when it is run multiple times, due to the use of random starting points, and challenges in interpreting the data when clearly defined subgroups are not present. Another limitation is possible social desirability bias during the phone interviews: respondents may have felt compelled to provide socially acceptable responses rather than responses that reflect their true attitudes and beliefs, which may explain some of the inconsistencies observed between vaccine acceptance and uptake. It is also important to note that the cohort is not truly representative of the underlying population but rather a subset of households that have a mobile phone and an adolescent household member who participated in recent survey rounds through AGI-K and NISITU. A small analysis (not shown) found no significant differences in attrition by age or gender, but across rounds, wealthier participants were slightly less likely to respond, and participants in the Dandora and Kibera slums were slightly more likely to respond.
It is also important to note that vaccine acceptance was recorded before the vaccine was available to the general public, and that there is a gap between the vaccine acceptance and uptake measures during which time perceptions may have shifted.

Overall, respondents in our sample from five informal settlements in Nairobi reported higher vaccination coverage than Nairobi as a whole as of March 2022 (71.1%, nearly three-quarters, compared with the 52% reported for the city47). Of the unvaccinated participants, about half reported interest in receiving the vaccine, suggesting that with additional access and messaging, many more individuals could be vaccinated. We also found that most respondents had received more than one dose, although about 1 in 10 had received only the first dose, suggesting additional outreach is needed to make sure everyone is fully vaccinated. As vaccine immunity wanes and new variants emerge, continued messaging, new vaccinations and uptake of other non-pharmaceutical interventions to prevent transmission will be critical.48 49 Studies to understand how to improve governance to increase vaccination and to determine optimal levels of vaccination are important to inform policy.50–52 When survey data are available, K-Means clustering may be a useful statistical tool for rapidly understanding variation in the population and highlighting different potential approaches to messaging and outreach. This paper summarises our methodology and results to provide a starting point for further investigation into targeted vaccination strategies.

Conclusion

Machine learning techniques, such as K-Means clustering, are useful for investigating the factors that may predict behaviours related to disease prevention and mitigation. By letting the data guide the analysis and identifying naturally occurring subgroups, we identified characteristics associated with vaccine hesitancy and vaccine uptake, useful for informing policies and messages that target different vulnerable groups within a population. Our results highlight that individuals with the highest perceived risk (cluster 1) are the most likely to get vaccinated, but that younger, more educated respondents (cluster 3) may require additional messaging and persuasion. One group (cluster 2) faced multiple challenges, including barriers to vaccination, economic and food insecurity, and risk of violence; this group may require more ways to access the vaccine as well as additional access to social support systems. Based on the results of this study, K-Means clustering may be a useful tool for better identifying and targeting vulnerable groups in public health policy at national and global levels. Although this study focused primarily on vaccine acceptance and uptake, these methods can be applied to a wide range of public health behaviours in future research.

Ethics statements

Patient consent for publication

Ethics approval

This study was approved by The Population Council IRB (p936) and AMREFESRC (P803/2020). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors acknowledge all of the work done by the Population Council field team to collect these surveys.

References

Supplementary materials

Footnotes

  • Twitter @JessiePinchoff

  • Contributors NR conceptualised the project, conducted the data analysis and led development and writing of the manuscript. JP conceptualised the project and supported development and writing of the manuscript. CBB developed and led the data analysis and review of the manuscript. EB and TA supported with conceptualisation of the project, interpretation of results and review of the manuscript. EM, DM and FM supported with data collection, project management, data cleaning and interpretation of results, including review of the manuscript. KA managed the project and data collection, supported with interpretation of results and review of the manuscript. KA is responsible for the overall content as the guarantor.

  • Funding This study was funded by UK Department for International Development through Innovations for Poverty Action Peace & Recovery COVID-19 rapid response grant (grant MIT0019-X15).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.