Couldn't the high numbers of students with self-rated mental health problems or considering leaving med school simply reflect the fact that they were asked during the pandemic? Incredibly worrying for everyone but perhaps even more so for those having to cope with the effects without being properly trained yet.
Rauf et al. examined modifiable risk factors for cardiovascular diseases (CVD) in participants without ischaemic heart disease (1). They found a high prevalence of modifiable risk factors for atherosclerotic CVD, such as hypertension, diabetes and obesity, especially in women. In contrast, the atherosclerotic CVD risk score was higher in men. The authors speculated that risk factors such as age, gender and blood lipid profile may also contribute to the association between the prevalence of risk factors and the atherosclerotic CVD risk score. I offer a comment with special reference to sex differences.
Pana et al. reported that women were undertreated compared with men after myocardial infarction, but that better survival and outcomes were nonetheless observed in women (2). This result may be partly explained as follows. Hetherington et al. summarized that premenopausal women would be protected from CVD by estrogen exposure (3). I therefore suggest that CVD risk assessment based on modifiable risk factors should be stratified by sex. In addition, ethnic differences should be considered.
References
1. Rauf R, Khan MN, Sial JA, et al. Primary prevention of cardiovascular diseases among women in a South Asian population: a descriptive study of modifiable risk factors. BMJ Open 2024;14(11):e089149.
2. Pana TA, Mamas MA, Myint PK, et al. Sex differences in myocardial infarction care and outcomes: a longitudinal Scottish National Data-Linkage Study. Eur J Prev Cardiol 2024. doi: 10.1093/eurjpc/zwae333. [Epub ahead of print]
3. Hetherington K, Thomas J, Nicholls SJ, et al. Unique cardiometabolic factors in women that contribute to modified cardiovascular disease risk. Eur J Pharmacol 2024;984:177031.
We read the protocol paper for the NISA trial (1) with interest. One issue that we believe warrants further consideration relates to the inclusion of infants from a multiple birth. As this trial is being conducted in preterm infants less than 32 weeks’ gestation and multiple births are not part of the exclusion criteria, it is likely that a relatively high percentage of eligible infants will be from a multiple birth and their sibling(s) may also be eligible. Multiples generally provide less information about the effect of an intervention than unrelated singletons, due to similarities in the outcomes of infants from the same birth. This has implications for both the trial design and analysis that may not have been fully considered here.
In our recent systematic review (2), we found that few published trials of preterm infants adequately account for multiple births. We are committed to improving practice around this issue and commend the authors for their consideration of multiple births in both the randomisation (stating that “twins or multiple births will be randomly assigned to each group, which means they will be randomly assigned according to birth order”; p3) and data collection tools (where the number of fetuses is recorded; online supplementary material 2). We further encourage the authors to:
1. Consider the sample size/power implications of including multiple births in the trial. Our freely available online calculator may be useful for this purpose (3).
2. Select a method of statistical analysis that accounts for clustering due to multiple births. Commonly used approaches are generalised estimating equations and mixed effects models (2).
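Point 1 above can be illustrated with the generic design-effect approximation for clustered data. This is a minimal sketch and not the calculator cited in (3): it assumes that all infants from the same birth are randomised to the same arm, and the function names, mean number of infants per birth and intracluster correlation (ICC) are hypothetical values chosen only for illustration.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Generic approximation 1 + (m - 1) * ICC for clustered outcomes."""
    if mean_cluster_size < 1:
        raise ValueError("mean cluster size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("ICC is assumed to lie between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def inflated_sample_size(n_independent: int, mean_cluster_size: float, icc: float) -> int:
    """Inflate a sample size that was computed assuming independent infants."""
    return math.ceil(n_independent * design_effect(mean_cluster_size, icc))


# Hypothetical inputs: 200 infants needed if all were singletons,
# 1.25 infants per birth on average, ICC of 0.5 between siblings.
print(design_effect(1.25, 0.5))              # 1.125
print(inflated_sample_size(200, 1.25, 0.5))  # 225
```

Under other randomisation schemes (for example, siblings randomised independently of each other) the required inflation can differ, which is why a dedicated calculator is preferable for the actual trial.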
References:
(1) Gao J, Xiong H, Nie P, et al. Application of a new type of double-lumen endotracheal tube in preterm infants with respiratory distress syndrome: study protocol for a non-inferiority randomised controlled trial (NISA). BMJ Open 2024; 14:e083508. doi:10.1136/bmjopen-2023-083508
(2) Robledo KP, Libesman S, Yelland LN. We should do better in accounting for multiple births in neonatal randomised trials: a methodological systematic review. Arch Dis Child Fetal Neonatal Ed 2024; 0:F1–F7 (online ahead of print). doi: 10.1136/archdischild-2024-327983
(3) Yelland LN, Sullivan TR, Collins CT, et al. Accounting for twin births in sample size calculations for randomised trials. Paediatr Perinat Epidemiol 2018; 32:380–7. doi: 10.1111/ppe.12471
There have been significant therapeutic improvements for certain cancers. Although the increased incidence of thyroid cancer and prostate cancer can be entirely explained by detection, the increased incidence of breast cancer and melanoma is largely real (1-4), as shown by stratification of age groups based on mortality or incidence.
1) Corcos D, 2017. Breast cancer incidence as a function of the number of previous mammograms: analysis of the NHS screening programme. BioRxiv doi.org/10.1101/238527
2) Corcos D & Bleyer A, 2020. Epidemiologic signature in cancer: Prostate vs Breast. New England Journal of Medicine, 382(1):96
3) Corcos D, 2020. 2nd International DKFZ Conference on Cancer Prevention
4) Corcos D & Bleyer A, 2021. Cause of the Decades of Increase in Cutaneous Melanoma: Overdiagnosis, Ultraviolet Rays, Non-Ultraviolet Radiation? ResearchGate.
We read with interest the article “UK Medical Students' mental health and their intention to drop out”. It reiterates the growing evidence of the critical intersection of mental health and the need for effective wellbeing support to prevent medical students from dropping out (1). It highlighted that medical schools need to encourage students to seek help to reduce the stigma around mental health (1). However, our recent survey across 43 medical schools, with 534 responses, found that the opposite is currently occurring across the UK (2). Our survey revealed that current medical school wellbeing provisions are largely performative, reduced to tick-box exercises that fail to deliver meaningful support.
In our survey, we asked medical students how well supported they felt throughout their studies. It was disappointing that only 45% reported they had easily accessible psychological support. More alarming were the pervasive fears among students, with some thinking “people get kicked off if it seems like they are struggling with mental health to prevent more suicides.” This perceived threat directly undermines the suggestion that students should be encouraged to seek help while there is still an unconscious stigma around mental health, perpetuating a dangerous cycle of silence.
A toxic culture has taken root in medical education. Respondents to our survey stated that “no one really cares what we do as being burnt out is just part of being a med student”, or that they worry a formal diagnosis would lead to fitness to practise hearings and even suspensions. It is clear a negative culture persists in medical schools, one that must change if we are to retain medical students and strengthen our workforce. While the NHS long-term workforce plan aims to double the number of medical students, surely the first step is looking after the ones we have (3).
Tackling mental health issues amongst medical students is more challenging than ever, with increasing pressures from lack of psychological support, poor funding, intense academic rigour, and negative cultures within medical schools. Combined with poor retention of doctors, there is a dire need for change. The current status quo is inadequate, and the current culture of ‘tick-box’ exercises is not helpful for our medical students in their mental health struggles. It is no longer useful to be collecting data if it does not lead to meaningful changes.
We know what the ideal wellbeing support system looks like. The BMA has provided clear guidance aligned with the GMC’s “Guidance on undergraduate clinical placements” on essential measures medical schools should implement (2,4). What’s missing is not knowledge of the solution, but rather the will to enact it. There needs to be a national focus on the wellbeing of medical students and for this we will need key stakeholders to ensure medical schools take responsibility, forcing them to step up.
After all, to be able to study medicine, one should not just be able to look after others but also themselves. Medical schools must transform from passive observers into champions of wellbeing, leading a revolution in support or risk perpetuating toxic environments that fuel mental health crises, drive students to leave and cripple the future workforce.
Yours sincerely,
Ria Bansal
Deputy Co-Chair, BMA Medical Students Committee (2023/2025)
To the editor:
Kacew et al review “reversals” of therapeutic guidelines for COVID-19 disease issued by the National Institutes of Health (NIH) and authorizations granted by the Food and Drug Administration (FDA). However, the FDA issued only emergency use authorizations, which are based “on a reasonable belief that the product may be effective…without waiting for all the information that would be needed for an FDA approval.” This is a very different standard from that used in NIH guidelines.
In claiming a “reversal” in the case of convalescent plasma (CP), the authors rely on a single meta-analysis (Mihalek et al) that focused entirely on all-cause mortality, reviewing just 19 randomized controlled trials (RCTs). By contrast, the meta-analysis by Senefeld et al included 39 RCTs, finding a 13% reduction in mortality with CP, more than twice the reduction estimated by Mihalek et al.
Some RCTs of CP used insufficient antibody or treated too late for therapeutic effectiveness. Mihalek et al echoed these concerns about the RCTs they reviewed, asserting that “It is possible that in some of the trials in included in our meta-analysis the antibody titer was not high enough to lead to any clinical change” and adding that “we suspect that convalescent plasma may be more effective in reducing clinical progression when administered early in the clinical course.”
The Senefeld et al meta-analysis found that mortality was 15% lower in recipients of plasma with high antibody content and - in RCTs, matched-cohort studies and case series - mortality was 37% lower when treatment was initiated early in the disease course. They further found that the six outpatient RCTs of CP (only two of which were reviewed by Mihalek et al) showed a 35% reduction in hospitalization.
Kacew et al make no mention of patients who might especially benefit from CP, but Mihalek et al recognized that “patients with coexisting immunodeficiency…may benefit from convalescent plasma therapy.” The FDA’s EUA for CP was affirmed, not reversed, by the full approval it gave to high titer COVID-19 CP “in individuals who are immunocompromised” in 2024.
It is a mistake to lump CP with clearly ineffective treatments such as ivermectin and hydroxychloroquine, ignoring the fact that convalescent plasma almost certainly saved thousands of lives during the COVID pandemic, and failing to recognize that convalescent plasma will be the first available therapeutic option in any new pandemic.
My name is Gengchen Ye, and I am the first author of the paper titled "The Long COVID Symptoms and Severity Score: Development, Validation, and Application," published in Value in Health (2024). I recently read the systematic review published in your journal, "Patient-reported outcome measures for post-COVID-19 condition: a systematic review of instruments and measurement properties".
Upon reviewing the supplemental materials and the main text of the article, I noticed that the construct validity results for the Long COVID Symptoms and Severity Score (LC-SSS) are described as having "1 out of 5 hypotheses confirmed." However, in our original publication, we reported that all five hypotheses were confirmed, demonstrating strong construct validity through significant correlations with quality of life and psychological measures.
This discrepancy suggests there may have been an inadvertent misunderstanding or misinterpretation of our findings. Accurate representation of the LC-SSS’s measurement properties is crucial for researchers and clinicians who rely on your systematic review for informed decision-making.
I kindly request that your team review this matter and consider issuing a correction or clarification to accurately reflect the construct validity results of the LC-SSS as reported in our original study.
Thank you for your attention to this matter and for your valuable contributions to the field. I am available to provide any further information or clarification if needed.
Warm regards,
Gengchen Ye
The First Affiliated Hospital of Xi’an Jiaotong University
Email: yegengchen@stu.xjtu.edu.cn
1. Ye G, Zhu Y, Bao W, et al. The Long Coronavirus Disease (COVID) Symptom and Severity Score: Development, Validation, and Application. Value Health J Int Soc Pharmacoeconomics Outcomes Res. Published online April 17, 2024:S1098-3015(24)02341-6. doi:10.1016/j.jval.2024.04.009
Thank you for your message and for sharing your comments. In response, we would like to explain our methodological approach with regard to the assessment of construct validity.
Our systematic review follows the COSMIN methodology (Terwee et al. 2018). For the evaluation of construct validity, the review team is required to formulate generic hypotheses about expected relationships between the PROM under review and other well-established, high-quality comparator instruments commonly used in the field (Prinsen et al. 2018, Table 4). This approach does not aim to determine whether the authors' original hypotheses were confirmed in the validation studies, but to determine if these generic hypotheses are supported. For sufficient construct validity, 75% of the hypotheses must be confirmed.
In accordance with the COSMIN guidelines, correlations with PROMs measuring similar constructs should be above 0.5, while correlations with PROMs measuring related but dissimilar constructs should be between 0.3 and 0.5. In the study by Gengchen Ye et al., the PROMs used to assess the construct validity of the Long COVID Symptom and Severity Score (LC-SSS) measure related but dissimilar constructs, including the EuroQol 5-Dimension 5-Level (EQ-5D-5L), EuroQol Visual Analogue Scale (EQ-VAS), Patient Health Questionnaire-9 (PHQ-9), Insomnia Severity Index (ISI), and Beck Anxiety Inventory (BAI). Therefore, correlations in the range of 0.3-0.5 are hypothesized. However, in four out of five cases, the observed correlations were significantly higher, indicating, for example, potential redundancy.
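The rule described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the correlation values are invented placeholders rather than the published results, and comparing absolute values (ignoring the direction of the correlation) is an assumption made for simplicity.

```python
def count_confirmed(correlations, low=0.3, high=0.5):
    """Count correlations whose absolute value falls in the hypothesised range."""
    return sum(1 for r in correlations if low <= abs(r) <= high)


def construct_validity_sufficient(correlations, low=0.3, high=0.5, threshold=0.75):
    """True when at least `threshold` of the generic hypotheses are confirmed."""
    if not correlations:
        raise ValueError("at least one correlation is required")
    return count_confirmed(correlations, low, high) / len(correlations) >= threshold


# Hypothetical correlations with five comparator instruments (NOT the published values).
example = {"EQ-5D-5L": -0.62, "EQ-VAS": -0.45, "PHQ-9": 0.71, "ISI": 0.58, "BAI": 0.66}
values = list(example.values())
print(count_confirmed(values))                # 1
print(construct_validity_sufficient(values))  # False
```

With these placeholder values only one of five hypotheses falls inside the expected range, so the 75% criterion is not met even though every correlation is substantial.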
Furthermore, the hypothesis that the LC-SSS is related to psychological status seems to be a content-related hypothesis rather than a methodological one. While this may be important for the practical application and interpretation of the questionnaire, it does not support the validity of the measure in assessing the intended construct. Post COVID-19 condition indeed presents as a complex clinical picture, thus the selection of suitable instruments to assess construct validity is challenging and highlights the need for further research.
Terwee, C. B., Prinsen, C. A. C., Chiarotto, A., Westerman, M. J., Patrick, D. L., Alonso, J., et al. (2018). COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation, 27(5), 1159–1170. https://doi.org/10.1007/s11136-018-1829-0
Adjobimey et al. examined the association between occupational psychosocial factors and hypertension (1). Occupational stress, age ≥25 years, increased body mass index, permanent worker status, and seniority in the textile sector >5 years were significant risk factors for hypertension. To prevent hypertension, job strain and recognition at work in the ginning plants sector should be addressed through occupational health promotion. I have some comments.
It is generally accepted that physical and mental health status are closely related. Li et al. reported that the adjusted hazard ratio (HR) (95% confidence interval [CI]) of workplace discrimination for incident hypertension was 1.54 (1.11-2.13) (2). Clausen et al. reported that the adjusted odds ratio (95% CI) of exposure to discrimination for the onset of depressive disorders was 2.73 (1.38-5.40) (3). Regarding the causal relationship, Jeon et al. reported that the adjusted HRs (95% CIs) of moderate and severe depressive symptoms for incident hypertension were 1.05 (1.01-1.11) and 1.12 (1.03-1.20), respectively (4). By applying time-dependent models, the corresponding HRs (95% CIs) were 1.12 (1.02-1.24) and 1.29 (1.10-1.50), respectively. They also clarified that high blood pressure was associated with a decreased risk of developing depressive symptoms. Taken together, I suspect that depressive status would mediate the effect of workplace stress on subsequent hypertension. Perceived job stress may be a risk factor for depression, which would lead to an increased risk of hypertension.
References
1. Adjobimey M, Houehanou CY, Cisse IM, et al. Work environment and hypertension in industrial settings in Benin in 2019: a cross-sectional study. BMJ Open 2024;14(3):e078433.
2. Li J, Matthews TA, Clausen T, et al. Workplace discrimination and risk of hypertension: Findings from a prospective cohort study in the United States. J Am Heart Assoc. 2023;12(9):e027374.
3. Clausen T, Rugulies R, Li J. Workplace discrimination and onset of depressive disorders in the Danish workforce: A prospective study. J Affect Disord. 2022;319:79-82.
4. Jeon SW, Chang Y, Lim SW, et al. Bidirectional association between blood pressure and depressive symptoms in young and middle-age adults: A cohort study. Epidemiol Psychiatr Sci 2020;29:e142.
With great interest we read the recently published article titled "ChatGPT (GPT-4) versus doctors on complex cases of the Swedish family medicine specialist examination: an observational comparative study" by Arvidsson et al. [1] The study provides valuable insights into the capabilities and limitations of generative AI models like GPT-4 in complex medical decision-making scenarios. However, the study's approach, which relies on GPT-4 as a general-purpose model without any domain-specific fine-tuning or optimised prompting strategies, presents an inherent limitation. Deploying an AI system in such a manner is fundamentally inferior and does not align with best practices in any industry. In real-world applications, AI models are typically customised, fine-tuned, or integrated with structured knowledge bases to enhance their relevance and reliability in specific domains. The zero-shot prompting approach used in this study, while convenient for initial evaluation, does not reflect the practical implementation of AI solutions in healthcare or other high-stakes industries.
In the medical field, AI applications must be trained and validated within a well-defined context, leveraging domain-specific data, tailored prompts, and reinforcement learning with human feedback to improve performance over time. Successful AI implementation in healthcare involves collaboration with medical professionals to refine model outputs, ensuring that the AI system aligns with clinical guidelines, best practices, and local healthcare policies. Furthermore, industries such as finance, manufacturing, and legal services do not deploy AI systems based on a one-size-fits-all approach. Instead, they develop specialised models that undergo extensive domain adaptation and real-world validation [2]. This study's reliance on zero-shot performance without such adaptations risks underestimating the true potential of AI in primary care settings. Evaluating AI performance without domain-specific tuning does not offer a fair comparison with human expertise, which is built upon years of specialised training and practical experience.
Additionally, the authors suggests that the use of GPT-4 in this way simulates a scenario where clinicians seek to get input on the management of patients by posting real patient case summaries in GPT-4. However, this behaviour is inherently problematic, as it risks exposing identifiable patient data, leading to significant privacy and ethical concerns. General practitioners should be made aware that relying on GPT-4 in this manner is not appropriate due to privacy risks. It is essential to ensure compliance with data protection regulations and ethical guidelines by avoiding the sharing of sensitive patient information with AI systems.
In light of these considerations, we suggest that future research in this area should incorporate domain-specific fine-tuning, where AI models are trained using real-world primary care data and localised guidelines to enhance accuracy and contextual relevance. Optimised prompt engineering should also be employed, utilising structured prompts that guide the AI model to generate responses more aligned with clinical reasoning processes. Additionally, integrating AI with Clinical Decision Support Systems (CDSS) would combine AI capabilities with existing healthcare IT systems to provide augmented support rather than standalone decision-making. Finally, conducting comparative evaluations with fine-tuned models would allow for a better understanding of GPT-4's performance against customised AI solutions tailored for medical applications.
While the findings of the current study highlight the current limitations of GPT-4 in primary care, it is important to acknowledge that AI is an evolving field. Future advancements, combined with targeted implementation strategies, hold significant promise in supporting healthcare professionals and improving patient outcomes. Addressing the shortcomings of the zero-shot approach and adopting best practices from other industries can pave the way for more effective AI integration in healthcare.
References
1. Arvidsson R, Gunnarsson R, Entezarjou A, et al. ChatGPT (GPT-4) versus doctors on complex cases of the Swedish family medicine specialist examination: an observational comparative study. BMJ Open 2024;14:e086148. doi:10.1136/bmjopen-2024-086148
2. Huyen C. AI Engineering: Building Applications with Foundation Models. O'Reilly Media, Inc. December 2024.
Couldn't the high numbers of students with self-rated mental health problems or considering leaving med school simply reflect the fact that they were asked during the pandemic? Incredibly worrying for everyone but perhaps even more so for those having to cope with the effects without being properly trained yet.
Dear editor,
Rauf et al. examined modifiable risk factors for cardiovascular diseases (CVD) in participants without ischaemic heart disease (1). There is a high prevalence of modifiable risk factors for atherosclerotic CVD, such as hypertension, diabetes and obesity, especially in women. In contrast, atherosclerotic CVD risk score was higher in men. The authors speculated that risk factors such as age, gender and blood lipid profile may also contribute to the association between the prevalence of risk factors and atherosclerotic CVD risk score. I present a comment with special reference to sex difference.
Pana et al. reported that women were undertreated compared to men after myocardial infarction, but better survival and outcome benefits were observed (2). This result may be partly explained as follows. Hetherington et al. summarized that women in pre-menopause periods would be protected from CVD by estrogen exposure (3). I understand that the risk assessment for CVD by using modifiable risk factors should be conducted by sex stratification. In addition, ethnic difference should be considered.
References
1. Rauf R, Khan MN, Sial JA, et al. Primary prevention of cardiovascular diseases among women in a South Asian population: a descriptive study of modifiable risk factors. BMJ Open 2024;14(11):e089149.
2. Pana TA, Mamas MA, Myint PK, et al. Sex differences in myocardial infarction care and outcomes: a longitudinal Scottish National Data-Lin...
We read the protocol paper for the NISA trial (1) with interest. One issue that we believe warrants further consideration relates to the inclusion of infants from a multiple birth. As this trial is being conducted in preterm infants less than 32 weeks’ gestation and multiple births are not part of the exclusion criteria, it is likely that a relatively high percentage of eligible infants will be from a multiple birth and their sibling(s) may also be eligible. Multiples generally provide less information about the effect of an intervention than unrelated singletons, due to similarities in the outcomes of infants from the same birth. This has implications for both the trial design and analysis that may not have been fully considered here.
In our recent systematic review (2), we found that few published trials of preterm infants adequately account for multiple births. We are committed to improving practice around this issue and commend the authors for their consideration of multiple births in both the randomisation (stating that “twins or multiple births will be randomly assigned to each group, which means they will be randomly assigned according to birth order”; p3) and data collection tools (where the number of fetuses is recorded; online supplementary material 2). We further encourage the authors to:
1. Consider the sample size/power implications of including multiple births in the trial. Our freely available online calculator may be useful for this purpose (3...
There have been significant therapeutic improvements for certain cancers. Although the increased incidence of thyroid cancer and prostate cancer can be entirely explained by detection, the increased incidence of breast cancer and melanoma is largely real (1-4), as shown by stratification of age groups based on mortality or incidence.
1) Corcos D, 2017. Breast cancer incidence as a function of the number of previous mammograms: analysis of the NHS screening programme. BioRxiv doi.org/10.1101/238527
2) Corcos D & Bleyer A, 2020. Epidemiologic signature in cancer: Prostate vs Breast. New England Journal of Medicine, 382(1):96
3) Corcos D, 2020. 2nd International DKFZ Conference on Cancer Prevention
4) Corcos D & Bleyer A, 2021. Cause of the Decades of Increase in Cutaneous Melanoma: Overdiagnosis, Ultraviolet Rays, Non-Ultraviolet Radiation? ResearchGate.
Dear Editor,
We read with interest the article “UK Medical Students' mental health and their intention to drop out”. It reiterates the growing evidence of the critical intersection of mental health and the need for effective wellbeing support to prevent medical students from dropping out (1). It highlighted that medical schools need to encourage students to seek help to reduce the stigma around mental health (1). However, our recent survey, with 534 responses from across 43 medical schools, found that the opposite is currently occurring across the UK (2). Our survey revealed that current medical school wellbeing provisions are largely performative, reduced to tick-box exercises that fail to deliver meaningful support.
In our survey, we asked medical students how well supported they felt throughout their studies. It was disappointing that only 45% reported they had easily accessible psychological support. More alarming were the pervasive fears amongst students, with some thinking “people get kicked off if it seems like they are struggling with mental health to prevent more suicides.” This perceived threat directly undermines the suggestion that students should be encouraged to seek help while there is still an unconscious stigma around mental health, perpetuating a dangerous cycle of silence.
A toxic culture has taken root in medical education. One medical student stated in our survey “no one really cares what we do as being burnt out is just part o...
To the editor:
Kacew et al review “reversals” of therapeutic guidelines for COVID-19 disease issued by the National Institutes of Health (NIH) and authorizations granted by the Food and Drug Administration (FDA). However, the FDA issued only emergency use authorizations, which are based “on a reasonable belief that the product may be effective…without waiting for all the information that would be needed for an FDA approval.” This is a very different standard than that used in NIH guidelines.
In claiming a “reversal” in the case of convalescent plasma (CP), the authors rely on a single meta-analysis (Mihalek et al) that focused entirely on all-cause mortality, reviewing just 19 randomized controlled trials (RCTs). By contrast, the meta-analysis by Senefeld et al included 39 RCTs, finding a 13% reduction in mortality with CP, more than twice the reduction estimated by Mihalek et al.
Some RCTs of CP used insufficient antibody or treated too late for therapeutic effectiveness. Mihalek et al echoed these concerns in the RCTs they reviewed, asserting that “It is possible that in some of the trials in included in our meta-analysis the antibody titer was not high enough to lead to any clinical change” and adding that “we suspect that convalescent plasma may be more effective in reducing clinical progression when administered early in the clinical course.”
The Senefeld et al meta-analysis found that mortality was 15% lower in recipients of plasma with high antibody conten...
My name is Gengchen Ye, and I am the first author of the paper titled "The Long COVID Symptoms and Severity Score: Development, Validation, and Application," published in [Value in Health, 2024]. I recently read the systematic review published in your journal, "Patient-reported outcome measures for post-COVID-19 condition: a systematic review of instruments and measurement properties".
Upon reviewing the supplemental materials and the main text of the article, I noticed that the construct validity results for the Long COVID Symptoms and Severity Score (LC-SSS) are described as having "1 out of 5 hypotheses confirmed." However, in our original publication, we reported that all five hypotheses were confirmed, demonstrating strong construct validity through significant correlations with quality of life and psychological measures.
This discrepancy suggests there may have been an inadvertent misunderstanding or misinterpretation of our findings. Accurate representation of the LC-SSS’s measurement properties is crucial for researchers and clinicians who rely on your systematic review for informed decision-making.
I kindly request that your team review this matter and consider issuing a correction or clarification to accurately reflect the construct validity results of the LC-SSS as reported in our original study.
Thank you for your attention to this matter and for your valuable contributions to the field. I am available to p...
Thank you for your message and for sharing your comments. In response, we would like to explain our methodological approach with regard to the assessment of construct validity.
Our systematic review follows the COSMIN methodology (Terwee et al. 2018). For the evaluation of construct validity, the review team is required to formulate generic hypotheses about expected relationships between the PROM under review and other well-established, high-quality comparator instruments commonly used in the field (Prinsen et al. 2018, Table 4). This approach does not aim to determine whether the authors' original hypotheses were confirmed in the validation studies, but to determine if these generic hypotheses are supported. For sufficient construct validity, 75% of the hypotheses must be confirmed.
In accordance with the COSMIN guidelines, correlations with PROMs measuring similar constructs should be above 0.5, while correlations with PROMs measuring related but dissimilar constructs should be between 0.3 and 0.5. In the study by Gengchen Ye et al., the PROMs used to assess the construct validity of the Long COVID Symptoms and Severity Score (LC-SSS) measure related but dissimilar constructs, including the EuroQol 5-Dimension 5-Level (EQ-5D-5L), EuroQol Visual Analogue Scale (EQ-VAS), Patient Health Questionnaire-9 (PHQ-9), Insomnia Severity Index (ISI), and Beck Anxiety Inventory (BAI). Therefore, correlations in the range of 0.3-0.5 are hypothesized. However, i...
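The rule described in this response (generic hypotheses with expected correlation ranges, and at least 75% of them confirmed) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the review team's actual procedure: the names `Hypothesis`, `is_confirmed` and `rate_construct_validity` are hypothetical, the sketch assumes that the absolute value of the correlation is compared against the ranges and that the 0.3-0.5 range is inclusive at both ends, and the example correlations are invented placeholders rather than values from any study.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    comparator: str   # name of the comparator instrument (placeholder labels below)
    relation: str     # "similar" or "dissimilar" (related but dissimilar construct)
    r: float          # observed correlation with the PROM under review


def is_confirmed(h: Hypothesis) -> bool:
    """Check one generic hypothesis against its expected correlation range.

    Assumption: the sign is ignored (absolute value), and the 0.3-0.5 range
    for related-but-dissimilar constructs is inclusive at both ends.
    """
    strength = abs(h.r)
    if h.relation == "similar":
        return strength > 0.5
    if h.relation == "dissimilar":
        return 0.3 <= strength <= 0.5
    raise ValueError(f"unknown relation: {h.relation!r}")


def rate_construct_validity(hypotheses, required_share=0.75):
    """Return (number confirmed, whether at least 75% were confirmed)."""
    confirmed = sum(is_confirmed(h) for h in hypotheses)
    return confirmed, confirmed / len(hypotheses) >= required_share


# Invented placeholder correlations, all for related-but-dissimilar comparators.
example = [
    Hypothesis("comparator_A", "dissimilar", 0.62),
    Hypothesis("comparator_B", "dissimilar", -0.55),
    Hypothesis("comparator_C", "dissimilar", 0.41),
    Hypothesis("comparator_D", "dissimilar", 0.71),
    Hypothesis("comparator_E", "dissimilar", 0.58),
]
print(rate_construct_validity(example))
```

Under these assumptions, a correlation that is statistically significant but stronger than 0.5 with a related-but-dissimilar comparator counts as not confirming the generic hypothesis, which is how a validation study's own hypotheses and a review's generic hypotheses can lead to different tallies.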
Adjobimey et al. examined the association between occupational psychosocial factors and hypertension (1). Occupational stress, age ≥25 years, increased body mass index, permanent worker status, and seniority in the textile sector >5 years were significant risk factors of hypertension. To prevent hypertension, job strain and recognition at work in the ginning plants sector should be corrected through occupational health promotion. I have some comments.
It is generally accepted that physical and mental health status are closely related. Li et al. reported that the adjusted hazard ratio (HR) (95% confidence interval [CI]) of workplace discrimination for incident hypertension was 1.54 (1.11-2.13) (2). Clausen et al. reported that the adjusted odds ratio (95% CI) of exposure to discrimination for the onset of depressive disorders was 2.73 (1.38-5.40) (3). Regarding the causal relationship, Jeon et al. reported that the adjusted HRs (95% CIs) of moderate and severe depressive symptoms for incident hypertension were 1.05 (1.01-1.11) and 1.12 (1.03-1.20), respectively (4). By applying time-dependent models, the corresponding HRs (95% CIs) were 1.12 (1.02-1.24) and 1.29 (1.10-1.50), respectively. They also clarified that high blood pressure was associated with decreased risk for developing depressive symptoms. Taken together, I suspect that depressive status would mediate the effect of workplace stress on subsequent hypertension. Perceived job stress may be a risk factor of d...