Authors' response to ‘Spa Therapy Is Not a Pill: Reconsidering Methods in the Evaluation of Complex Interventions’
We thank Forestier et al. for their rapid response to our study ‘Efficacy and safety of balneotherapy in rheumatology: a systematic review and meta-analysis’. We acknowledge that balneotherapy is a complex health intervention and hard to assess.
However, we did not approach balneotherapy “as if it were a pharmaceutical intervention”. We followed international guidance on synthesizing evidence (1) and grading its quality (2). These guidelines address the effects of health care interventions, including but not limited to pharmaceutical interventions. These approaches have been used successfully to assess balneotherapy in previous systematic reviews (3,4). We conducted and reported subgroup analyses to explore the complexity of the intervention, and we acknowledged in the discussion that our findings also highlight the difficulty of assessing such a complex intervention.
Regarding the bibliographic search, we followed international guidance and searched the three main bibliographic databases as well as other sources, including sources of unpublished trials. The ratio of included studies to retrieved records in our review was consistent with the literature (5). We were even able to show the presence of publication bias, despite the low power of publication bias tests, which reflects the high number of studies retrieved and included in our review, in contrast to the previous Cochrane reviews (3,4). The authors of the rapid response claimed to have found more studies but did not provide any references. Moreover, they did not clarify whether they found more randomized trials or more non-randomized trials. To compare the number of included studies, one should also compare the eligibility criteria, as we did in our paper when comparing our findings with previous reviews. The distinction between randomized and non-randomized studies matters because randomization affects the risk of bias and the level of evidence. We limited our systematic review to randomized trials only, but this does not amount to “highly selective inclusion criteria”. We included different rheumatologic indications, which gave us a larger sample size than previous systematic reviews of randomized trials. It is paradoxical to claim that our review was too selective and then to claim that our methods “amplified heterogeneity”. Indeed, the broader the inclusion criteria of a review, the more heterogeneity might be expected. For the geographical inclusion criteria, for example, as explained in our methods, “the larger the geographical area considered, the greater the risk of heterogeneity”, as it has been reported that many geographical factors could influence the effect of balneotherapy (6).
Regarding the choice of timing of the outcome, the Cochrane review by Verhagen et al. (4) reported that the 3-month follow-up was more commonly reported than the 6-month follow-up. Therefore, choosing the 3-month follow-up as the primary outcome was expected to help aggregate more information. Indeed, we found more data at the 3-month than at the 6-month follow-up. However, we also reported the analyses at 6 and 12 months (and at any time point ‘after the intervention’ and ‘at the time’ of the intervention). Therefore, there is no “questionable approach”, as we planned a priori, and eventually reported, different time points. The main limitation here was the reporting of these different endpoints in the included trials.
Regarding the pooling of different conditions, we have discussed this matter in the introduction, the methods, and the discussion sections. The previous reviews (3,4) were limited to a small number of trials, which precluded, for example, the assessment of publication bias. Using a broader category of indication increased the power of the meta-analysis. Such aggregation is also relevant because it is similar to the categories used in some health insurance systems. However, we also reported subgroup analyses that help disentangle potential differences in the treatment effect according to the underlying indication, mitigating the impact of the increased heterogeneity that results from the broad indication category. It is again paradoxical to claim that our review was too selective and then too broad.
Regarding the pooling of different kinds of balneotherapy, we discussed this matter in the methods and the discussion sections and reported subgroup analyses that mitigate the impact of the heterogeneity of the intervention. Regarding the pooling of different kinds of comparator, we discussed this in the methods and the discussion sections and reported sensitivity analyses that help to account for these differences. Interestingly, we observed a smaller effect with a stronger comparator (‘placebo-like’), which might point to the importance of a potential placebo effect, as discussed in the paper.
Regarding the classification of the ‘placebo-like’ comparator, we can only answer with respect to the study by Franke et al. (7), as it is the only one cited by the authors of the rapid response. Franke et al. reported the following interventions: “radon baths […] or tap water baths under the same conditions”, as pointed out by the authors of the rapid response (“The only difference between groups was the type of water used”). Therefore, the comparator, a “sham procedure”, is indeed more similar to a ‘placebo-like’ comparator than to ‘standard of care’ or another type of comparator. It is unclear what the authors of the rapid response meant by “In such cases, the specific effect of spa therapy—particularly that of radon—is diluted by the presence of multiple active co-interventions. It is therefore questionable whether any conclusions can be drawn about the isolated effect of spa therapy or the water type from such designs, as the associated treatments likely had a substantial impact on outcomes.” There are always confounding factors accompanying any health intervention; that is one reason why randomized controlled studies are needed to obtain unbiased estimates of a potential treatment effect.
Regarding the method of calculating effect sizes, combining “mean change-from-baseline” with “end value” data is not problematic for unstandardized mean differences, thanks to randomization (8). It is problematic, however, when using standardized mean differences. That is why we reported these two pooled estimates separately. Finally, it has also been pointed out that combining them might be valid (9); thus we also reported the combined estimates. We explained this in our methods section.
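The reasoning behind this point can be sketched as a short derivation. The notation is introduced here for illustration only and is not taken from the paper: Y^0 denotes the baseline value, Y^1 the follow-up value, and T and C the treatment and control arms. It assumes successful randomization, so that baseline means are equal in expectation.

```latex
% Unstandardized mean difference (MD): change scores vs. end values
\begin{align*}
\mathrm{MD}_{\text{change}}
  &= E\!\left[Y_T^{1} - Y_T^{0}\right] - E\!\left[Y_C^{1} - Y_C^{0}\right] \\
  &= \left(E[Y_T^{1}] - E[Y_C^{1}]\right) - \left(E[Y_T^{0}] - E[Y_C^{0}]\right) \\
  &= E[Y_T^{1}] - E[Y_C^{1}] = \mathrm{MD}_{\text{end}},
  \qquad \text{because } E[Y_T^{0}] = E[Y_C^{0}] \text{ under randomization.}
\end{align*}

% Standardized mean difference (SMD): the denominators differ
\[
\mathrm{SMD}_{\text{change}} = \frac{\mathrm{MD}_{\text{change}}}{SD_{\text{change}}},
\qquad
\mathrm{SMD}_{\text{end}} = \frac{\mathrm{MD}_{\text{end}}}{SD_{\text{end}}},
\qquad
SD_{\text{change}} \neq SD_{\text{end}} \text{ in general.}
\]
```

Both unstandardized quantities therefore estimate the same effect, whereas the two standardized quantities are scaled by different standard deviations and are not on a common scale.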
Regarding the quality-of-life (QoL) measures, we agree that the use of various QoL scales across balneotherapy studies increases the risk of heterogeneity when pooling the findings. However, prioritizing the less disease-specific measures improves comparability across studies. Moreover, the most common QoL measures appear to be valid in various rheumatic diseases (10–12).
Regarding the risk of bias, the RoB2 tool was designed to help assess the risk of bias in any randomized trial and is not limited to drug trials. The RoB2 tool has been successfully applied to various kinds of interventions, such as community-based complex interventions (13), implementation of evidence-based guidelines in clinical practice (14), and exercise-based ‘prehabilitation’ before surgery (15), among other examples. The RoB2 tool applies even in the absence of blinding. We acknowledge the difficulty of blinding in balneotherapy. However, various blinding strategies have been successfully implemented in some of the trials included in our review, such as heating tap water, adding artificial CO2 to the tap water, coloring tap water, exposing participants to the same odor, and adjusting the pH of the water (7,16–18). The authors of the rapid response pointed out that “patients are frequently the assessors of primary outcomes […]”; indeed, that is a reason why blinding of participants is of particular importance. The importance of using “sham procedures” as comparators has been well documented in other fields, particularly in pain-related conditions (19). The authors of the rapid response underlined the importance of “contextual factors, such as the therapists’ level of experience and the presence of co-interventions”. Indeed, as for any health intervention, such confounding factors underline the need for randomized trials at low risk of bias.
Regarding Zelen’s randomization, several ethical issues with this design have been pointed out (20). Moreover, knowledge of the nature of the intervention in the experimental arm exposes Zelen’s design to outcome measurement bias. Thus, Zelen’s design would not be appropriate for subjective patient-reported outcomes (21). Other methodological issues have also been raised with Zelen’s randomization, such as dilution bias (22). Regarding “Alternative strategies for minimizing bias, such as […] the use of qualitative outcomes [5,6], are not accounted for by RoB 2”, it is not clear what is meant. Reference [5] (23) assessed the clinical effect of placebo, including pharmacologic, physical, and psychological placebo, using quantitative outcomes. Reference [6] (24) showed that non-blinded assessment of subjective quantitative outcomes resulted in overestimation of the treatment effect, highlighting the need for blinding participants.
Regarding the CLEAR NPT tool, it is now outdated (2005), especially given the availability of the more recent and internationally recognized tool of the Cochrane Collaboration, first published in 2011 (25) and updated in 2019 (26), which addresses bias in randomized trials of healthcare interventions and is not limited to drug trials. The reference provided for the PEDro scale concluded that the Cochrane RoB tool “can be used to quantify risk of bias”.
Conclusion
We followed international guidelines for the assessment of any health care intervention, not limited to pharmaceutical products. Our methodological decisions allowed us to aggregate a large amount of information, reflecting the diversity of balneotherapy in rheumatology in Europe, and to explore potential sources of heterogeneity, thus increasing the reliability and applicability of the findings compared with the previous reviews. Improving the quality of the evidence for such a complex intervention is indeed hard, but it has been feasible for other non-pharmacological interventions (27,28).
References
1. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Repr. with corr. Chichester: Wiley-Blackwell; 2009. 649 p. (Cochrane book series).
2. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008 Apr 26;336(7650):924–6.
3. Verhagen AP, Bierma-Zeinstra SMA, Boers M, Cardoso JR, Lambeck J, de Bie RA, et al. Balneotherapy for osteoarthritis. Cochrane Database Syst Rev. 2007 Oct 17;(4):CD006864.
4. Verhagen AP, Bierma-Zeinstra SMA, Boers M, Cardoso JR, Lambeck J, de Bie R, et al. Balneotherapy (or spa therapy) for rheumatoid arthritis. Cochrane Database Syst Rev. 2015 Apr 11;2015(4):CD000518.
5. Sampson M, Tetzlaff J, Urquhart C. Precision of healthcare systematic review searches in a cross-sectional sample. Res Synth Methods. 2011 Jun;2(2):119–25.
6. Gutenbrunner C, Bender T, Cantista P, Karagülle Z. A proposal for a worldwide definition of health resort medicine, balneology, medical hydrology and climatology. Int J Biometeorol. 2010 Sep;54(5):495–507.
7. Franke A, Franke T. Long-term benefits of radon spa therapy in rheumatic diseases: results of the randomised, multi-centre IMuRa trial. Rheumatol Int. 2013;33(11):2839–50.
8. Chapter 10: Analysing data and undertaking meta-analyses [Internet]. [cited 2023 Apr 14]. Available from: https://training-cochrane-org.ezproxy.u-pec.fr/handbook/current/chapter-10
9. da Costa BR, Nüesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, et al. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. J Clin Epidemiol. 2013 Aug;66(8):847–55.
10. Veehof MM, ten Klooster PM, Taal E, van Riel PLCM, van de Laar MAFJ. Comparison of internal and external responsiveness of the generic Medical Outcome Study Short Form-36 (SF-36) with disease-specific measures in rheumatoid arthritis. J Rheumatol. 2008 Apr;35(4):610–7.
11. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol. 1997 May;36(5):551–9.
12. The Health Assessment Questionnaire (HAQ) Disability Index (DI) of the Clinical Health Assessment Questionnaire (Version 96.4).
13. Crocker TF, Lam N, Jordão M, Brundle C, Prescott M, Forster A, et al. Risk-of-bias assessment using Cochrane’s revised tool for randomized trials (RoB 2) was useful but challenging and resource-intensive: observations from a systematic review. J Clin Epidemiol. 2023 Sep 1;161:39–45.
14. Belavy DL, Tagliaferri SD, Buntine P, Saueressig T, Ehrenbrusthoff K, Chen X, et al. Interventions for promoting evidence-based guideline-consistent surgery in low back pain: a systematic review and meta-analysis of randomised controlled trials. Eur Spine J Off Publ Eur Spine Soc Eur Spinal Deform Soc Eur Sect Cerv Spine Res Soc. 2022 Nov;31(11):2851–65.
15. Garoufalia Z, Emile SH, Meknarit S, Gefen R, Horesh N, Zhou P, et al. A systematic review and meta-analysis of high-quality randomized controlled trials on the role of prehabilitation programs in colorectal surgery. Surgery. 2024 Nov 1;176(5):1352–9.
16. Bálint GP, Buchanan WW, Ádám A, Ratkó I, Poór L, Bálint PV, et al. The effect of the thermal mineral water of Nagybaracska on patients with knee joint osteoarthritis—a double blind study. Clin Rheumatol. 2007 Jun 1;26(6):890–4.
17. Fioravanti A, Manica P, Bortolotti R, Cevenini G, Tenti S, Paolazzi G. Is balneotherapy effective for fibromyalgia? Results from a 6-month double-blind randomized clinical trial. Clin Rheumatol. 2018;37(8):2203–12.
18. Hanzel A, Horvát K, Molics B, Berényi K, Németh B, Szendi K, et al. Clinical improvement of patients with osteoarthritis using thermal mineral water at Szigetvár Spa-results of a randomised double-blind controlled study. Int J Biometeorol. 2018;62(2):253–9.
19. Jonas WB, Crawford C, Colloca L, Kaptchuk TJ, Moseley B, Miller FG, et al. To what extent are surgery and invasive procedures effective beyond a placebo response? A systematic review with meta-analysis of randomised, sham controlled trials. BMJ Open. 2015 Dec 11;5(12):e009655.
20. Hawkins JS. The ethics of Zelen consent. J Thromb Haemost JTH. 2004 Jun;2(6):882–3.
21. Simon GE, Shortreed SM, DeBar LL. Zelen design clinical trials: why, when, and how. Trials. 2021 Aug 17;22(1):541.
22. Adamson J, Cockayne S, Puffer S, Torgerson DJ. Review of randomised trials using the post-randomised consent (Zelen’s) design. Contemp Clin Trials. 2006 Aug 1;27(4):305–19.
23. Hróbjartsson A, Gøtzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med. 2001 May 24;344(21):1594–602.
24. Hróbjartsson A, Thomsen ASS, Emanuelsson F, Tendal B, Hilden J, Boutron I, et al. Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ. 2012 Feb 27;344:e1119.
25. Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011 Oct 18;343:d5928.
26. Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019 Aug 28;366:l4898.
27. Pronk AJM, Roelofs A, Flum DR, Bonjer HJ, Abu Hilal M, Dijkgraaf MGW, et al. Two decades of surgical randomized controlled trials: worldwide trends in volume and methodological quality. Br J Surg. 2023 Jun 28;110(10):1300–8.
28. Temple J, Salmon P, Tudur-Smith C, Huntley CD, Fisher PL. A systematic review of the quality of randomized controlled trials of psychological treatments for emotional distress in breast cancer. J Psychosom Res. 2018 May 1;108:22–31.
I am not convinced by the use of the so-called Bayesian Confidence Propagation Neural Network (BCPNN) in this context.
In pharmacovigilance—particularly when evaluating safety signals—the use of a prior hypothesis regarding the safety of a vaccine or drug should be approached with caution. In this case, we lack reliable prior knowledge of the product’s safety profile. Assuming otherwise may be misleading and, potentially, dangerous.
Established disproportionality measures such as the Proportional Reporting Ratio (PRR) or Reporting Odds Ratio (ROR), when accompanied by confidence intervals (CIs), already provide valuable insight. If the CIs are wide, this simply reflects uncertainty—and that, in itself, is informative enough.
The main result of the paper appears to be the PRR of approximately 23 (with a lower bound exceeding 9) for preterm birth following the RSV vaccine. This is striking, yet it is not highlighted in the conclusions; one has to look in the appendix (link "supplemental material"), specifically Table S6 p.10, to find it.
Why is there such a significant discrepancy between the PRR (~23) and the Information Component (IC, ~2+)? Even at the lower bound, the PRR remains notably elevated. This likely stems from an inappropriate prior used in the Bayesian model. In fact, the paper serves as a good illustration of why Bayesian methods, particularly in the form of BCPNN, may not be suitable for pharmacovigilance signal detection.
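To make the comparison between the two measures concrete, the following is a minimal sketch of how a PRR with its confidence interval and a shrinkage-type Information Component are commonly computed from a 2x2 table of spontaneous reports. The counts are hypothetical and are not taken from the paper; the function names and the shrinkage constant of 0.5 are assumptions for illustration, and the paper's exact BCPNN implementation may differ. Note that the IC is expressed on a log2 scale, so an IC of 2 corresponds to an observed-to-expected ratio of about 4.

```python
import math

def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional Reporting Ratio with a log-normal confidence interval.

    a: reports of the event for the product of interest
    b: reports of other events for the product of interest
    c: reports of the event for all other products
    d: reports of other events for all other products
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR), as for a ratio of two proportions
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    return prr, lower, upper

def information_component(a, b, c, d, shrink=0.5):
    """Shrinkage-type IC: log2 of observed over expected counts.

    The shrinkage constant (assumed 0.5 here) pulls the estimate
    towards zero when counts are small.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2((a + shrink) / (expected + shrink))

# Hypothetical counts, for illustration only
a, b, c, d = 20, 80, 100, 9800
prr, lower, upper = prr_with_ci(a, b, c, d)
ic = information_component(a, b, c, d)
print(f"PRR = {prr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
print(f"IC = {ic:.2f}, i.e. observed/expected of about {2 ** ic:.1f}")
```

With such inputs the two figures are not on the same scale, and the IC is additionally shrunk towards zero, so part of any gap between a PRR and an IC reflects the definitions themselves.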
I read with great interest the recent article by Rous et al. (BMJ Open 2025;15:e086648), which presents an important modeling analysis of screening intervals for multi-cancer early detection (MCED) tests based on cell-free DNA (cfDNA) methylation profiling. The work underscores the growing utility of cfDNA-based diagnostics in detecting cancer-specific epigenetic signatures with high specificity.
However, I would like to respectfully offer an additional perspective that may have been overlooked, namely, the active immunologic role of methylated DNA in modulating tumor immunity. Based on our group's published work, we have shown that methylated DNA, particularly methylated CpG motifs, can directly stimulate the differentiation of Foxp3+ regulatory T cells (Tregs) (1-4). This immunologic pathway contributes to immune tolerance and may facilitate tumor immune evasion.
While current MCED models consider methylation purely as a passive biomarker of malignancy, it is important to recognize that the same methylated cfDNA fragments detected in plasma may also exert biologic effects on the host immune system. In particular, their capacity to expand Treg populations could help explain why some tumors remain clinically silent or escape immune surveillance, even when detectable at early stages by cfDNA analysis.
This dual role—diagnostic and immunoregulatory—has implications for both the interpretation of MCED test results and the development of adjunct immunotherapeutic strategies. It also raises the intriguing possibility that certain methylation signatures may predict not only tumor presence, but also immune responsiveness or resistance.
I suggest that future studies on MCED implementation and modeling consider incorporating the immune-modulatory consequences of methylated DNA, particularly in the context of regulatory T cell biology and tumor tolerance.
References:
1. Lawless OJ, Bellanti JA, Brown ML, et al. In vitro induction of T regulatory cells by a methylated CpG DNA sequence in humans: Potential therapeutic applications in allergic and autoimmune diseases. Allergy Asthma Proc. 2018 Mar 1;39(2):143-152.
2. Li D, Cheng J, Zhu Z, Catalfamo M, Goerlitz D, Lawless OJ, Tallon L, Sadzewicz L, Calderone R, Bellanti JA. Treg-inducing capacity of genomic DNA of Bifidobacterium longum subsp. infantis. Allergy Asthma Proc. 2020 Sep 1;41(5):372-385
3. Li D, Sorkhabi S, Cruz I, Foley PL, Bellanti JA. Studies of methylated CpG ODN from Bifidobacterium longum subsp. infantis in a murine model: Implications for treatment of human allergic disease. Allergy Asthma Proc. 2025 Jan 1;46(1):e13-e23.
4. Li D, Cruz I, Sorkhabi S, Foley PL, Wagner J, Bellanti JA. Dose-response studies of methylated and nonmethylated CpG ODNs from Bifidobacterium longum subsp. infantis for optimizing Treg cell stimulation. Allergy Asthma Proc. 2025 Mar 1;46(2):98-104.
Sincerely,
Joseph A. Bellanti, MD
Professor of Pediatrics and Microbiology-Immunology (Emeritus)
Director, International Center for Interdisciplinary Studies of Immunology (ICISI)
Georgetown University Medical Center
3900 Reservoir Road, NW, Room 308 NW
Washington, DC 20057
Tel: 301-938-2940
Fax: 202-318-0444
bellantj@georgetown.edu
Reader’s comment nr 1: Sample Size Calculation
The study reports: “For the estimation of HPV prevalence, a sample size of 267 participants was needed with an anticipated prevalence of HPV of 50%, 6% precision, and a 5% level of significance.”
• While 6% precision is technically possible, this departs from the standard 5% cutoff.
• Since the study employed snowball sampling —a convenience sampling method from a single center—to estimate nationwide HPV prevalence, it failed to consider the design effect and non-response rates inherent in such non-random sampling methods. After accounting for these factors, the required sample size should be approximately 446 participants for 6% precision and 642 participants for 5% precision. Not 267.
Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling Studies. American Journal of Epidemiology, 2006; 163(5): 471–478.
Snowball Sampling: A Review of Key Issues and Methodological Considerations. International Journal of Social Research Methodology, 2013; 16(4): 351-367. (doi:10.1080/13645579.2013.801561)
RESPONSE to comment nr 1:
Thank you so much for raising this concern.
Design effect (DE) is used for complex sampling techniques such as stratified random sampling, cluster random sampling, or multi-stage cluster sampling, where the DE must be considered because of the additional variance due to stratification and/or clustering. We used non-random sampling, the population was homogeneous, and the variation is therefore based only on the subject of interest. Thus the design effect was kept at 1, which gave the reported sample size (n=267).
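The calculation under discussion can be sketched with the standard single-proportion formula n = z² p(1 − p) / d². The function name and the `design_effect` and `response_rate` parameters below are hypothetical, added only to show where such adjustments would enter; z = 1.96 is assumed for the 5% significance level.

```python
import math

def sample_size_for_prevalence(p, precision, z=1.96,
                               design_effect=1.0, response_rate=1.0):
    """Sample size needed to estimate a prevalence p within +/- precision.

    design_effect and response_rate are illustrative adjustment factors;
    both default to 1.0, i.e. no adjustment.
    """
    n_simple = (z ** 2) * p * (1 - p) / precision ** 2
    return math.ceil(n_simple * design_effect / response_rate)

# Anticipated prevalence of 50%, as stated in the study
print(sample_size_for_prevalence(0.5, 0.06))  # 267
print(sample_size_for_prevalence(0.5, 0.05))  # 385
```

With a design effect of 1 and 6% precision this reproduces the reported n = 267; any design effect above 1 or response rate below 1 scales the requirement up proportionally.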
Reader’s comment nr 2: Methodological Errors and Biased Estimates
The study states: "We employed a Cox proportional hazard model algorithm to estimate crude and adjusted PRs and their 95% CIs in univariate and multivariate models."
Could the authors share the actual SPSS working for fitting the Cox Proportional Hazard Model to their cross-sectional data? Would also greatly appreciate it if you could justify your analytical approach?
• Fitting the Cox proportional model on cross-sectional data is fundamentally flawed, as it treats a single time point as if it represents survival time. This misapplication violates the Cox model's assumptions, which are designed for time-to-event data with variability, not cross-sectional data. Without censoring, this model is over-parameterized and results in systematic bias. It tends to inflate study estimates, which is a common issue in longitudinal studies with high censoring rates.
• Thus, the ratios and p-values produced by SPSS stem from a misapplication of the Cox Proportional model. The software might issue warnings about insufficient variation in the time variable but will still attempt to fit the model.
• For binary outcomes (e.g., event occurrence: yes/no) in cross-sectional datasets, logistic regression should be used instead. This approach models the probability of an event as a function of covariates, which is appropriate for the study design.
RESPONSE to comment nr 2:
• Since this is a cross-sectional study with a binary outcome, we applied a Cox proportional hazard model algorithm with robust standard errors (SE), which we believe is an appropriate model-building technique, and estimated the association between the exposure and the outcome by means of prevalence ratios (PRs).
• When adjustment for potential confounders is needed, logistic regression models are commonly used to estimate odds ratios (ORs), which are reported in a similar way to PRs. However, ORs are less suitable when the outcome is very common, as in our study. In these situations, interpreting ORs as if they were PRs may be inadequate.
• (ref: Barros AJ, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3:21. DOI: 10.1186/1471-2288-3-21)
• For example, if 80 out of 100 exposed subjects have a particular disease and 50 out of 100 non-exposed subjects have the disease, then the odds ratio (OR) is (80/20)/(50/50)=4. However, the prevalence ratio (PR) is (80/100)/(50/100)=1.6. The latter indicates that the exposed subjects are only 1.6 times as likely to have the disease as the non-exposed subjects, and this is the number in which most people would be interested.
• There is considerable literature on the advantages and disadvantages of the OR versus the PR that we believe supports our choice of the PR (Greenland; Stromberg; Axelson et al.; and others).
• 1. Greenland S. Interpretation and choice of effect measures in epidemiologic studies. Am J Epidemiol 1987;125:761–8.
• 2. Stromberg U. Prevalence odds ratio v prevalence ratio. Occup Environ Med 1994;51:143–4.
• 3. Axelson O, Fredricksson M, Ekberg K. Use of the prevalence ratio v the prevalence odds ratio as a measure of risk in cross sectional studies. Occup Environ Med 1994;51:574.
• Thus, since the outcome of interest (HPV infection) was common (>10%), we preferred to use PRs and hope that the reader understands this and agrees.
• Moreover, Barros et al have reported that Cox or Poisson regression with robust variance and log-binomial regression provide correct estimates and are a better alternative for the analysis of cross-sectional studies with binary outcomes than logistic regression, since the prevalence ratio is more interpretable and easier to communicate to non-specialists than the odds ratio (Barros AJ, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol. 2003;3:21. DOI: 10.1186/1471-2288-3-21)
In fact, the study that the reader has referenced under bullet point 10, Wiley DJ et al (2013) published in PLOS ONE, also estimated adjusted prevalence ratios from multivariate Poisson regression analysis.
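To make the arithmetic in the example above easy to check, here is a minimal sketch in Python. The function names and argument order are our own illustrative choices and are not taken from the study's analysis code; the sketch only reproduces the crude 2x2 calculation and does not perform the adjusted Poisson regression used in the paper.

```python
def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Crude odds ratio from a 2x2 table (hypothetical helper, for illustration)."""
    # Odds = cases / non-cases within each exposure group.
    exposed_odds = exposed_cases / (exposed_total - exposed_cases)
    unexposed_odds = unexposed_cases / (unexposed_total - unexposed_cases)
    return exposed_odds / unexposed_odds


def prevalence_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Crude prevalence ratio from a 2x2 table (hypothetical helper, for illustration)."""
    # Prevalence = cases / all subjects within each exposure group.
    exposed_prev = exposed_cases / exposed_total
    unexposed_prev = unexposed_cases / unexposed_total
    return exposed_prev / unexposed_prev


if __name__ == "__main__":
    # Common outcome, as in the example above: 80/100 exposed vs 50/100 non-exposed.
    print("common outcome: OR =", round(odds_ratio(80, 100, 50, 100), 2),
          "PR =", round(prevalence_ratio(80, 100, 50, 100), 2))
    # Rare outcome (made-up numbers): the OR and the PR are close to each other.
    print("rare outcome:   OR =", round(odds_ratio(2, 100, 1, 100), 2),
          "PR =", round(prevalence_ratio(2, 100, 1, 100), 2))
```

With a common outcome the odds ratio (4) is far from the prevalence ratio (1.6), whereas with a rare outcome the two measures nearly coincide, which is why the choice of measure matters for an outcome as common as HPV infection in this population.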
Reader comment nr 3. Unexplained Inclusion criteria
The study states:
“A total of 265 study participants with different sexual orientations, that is, MSM (homosexuals, bisexuals) and transgender women living with and without HIV were recruited. Participants were eligible for the study if they self-reported having had anal sex in the preceding at least 6 months and were age 18 and above.”
• The use of a 6-month cutoff for recent anal sex introduces significant methodological flaws, particularly when assessing HPV prevalence nationally.
• Anal HPV infections, particularly high-risk types like HPV-16, are well-documented to persist far beyond six months, even in the absence of recent sexual exposure. Limiting inclusion criteria to MSM who have engaged in anal sex within the past six months is unjustifiably restrictive and excludes individuals with persistent infections. This narrow approach biases the sample toward more sexually active individuals (by design predominantly male sex workers), inflating prevalence estimates.
Hernandez et al. (2014) and Lin et al. (2020).
• No explanation was provided for the age cutoff at 18 years. Excluding participants under 18 overlooks the reality that many MSMs initiate sexual activity at younger ages, particularly those in marginalized groups who may face early sexual abuse or exploitation, as acknowledged later in the study. This exclusion introduces further bias by failing to account for the experiences of high-risk youth.
RESPONSE to comment nr 3:
This may be an interesting idea for another study, but we set out to study young adults (defined as 18 years and above by the WHO’s criteria) and not adolescents or minors, who would require parental or guardian consent, something that we feel would be unethical and could have jeopardized the safety of participating adolescents given that this study was conducted in a very conservative country. We also focused on recruiting participants aged 18 and above because adults are generally considered legally and cognitively more capable of providing informed consent, making them suitable for participation in a study that involved sensitive questions. It was important to us that the decision and consent to participate were truly informed and that participants were ready to self-report sensitive sexual behaviors.
Thus, this study was conducted among young adults who self-reported having had anal sex for at least the preceding 6 months, and in many cases for much longer. As clearly stated in the paper, the mean duration of active sexual behavior in our study population was 12 years, with a standard deviation of 8.2.
Reader’s Comment nr 4. Vague study procedure section
There is only one, just one sentence in the whole section about the study's own procedure, and that too is very generic!!
The following is the complete section dedicated to the "Study Procedure":
“A 35–40 min long structured questionnaire was administered at a private space to interview the study participants in order to collect data concerning sociodemographic information, sexual and reproductive history and medical history relevant to HIV, any anal disease and use of ART. A blood sample was obtained for confirmation of their HIV status (through Architect HIV Ag-Ab Combo kit by ABBOTT), viral load (real time PCR Artus Qiagen) and CD4 +T cell count (through FACS Count TM flow cytometry, Becton Dickinson, Franklin Lakes, New Jersey, USA). Two anal swabs for HPV testing were collected by a trained general physician. A water-moistened swab was inserted 5–6 cm proximal to the anal verge and the samples were obtained from anorectal transition zone around the dentate line in the anal canal. The swab was then agitated vigorously in a tube containing 3 mL of methanol- based fixative—a Sample Transport Medium (UTM-RT viral transport media with flocked polyester swabs, Copan Diagnostics, Corona, California, USA) and transported to the lab. In Molecular lab the solution was stored at −70°C before further processing and was later used for the extraction and detection of HPV DNA by the PCR. The DNA from rectal swab was extracted by using Qiagen DNA mini kit and DNA was eluted in 80 uL of elution buffer and stored at −80°C for further use. HPV DNA was detected by using MY09/MY11 primers from L1 region resulted in amplification of 450 bp of L1 gene. PCR for Housekeeping gene glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was done for every extracted DNA and samples tested negative for GAPDH gene were excluded from study. Nested multiplex PCR assay was used26 for detection of HPV subtypes included LR types 6/11 and HR (group 1) types 16, 18, 31, 33, 35, 52, 56, 58 and 59 as described by Sotlar et al. 
(27) Primers for first-round PCR for GP-E6/E7 consensus sequence and second round PCR for HPV subtypes 16, 18, 6/11, 31, 33, 35, 52, 56, 58 and 59 were synthesised from Eurofins MWG/Operon Germany. The HPV subtype specific primers were used in four cocktails and size of the nested PCR product was used for the identification of each HPV type by gel electrophoresis.”
• The study fails to provide sufficient detail on aspects such as covariate selection (including what covariates were selected, why, their potential implications, and their confounding roles), the framework employed, and the reliability and validity of the tools used to measure exposure, outcomes, and covariates.
• Instead, strangely enough, these elements are replaced by irrelevant details on routine well-established laboratory procedures. The procedural descriptions of HIV and HPV testing could have been succinctly referenced, preserving valuable space that should have been dedicated to clarifying the study's own design and observational data procedures.
RESPONSE to comment nr 4:
We do not think this critique is relevant or warranted, since we have adequately and thoroughly described the study procedures, including the study design, study population, setting, and included variables. Since this was primarily a prevalence study, it was necessary to elaborate carefully on the sample collection, processing, and testing protocols, as these significantly affect HPV test results and the reliability and generalizability of HPV prevalence data. Moreover, our intention was that the comprehensive information provided would guide future policy decisions regarding HPV screening programs.
Regarding the variables included in the analysis, the Results section also describes that all reported associations were carefully analyzed for confounding by all relevant variables, such as age and sexual and behavioral factors.
Reader’s comment nr 5. Discrepancies in Reporting
5.1. The title of Table 1 refers to the "characteristics of the study population in (city of) Karachi, Pakistan." However, the study title and aims mention the country rather than the city, suggesting that it is a nationwide study when it is not. In fact, the study doesn't even adequately represent Karachi due to its weak sampling methods, small sample size, lack of representativeness, and the use of a single center. More so, study participants were not recruited through a multistage probability random sampling approach, further undermining the study’s claims of broad relevance.
I don’t understand why the study title says "Pakistan" instead of "Karachi, Pakistan," which would better reflect the study’s limited scope. This could help avoid giving the misleading impression that the study covers the whole country when it actually only involves one center in the city of Karachi.
5.2. Table 1 shows descriptive data on the number of cigarettes smoked and the number of anal encounters. However, these factors were not included in the adjusted analyses.
Could the authors clarify why these variables were excluded? Additionally, what was the rationale behind selecting the covariates included in the multivariate models?
RESPONSE to comment nr 5.1:
We clearly describe the study setting under the methods section and have already acknowledged and justified this in our discussion section. Of course, it is a limitation that this study was not conducted nationwide; it would have been great to have the resources to conduct a much bigger study. However, as most researchers are well aware, funding and resource constraints always influence how much research and data collection can be done. This does not make the results less valid or precise.
RESPONSE to comment nr 5.2:
The above-mentioned variables were analyzed and did not turn out to be significant in the final model. It is common practice in statistical analyses to keep only the statistically significant variables in the final model.
Reader’s comment nr 6. Discussion section
Table-3 shows statistically significant Receptive Anal Sex and HPV Risk. While there is a noticeable association between receptive anal sex and HPV16 infection, this association is weakened after adjusting for HIV status, suggesting that HIV and its associated immunosuppression might be the stronger contributing factor to HPV infection risk in this group. Yet, the effect-moderating role of ART therapy, which offers a more nuanced understanding of HPV transmission, where immunosuppression related to HIV plays a crucial role in HPV acquisition and progression, rather than sexual activity alone, was not discussed.
We know that the relationship between HPV prevalence and MSM is not as straightforward as discussed by this study. Actually, the study's findings also suggest that HIV-positive status might be a more significant risk factor for HPV16 and overall HPV infection.
Therefore, the study hypothesis that MSM population is primarily at risk of HPV-related anal lesions/cancer due to sexual behavior (e.g., anal sex role) alone does not fully capture the complexity of the issue.
RESPONSE to comment nr 6:
We never hypothesized that HPV prevalence in MSM is related to sexual practices alone. Although previous research indicates that receptive anal intercourse is a risk factor for anal lesions, anal HPV infection is not limited to men who have receptive anal sex; it can also be acquired during non-receptive sexual activity.
Moreover, it is true that immunosuppression due to HIV plays a role in the acquisition and progression of HPV infection, but exploring the effect of HIV-related immunosuppression as well as the role of ART was beyond the scope of our study.
Reader’s comment nr 7. Misleading statements (with no or improper citation) can be found throughout the text, e.g., the study in their intro section states:
“Currently, HPV vaccines, in which the bivalent vaccine protects against HPV16/18 and quadrivalent vaccine protects against HPV16/18 and the LR types HPV6/11,(25) are available in pharmacies in Pakistan but are not yet made available to the target population.”
Here, the authors potentially imply that the government is not facilitating its widespread use or perhaps deliberately limiting access for high-risk groups like MSWs or MSMs. [Moderator: your (over)interpretation would appear to turn on the single word "made".] However, this claim is made without sufficient clarification or citations to substantiate the assertion.
If the vaccine is indeed available in pharmacies, what mechanisms—whether policy-driven, financial, or social—are contributing to its inaccessibility for the target population? Is the limitation due to high costs, societal stigma, or a lack of targeted public health programs? Or is there evidence of intentional policy decisions to restrict access for these marginalized groups?
The lack of citation for this critical point weakens the argument and leaves significant gaps in understanding the barriers to HPV vaccination in this marginalized group. Here, the authors missed an opportunity to explore solutions, such as advocating for government-subsidized vaccination programs, awareness campaigns, or addressing systemic barriers.
(Reference cited here (25) is irrelevant: [25. Schiller JT, Castellsagué X, Garland SM. A review of clinical trials of human papillomavirus prophylactic vaccines. Vaccine 2012;30:F123–38])
RESPONSE to comment nr 7:
The reader makes sweeping comments about our underlying messages in the text. He or she mentions only one specific point, though, and that is an overinterpretation of the single word “made” in one sentence.
It is a fact that the HPV vaccine is not available to the key population under study; beyond that, we did not imply anything. Our study is not about the reasons for non-availability or its solutions. This is solely the reader’s own interpretation.
Reader’s comment nr 8. Study Limitations Section
The authors provided a standard, generic list of limitations instead of a more detailed analysis that would demonstrate their expertise in both epidemiologic research methodology and the subject matter.
Most of the statements are misleading and factually incorrect. Below are a few points of concern:
8.1. The study states: “Karachi is the largest metropolitan city of Pakistan, representing all ethnicities of the country. Moreover, in a country like Pakistan, which is by and large a heteronormative society and where homophobia prevails against this sexual minority due to cultural and religious inhibitions, using above-mentioned sampling technique was the most feasible option for the recruitment process.”
This statement is highly problematic and fails to address critical flaws in its sampling approach and provides a misleading portrayal of Karachi’s representativeness. While Karachi is diverse, it is also the most secular city in Pakistan, and societal views around MSM and gender minorities in this metropolis may differ significantly from those in more conservative, less secular regions, challenging its suitability as a proxy for national representation.
The claim that snowball sampling was the most feasible technique is unfounded. Karachi, with over 27 million residents and numerous opportunities to engage directly with these populations, offers far better avenues for more robust sampling. The reliance on one center and an insufficient sample size severely undermines the study’s validity and its broader policy implications, as the findings lack depth and fail to reflect the heterogeneity of experiences across the country.
RESPONSE to comment nr 8.1:
The reader has again tried to argue that our statements are misleading and incorrect, but the first author is from Pakistan and has worked in Karachi for over 20 years with the target group of interest and is very familiar with this setting.
It is a well-known fact that Karachi, being the largest metropolitan city of Pakistan, represents all ethnicities in the country. It may be more secular than other cities, but this is not a study on societal views or religion. We fail to understand why the reader thinks that the population of Karachi is not representative of MSM and transgender people in this type of study.
Despite its large population, there are very limited opportunities to engage directly with this marginalized community in the city. Given their marginalized and hidden nature, it would be difficult to identify willing participants in smaller towns, without jeopardizing their safety and confidentiality. Our study is also meant to be explorative and not intended to have broad policy implications. Now that we have reported a hidden problem, we hope that further large scale nationwide studies will be conducted to guide future policies.
Reader’s comment nr 8.2. The study states: “Moreover, the cross-sectional design precludes us to determine a temporal relationship, however, the risk factors identified in our study appear robust and have also been found in studies from other countries.”
The authors again provide a generic critique of cross-sectional study designs. A well-executed cross-sectional study, with clear objectives, can generate valuable, meaningful insights comparable to longitudinal studies, particularly in terms of exploratory and prevalence data.
RESPONSE to comment nr 8.2:
The reader has commented on our own comment about a study limitation. We did not intend to undermine cross-sectional study designs, but rather to point out the inability to draw temporal associations from that sort of design. We entirely agree with the reader about the usefulness of cross-sectional studies.
Reader’s comment nr 9. Misreporting findings from published literature, e.g. the study states:
“The reported incidence of anal cancer is 1–2/100 000 in the general population (12), while in MSM it has been reported up to 35 cases per 100 000 (13) and even up to 131/100 000 in MSM living with HIV (14,15) regardless if they are on antiretroviral treatment or not.”
(ref 14: Silverberg MJ, et al. Risk of anal cancer in HIV-infected and HIV-uninfected individuals in North America. Clin Infect Dis. 2012 Apr;54(7):1026-34. doi: 10.1093/cid/cir1012. Epub 2012 Jan 30. PMID: 22291097; PMCID: PMC3297645.)
• The statement that the incidence rate of 131 per 100,000 among MSM with HIV is independent of ART use is not factual and reflects a lack of understanding of the data referenced. This figure is clearly the crude incidence rate reported in the reference 14 for the years 2004–2007, not an ART-independent value.
• In contrast, the adjusted relative risk (RR) for MSM with HIV during the same period was 78.8 (40.8–152.1), accounting for confounding factors such as ART and immune status.
• It underscores the problematic practice of cherry-picking and misinterpreting numbers from other studies to fit preconceived narratives, which undermines the credibility and validity of scientific research.
RESPONSE to comment nr 9:
The incidence of anal cancer in MSM living with HIV is taken from a clearly referenced article. It was not our intention to be selective, nor did we have any preconceived narratives. The reader is free to interpret a reference cited in our article differently, but this does not undermine the credibility of our scientific research.
Reader’s comment nr 10. Study Replication
Below are 4 high quality prospective cohort studies for the purpose of highlighting the points discussed in this section.
1. Hernandez, A. L., Efird, J. T., Holly, E. A., Berry, J. M., Jay, N., & Palefsky, J. M. (2014). Incidence of and risk factors for type-specific anal human papillomavirus infection among HIV-positive MSM. AIDS (London, England), 28(9), 1341–1349. https://doi-org.ezproxy.u-pec.fr/10.1097/QAD.0000000000000254 Similar, with correct methodology:
2. Wiley DJ, Li X, Hsu H, Seaberg EC, Cranston RD, et al. (2013) Factors Affecting the Prevalence of Strongly and Weakly Carcinogenic and Lower-Risk Human Papillomaviruses in Anal Specimens in a Cohort of Men Who Have Sex with Men (MSM). PLOS ONE 8(11): e79492. https://doi-org.ezproxy.u-pec.fr/10.1371/journal.pone.0079492
3. Anal human papillomavirus infection and associated neoplastic lesions in men who have sex with men: a systematic review and meta-analysis Machalek, Dorothy A et al. The Lancet Oncology, Volume 13, Issue 5, 487 – 500 2012
4. Silverberg MJ, et al. Risk of anal cancer in HIV-infected and HIV-uninfected individuals in North America. Clin Infect Dis. 2012 Apr;54(7):1026-34. doi: 10.1093/cid/cir1012. Epub 2012 Jan 30. PMID: 22291097; PMCID: PMC3297645.
10.1. Significant differences exist in the methodological rigor between the current study and Studies 1-4 above. The current study is essentially a replica of the well-established studies but differs in its flawed methodology, smaller sample size, restrictive and non-representative single-center sampling approach, and cross-sectional design. For instance, Study 4 involved 13 cohorts with a total of 34,189 HIV-infected and 114,260 HIV-uninfected individuals, using robust longitudinal methods and Poisson regression, yielding valid and generalizable results. Similarly, Study 3, which pooled data from 53 studies, found that while anal HPV and anal cancer precursors are common in MSM, progression to cancer appears significantly lower than for cervical pre-cancerous lesions.
As a result, the current study fails to critically appraise high-quality literature, missing the opportunity to address existing knowledge and methodological gaps that it could potentially fill.
Could the authors provide adequate and concrete evidence of any unique contribution their study brings to the existing literature, particularly regarding unique study variables or covariates that were not previously assessed?
RESPONSE to comment nr 10:
Again, our study was the first of its kind to be conducted in an extremely conservative society where the medical problems faced by this marginalized population are rarely considered or mentioned, let alone researched. We thought that, with limited resources and uncertainty about follow-up visits, a small-scale cross-sectional study could provide valuable initial information and be of interest to readers (which it obviously is!).
We never, however, intended to imitate or surpass the above-mentioned cohort studies. The assertion by the reader that we should have filled the knowledge gap on a larger scale and explored new variables is unjustified, considering the scope of our study.
Response to: Hand acceleration time (HAT) as a diagnostic tool in the assessment of haemodialysis access-induced distal ischaemia (HAIDI): study protocol for a prospective cohort study in the Barcelona south metropolitan area, by Gonzalez et al.
Reshabh Yadav MD PhD, Marc R.M. Scheltinga MD PhD
Department of Surgery, Máxima Medical Center, Veldhoven, The Netherlands
To the Editor,
We congratulate Gonzalez et al. on their research protocol on HAT (hand acceleration time) in end stage renal disease (ESRD) patients requiring a haemodialysis access (1). They propose to conduct a study based on the assumption that HAT assessed by duplex ultrasound (DUS) reflects the vascular status of an arm. The aim is to quantify HAT before and after haemodialysis access construction and to determine whether pre- and postoperative HAT values can predict haemodialysis access-induced distal ischemia (HAIDI).
Based on our experience with HAIDI, some aspects of the protocol are worth commenting on:
HAIDI in relation to ‘steal’.
Earlier studies suggested that HAIDI is caused by reversal of blood flow (‘steal’) that is shunted away from the hand (‘stolen from the hand’) due to the presence of an arteriovenous connection as in a vascular access for haemodialysis. On the contrary, steal is a phenomenon that has no pathophysiological significance related to HAIDI (2). The authors justifiably conclude that ESRD patients who often have diabetes mellitus and/or severe atherosclerosis may develop HAIDI due to an impaired arterial remodeling capacity leading to a gradual loss of perfusion pressures down the arm towards the hand (2). Interestingly, one study found that all patients who developed HAIDI following vascular access construction were preoperatively found to have a single-artery dominant hand perfusion pattern indicating a compromised collateralization (3).
The role of finger plethysmography.
It has been demonstrated that abnormal digital brachial indices (<0.8 or >1.0) in ESRD patients prior to haemodialysis access construction are associated with lower 2-year access patency rates and increased cardiovascular related mortality (4). A drop of at least 40 mm Hg in finger pressure during a preoperative modified Allen test was associated with a 10 times greater risk of developing HAIDI later on (3). It may be worthwhile for the authors to include finger plethysmography in their research protocol.
A 6-month time frame for diagnosing chronic HAIDI is (too) short.
An earlier study found that time of onset in HAIDI is related to the access type. Acute HAIDI (within 1 day after access construction) is most often due to upper arm graft insertion. Subacute HAIDI (between 1 and 30 days after surgery) is four times more frequently observed following native HD accesses compared to non-autogenous HD accesses. However, (sub)acute types of HAIDI are rare. In contrast, the majority of HAIDI patients develop hand ischemia much later (5). One study reported severe HAIDI requiring invasive treatment 16 ± 3 months after access construction (5). It is mandatory to observe patients at risk for HAIDI for a minimum of 2 years.
Hand ischemic questionnaire as an objective diagnostic tool.
Symptoms of HAIDI are coldness, pain, and cramps. Severity and frequency of these ischemic symptoms can be assessed by a validated questionnaire (6). Ischemic scores reflected severity of ischemia and may be used to monitor efficacy of surgery aimed at reversing hand hypoperfusion (7). The authors may want to consider the use of this questionnaire as a monitoring tool.
We look forward to the results of this interesting study.
Sincerely,
Reshabh Yadav, MD PhD
Marc R. M. Scheltinga MD PhD
References
1. Gonzalo B, Videla S, Espinar E, Palacios S, Herranz C, Iborra Ortega E. Hand acceleration time (HAT) as a diagnostic tool in the assessment of haemodialysis access-induced distal ischaemia (HAIDI): study protocol for a prospective cohort study in the Barcelona south metropolitan area. BMJ Open. 2025 Jan 2;15(1).
2. Scheltinga MRM, Bruijninckx CMA. Haemodialysis access-induced distal ischaemia (HAIDI) is caused by loco-regional hypotension but not by steal. Eur J Vasc Endovasc Surg. 2012 Feb;43(2):218–23.
3. Yadav R, Gerrickens MWM, Teijink JAW, Scheltinga MRM. Systolic finger pressures during an Allen test before hemodialysis access construction predict severe postoperative hand ischemia. J Vasc Surg. 2021 Dec;74(6):2040–6.
4. Yadav R, Gerrickens MWM, Teijink JAW, Scheltinga MRM. Abnormal preoperative digital brachial index is associated with lower 2-year arteriovenous fistula access patency. J Vasc Surg. 2021 Jul;74(1):237–45.
5. Scheltinga MRM, van Hoek F, Bruijninckx CMA. Time of onset in haemodialysis access-induced distal ischaemia (HAIDI) is related to the access type. Nephrol Dial Transplant. 2009 Oct;24(10):3198–204.
6. van Hoek F, Scheltinga MRM, Kouwenberg I, Moret KEM, Beerenhout CH, Tordoir JHM. Steal in hemodialysis patients depends on type of vascular access. Eur J Vasc Endovasc Surg. 2006 Dec;32(6):710–7.
7. van Hoek F, Scheltinga MRM, Luirink M, Pasmans H, Beerenhout C. Banding of hemodialysis access to treat hand ischemia or cardiac overload. Semin Dial. 2009;22(2):204–8.
Following the publication of the original article, it has come to the authors’ attention that the timing of the analysis was still based on the wording of the original funding application, and had not been updated prior to publication of the trial protocol paper.
In the original funding application, both the short- and long-term outcomes were deemed co-primary outcomes and as such would have been analysed at study end. At the point of funding by the NIHR, we were requested to consider only the short-term outcome to be the primary outcome, and as such the timing of the analysis should have been changed so that short-term outcomes were analysed first and longer-term outcomes after 2 years post-partum. The analysis plan was adjusted at that time, according to the NIHR's request.
The article currently states (under the Main analysis section) that: “All analyses will be undertaken after database lock following data collection at 2 years.”
However, the wording in this section should read: “Analysis of the short-term outcomes will be carried out after database lock following data collection at birth. The longer-term outcomes will be analysed after database lock following data collection at 2 years post-partum.”
To ensure that knowledge of the short-term outcomes will not impact the scientific integrity of the longer-term outcomes, we will continue to adhere to strict retention protocols to follow up the mothers at 2 years for the longer-term health and development outcomes of the child. In addition, we believe that there is no risk of the long-term outcomes being affected by the knowledge of the short-term outcomes as data collection is via a survey completed by the mothers on the health and development of the child.
This clarification has been approved by the Trial Steering Committee.
This rapid response correction is submitted whilst recruitment to the trial is still ongoing.
Yours sincerely,
Dr Rebecca Cannings-John, on behalf of the TRUFFLE 2 Trial Management Group
The study protocol by Zaman et al describes PAK-SEHAT as the research initiative for investigating premature atherosclerotic cardiovascular disease (ASCVD) in Pakistan. The research targets an important knowledge gap in cardiovascular healthcare research for South Asia because its CVD prevalence continues to increase in populations with low-to-middle income status.
Several comments arise from our reading of the protocol. The protocol defines “young adults” as men younger than 60 years and women younger than 65 years, which extends well beyond the typical age range of 18–44 years [1]. This broad inclusion may blur the distinction between premature and conventional ASCVD.
CCTA along with CIMT serves as an effective method to detect subclinical plaques in patients [2]. These expensive diagnostic tests represent a major barrier that affects the system-wide implementation of public health screening and intervention programs in Pakistan.
Excluding participants with BMI higher than 40 kg/m² or eGFR lower than 60 ml/min/1.73m² may unintentionally exclude persons at high risk from the study. People with South Asian origins who have metabolic syndrome or renal impairment tend to develop ASCVD at an earlier stage according to research [3] and their removal from the study might diminish the application of study findings to wider populations.
The protocol states that it will recruit nationally in Pakistan but does not explain how representation of every province, ethnic group and socioeconomic background will be ensured. The sampling strategy across regions should be clarified, since cardiovascular health risks differ between areas [4]. This information would help prevent important subgroup differences from being overlooked.
The research depends on single-site data collection and industry sponsorship, which raises concerns about the generalisability of the findings across diverse settings and about possible funding-related bias. Public health research is best served by independent assessment, strong transparency standards and the preservation of public trust [5].
The PAK-SEHAT initiative is of great importance to us, and we look forward to seeing how our suggestions can improve both its application and its impact.
Sincerely,
Zunaira Kiran, Umaimah Mirza, Imteshal Sarfaraz
Comments related to the findings reported by Warrington and Holm regarding the use of Artificial Intelligence (AI) by UK General Medical Council (GMC) registered doctors (Warrington DJ, Holm S. BMJ Open 2024;14:e089090. doi:10.1136/bmjopen-2024-089090).
A key observation regarding the study is its apparent lack of distinction between participants' use of regulated AI products (classified as medical devices) and non-regulated AI tools (such as general-purpose LLMs). The wide range of respondent specialties reported further highlights this potential issue; for instance, clinicians in radiology or pathology are more likely to encounter regulated, task-specific AI, whereas those in public health or psychiatry might be more likely to experiment with non-regulated, general-purpose models.
During the review process, I used Gemini Advanced (specifically, the model designated by the user as 2.5 Pro Experimental) to assist with converting screenshots of Table 1, Table 2 and Figure 1 into spreadsheet-processable data. The same Large Language Model (LLM) was also prompted to categorize the clinical risks associated with the AI uses listed in Figure 1 of the original paper. The author is of the opinion that the LLM’s risk categorisation (column 2 in Table A below) adopted a patient-centric perspective.
However, the author is of the opinion that a "composite" clinical risk assessment, which considers both the nature of the specific usage instance and the potential scale of patient impact arising from a flawed AI output (e.g., contrasting the impact of an error in analyzing an individual patient's test results with that of disseminating flawed guidance derived from an AI-assisted literature review), is more appropriate. The author's assumed most probable categorisation of software and the author's composite risk assessments (low, medium or high) are recorded in columns 3 and 4 of Table A.
Table A
Use case (ref. Figure 1, Warrington DJ, Holm S. BMJ Open 2024;14:e089090. doi:10.1136/bmjopen-2024-089090) | Rank (Gemini clinical risk, prompt output) | Author-assumed most probable categorisation of software | Author's composite risk assessment (low, medium or high)
search the scientific literature | 1 | Non-regulated | Low
stay up to date with my medical knowledge | 2 | Non-regulated | Low
write educational material | 3 | Non-regulated | High
write research papers or essays | 4 | Non-regulated | High
write reflective pieces for my portfolio | 5 | Non-regulated | Low
automate administerial tasks | 6 | Non-regulated | Medium
perform data analytics | 7 | Non-regulated | Medium
write patient letters or notes | 8 | Non-regulated | Medium
interpret pathology slides | 9 | Regulated | Low
interpret scans or results | 10 | Regulated | Low
make diagnoses | 11 | Non-regulated | Medium
make treatment recommendations | 12 | Non-regulated | Medium
plan or perform radiotherapy | 13 | Regulated | Low
plan or perform surgical operations | 14 | Regulated | Low
Other | 15 | Non-regulated | Medium
Furthermore, regarding the survey questions exploring AI's role in decision-making (specifically referencing the implications of question 5d in Table 2, concerning autonomous decisions), this author is of the view that such questions may implicitly understate the clinician's ultimate responsibility. Under the current UK regulatory system, it is my understanding that the clinician, not the AI device, remains accountable for the clinical decision. Consequently, the onus is on the clinician to critically evaluate and, if necessary, seek "a second opinion" on the AI's output before integrating it into their decision-making. The patient retains the right to request a second opinion on the clinician's final clinical judgment.
Finally, as a point of good practice for maintaining professional accountability and audit trails, healthcare registrants using LLMs for clinical decision support might consider exporting AI outputs to secure documents (e.g., Google Docs or similar platforms), particularly when using tools lacking integrated tracking features.
In breast cancer screening, the term "overdiagnosis" is a misnomer. It would be more accurate to state that the natural history of screen-detected cancer has not been adequately verified.
Overdiagnosis is typically defined as the diagnosis of a lesion as cancer that will not cause symptoms or result in death. This definition assumes that cancer detection is appropriate and occurs in individuals for whom the diagnosis would be clinically relevant. For instance, in elderly patients, cancer may not lead to symptoms or death within their life expectancy. However, the issue in breast cancer screening is not related to the duration of observation but rather to the diagnosis itself.
Cancer is generally diagnosed when a mass is detected through imaging or endoscopy and its malignant nature is confirmed histopathologically. However, early-stage breast cancer is an unusual case. These lesions may not form a detectable mass and are diagnosed as cancer based solely on histopathological findings. There is no scientific or clinical verification that cancers identified in this manner are biologically cancerous. Consequently, most clinical studies on breast cancer screening are essentially uncontrolled case series, lacking rigorous controls and, as such, are not scientifically interpretable or statistically reliable.
A systematic review by the U.S. Preventive Services Task Force (USPSTF) did not provide clear evidence that breast cancer screening reduces cancer mortality. The randomized controlled trials (RCTs) it draws on involve three phases: mammography as a screening tool, histopathology as a diagnostic test, and treatment. None of these phases has a solid scientific foundation. Even if the results of these trials appear positive, we cannot confidently attribute the outcomes to early detection and treatment. Before designing an RCT, it must first be verified that people with screen-detected cancers have a higher mortality rate than the general healthy population and that treatment can reduce this rate. If such verification is obtained, retrospective studies comparing imaging and pathology findings could suffice for assessing the efficacy of screening mammography, negating the need for large-scale RCTs.
The USPSTF has assigned a Grade B recommendation to breast cancer screening. This decision may be based on the fact that the complications from breast cancer treatment are mostly psychological, with rare physical consequences. However, without clear evidence of benefit, the potential harm cannot be justified, no matter how small. Given the lack of definitive evidence, a Grade I (Insufficient evidence) recommendation would be more appropriate. It seems to me that the USPSTF's Grade B recommendation has contributed to a misconception within the medical community that breast cancer screening is proven to reduce mortality. In reality, there are no clinical trials that provide the necessary evidence to support this claim.
REFERENCE
US Preventive Services Task Force; Recommendation: Breast Cancer:Screening. https://www.uspreventiveservicestaskforce.org/uspstf/recommendation/brea... (accessed Mar 25th 2025)
Zahl PH, Gotzsche PC, Mahlen J. Natural history of breast cancers detected in the Swedish mammography screening programme: a cohort study. Lancet Oncol. 2011 Nov;12(12):1118-24. doi: 10.1016/S1470-2045(11)70250-9.
RJ Forestier, FB Erol Forestier, I Santos, A Muela Garcia, A Françon.
Centre de Recherches Rhumatologiques et Thermales d’Aix-les-Bains, Aix Les Bains, France.
This meta-analysis approaches spa therapy as if it were a pharmaceutical intervention, which we believe does not fully reflect the complex and multifaceted nature of such treatments.
We have been conducting clinical trials and systematic reviews in this field for over 30 years. In our experience, spa therapy is a complex intervention traditionally based on the use of thermal mineral water, often combined with massages, baths, showers, mud applications, and supervised pool-based exercises, each of which may have therapeutic effects of its own.
We were surprised by the conclusions of this meta-analysis regarding both the therapeutic effect and the risk of bias, as they differ markedly from our own findings and appear to stem from several questionable methodological choices.
Bibliographic Incompleteness
The limited scope of the literature search is particularly problematic. In 2020, we identified 122 comparative trials on balneotherapy, whereas this meta-analysis included only 42 randomized controlled trials. Our complementary search updated to 2025 identified 42 trials focused solely on knee osteoarthritis, and a total of 141 trials after removing duplicates related to multiple conditions. The highly selective inclusion criteria adopted in this meta-analysis substantially reduced the number of eligible studies, which is especially concerning given the complexity and diversity of spa therapy interventions. This restrictive approach may have contributed to several other questionable methodological decisions, such as:
• Excluding studies from Türkiye and Israel for the sake of homogeneity, although practices in these countries are very similar to those in Europe;
• Favoring 3-month outcomes as the primary endpoint, even when 6-month data were available - a questionable approach in the context of chronic diseases.
Heterogeneity Amplified by Methodological Choices
We acknowledge that heterogeneity is inherently high in this field, due to the complex and multifaceted nature of spa therapy. However, we believe that the methodological choices made in this meta-analysis have significantly exacerbated the problem. As a result, the primary issue became the amplified heterogeneity, which compromises both the feasibility and the validity of the analysis, and makes it difficult to draw reliable conclusions about the presence or absence of a therapeutic effect. These problematic methodological choices include the following:
• An overall pooled analysis was performed across very different conditions, yet this result is presented as the main finding in both the abstract and the main text, despite the limited clinical relevance of such aggregation.
• The authors attempted to group conditions into mechanical disorders, inflammatory disorders, and fibromyalgia, but from a musculoskeletal medicine perspective, grouping clinically distinct conditions (such as rheumatoid arthritis and spondyloarthritis, or low back pain, knee osteoarthritis, hand osteoarthritis, and tendinitis) is both clinically and methodologically inappropriate.
• As previously noted, there is also substantial heterogeneity in the spa interventions themselves—in terms of duration, intensity, and content. While some studies assessed a single intervention, most involved multiple and diverse components. Control interventions also varied considerably, including active comparators, partial controls, waiting lists, usual care, or no treatment. Therefore, pooling these studies without distinction is not methodologically appropriate. If an overall analysis is conducted, subgroup analyses should at minimum be performed to account for these differences.
• Some studies, such as that by Franke et al. [1], were classified as placebo-controlled, but in reality, they compared two complex interventions: a combination of spa therapy, educational programs, exercise, and physiotherapy—with and without radon. The only difference between groups was the type of water used (radon-rich vs. radon-poor), while the other components of treatment were identical. In such cases, the specific effect of spa therapy—particularly that of radon—is diluted by the presence of multiple active co-interventions. It is therefore questionable whether any conclusions can be drawn about the isolated effect of spa therapy or the water type from such designs, as the associated treatments likely had a substantial impact on outcomes.
• The method of calculating effect sizes is also problematic, as the authors combined “mean change-from-baseline scores” with “end value” results across studies. End values are sensitive to baseline imbalance: if the baseline scores differ between groups, post-intervention scores may be misleading and may increase heterogeneity.
• The quality-of-life (QoL) measures used across studies varied, with some relying on generic instruments and others on condition-specific tools. The authors noted that they selected the least disease-specific measure when multiple options were available. However, pooling outcomes derived from different instruments—each with distinct ranges, scaling characteristics, and sensitivity to change—complicates the interpretation of effect sizes, even when standardized mean differences (SMDs) are applied.
• Moreover, generic QoL measures are widely recognized as being less responsive to clinical change and are not ideally suited for evaluating the therapeutic effects of interventions in rheumatic conditions. In addition to pain outcomes, the inclusion of function-related, condition-specific measures would have been more appropriate and clinically meaningful.
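The point above about end values and baseline imbalance can be illustrated with a small numerical sketch. The scores below are invented for illustration only (a hypothetical 0–100 pain scale) and do not come from any trial in the meta-analysis; the function names are likewise assumptions.

```python
from statistics import mean

def end_value_difference(treat_post, ctrl_post):
    """Difference in mean post-intervention scores (treatment minus control)."""
    return mean(treat_post) - mean(ctrl_post)

def change_score_difference(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference in mean change from baseline (treatment minus control)."""
    treat_change = mean([post - pre for pre, post in zip(treat_pre, treat_post)])
    ctrl_change = mean([post - pre for pre, post in zip(ctrl_pre, ctrl_post)])
    return treat_change - ctrl_change

# Hypothetical pain scores: the treatment group starts 10 points worse.
treat_pre, treat_post = [60, 62, 58, 60], [40, 42, 38, 40]   # mean change: -20
ctrl_pre, ctrl_post = [50, 52, 48, 50], [40, 42, 38, 40]     # mean change: -10

print(end_value_difference(treat_post, ctrl_post))
print(change_score_difference(treat_pre, treat_post, ctrl_pre, ctrl_post))
```

In this invented example the two groups end at the same mean score, so the end-value contrast is zero, while the change-from-baseline contrast shows a 10-point larger improvement in the treatment group. Pooling one type of estimate from some studies and the other type from others would therefore mix quantities that can disagree whenever baselines are imbalanced.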
Problems linked to the risk of bias assessment
We fully agree that some spa therapy trials are of low methodological quality. However, it would be incorrect to assume that this applies to all studies in the field.
The Cochrane RoB 2 tool is well-suited to evaluating explanatory drug trials with double-blind, placebo-controlled designs. However, it fails to adequately distinguish between more and less rigorous non-pharmacological studies, especially those with a pragmatic design, due to its limited capacity to address the absence of blinding and the variability in the implementation of complex interventions. These limitations include the following concerns:
• RoB 2 particularly assumes that blinding of patients and outcome assessment is feasible, whereas in practice, this is very difficult to achieve [2], except in rare cases such as radon studies. For example, sulphurous thermal waters have a strong odor, and bicarbonate-rich waters are often fizzy and salty, making them easily distinguishable from tap water. Moreover, patients are frequently the assessors of primary outcomes, particularly in rheumatology, where self-reported questionnaires are commonly used.
• Important contextual factors, such as the therapists’ level of experience and the presence of co-interventions, are not considered in the RoB 2 framework, despite their potential influence on outcomes.
• Alternative strategies for minimizing bias, such as Zelen randomization [3,4] or the use of qualitative outcomes [5,6], are not accounted for by RoB 2, despite their relevance in non-pharmacological trials.
Given the complexity of non-pharmacological interventions, tools such as the CLEAR scale [7] or the PEDro scale [8] would have been better suited for assessing internal validity in this context.
Conclusion
In conclusion, this meta-analysis raises important questions, not only about the evidence surrounding spa therapy, but also about how we choose to evaluate complex, non-pharmacological interventions. While efforts to synthesize the literature are commendable, the methodological decisions made here significantly limit the reliability and applicability of the findings. Spa therapy is a multifaceted intervention, deeply embedded in real-world clinical contexts. Evaluating it with the same methodological lens as a pharmaceutical product not only oversimplifies its nature but risks drawing misleading conclusions. Future reviews must better reflect the diversity and complexity of spa interventions, and adopt evaluation tools that are fit for purpose. Only then can we reach valid and clinically meaningful conclusions that inform both practice and policy.
References
[1] Franke A, Franke T. Long-term benefits of radon spa therapy in rheumatic diseases: results of the randomised, multi-centre IMuRa trial. Rheumatol Int. 2013 Nov;33(11):2839-50. doi: 10.1007/s00296-013-2819-8. Epub 2013 Jul 18.
[2] Boutron I, Tubach F, Giraudeau B, Ravaud P. Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials. J Clin Epidemiol. 2004 Jun;57(6):543-50. doi: 10.1016/j.jclinepi.2003.12.010.
[3] Zelen M. A new design for randomized clinical trials. N Engl J Med. 1979 May 31;300(22):1242-5. doi: 10.1056/NEJM197905313002203.
[4] Simon GE, Shortreed SM, DeBar LL. Zelen design clinical trials: why, when, and how. Trials. 2021 Aug 17;22(1):541. doi: 10.1186/s13063-021-05517-w. PMID: 34404466; PMCID: PMC8371763.
[5] Hróbjartsson A, Gøtzsche PC. Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med. 2001 May 24;344(21):1594-602. doi: 10.1056/NEJM200105243442106. Erratum in: N Engl J Med 2001 Jul 26;345(4):304. PMID: 11372012.
[6] Hróbjartsson A, Thomsen AS, Emanuelsson F, Tendal B, Hilden J, Boutron I, Ravaud P, Brorson S. Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ. 2012 Feb 27;344:e1119. doi: 10.1136/bmj.e1119. PMID: 22371859.
[7] Boutron I, Moher D, Tugwell P, Giraudeau B, Poiraudeau S, Nizard R, Ravaud P. A checklist to evaluate a report of a nonpharmacological trial (CLEAR NPT) was developed using consensus. J Clin Epidemiol. 2005 Dec;58(12):1233-40. doi: 10.1016/j.jclinepi.2005.05.004. Epub 2005 Oct 13. PMID: 16291467.
[8] Moseley AM, Rahman P, Wells GA, Zadro JR, Sherrington C, Toupin-April K, Brosseau L. Agreement between the Cochrane risk of bias tool and Physiotherapy Evidence Database (PEDro) scale: A meta-epidemiological study of randomized controlled trials of physical therapy interventions. PLoS One. 2019 Sep 19;14(9):e0222770. doi: 10.1371/journal.pone.0222770. PMID: 31536575; PMCID: PMC6752782.
Dear Editors and Authors,
I am not convinced by the use of the so-called Bayesian Confidence Propagation Neural Network (BCPNN) in this context.
In pharmacovigilance—particularly when evaluating safety signals—the use of a prior hypothesis regarding the safety of a vaccine or drug should be approached with caution. In this case, we lack reliable prior knowledge of the product’s safety profile. Assuming otherwise may be misleading and, potentially, dangerous.
Established disproportionality measures such as the Proportional Reporting Ratio (PRR) or Reporting Odds Ratio (ROR), when accompanied by confidence intervals (CIs), already provide valuable insight. If the CIs are wide, this simply reflects uncertainty—and that, in itself, is informative enough.
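As a minimal sketch of the kind of calculation referred to here, the PRR and an approximate 95% CI can be computed from a 2×2 table of spontaneous reports using the usual log-scale standard error. The counts in the usage example are hypothetical and are not taken from the paper under discussion; the function name is an assumption.

```python
import math

def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional Reporting Ratio with an approximate 95% CI.

    a: reports for the product of interest with the event of interest
    b: reports for the product of interest with any other event
    c: reports for all other products with the event of interest
    d: reports for all other products with any other event
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR), as for a ratio of two proportions
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper

# Hypothetical counts, for illustration only
prr, lower, upper = prr_with_ci(a=10, b=90, c=5, d=995)
print(f"PRR = {prr:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

With small cell counts the interval is wide, which is exactly the uncertainty the interval is meant to convey.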
The main result of the paper appears to be the PRR of approximately 23 (with a lower bound exceeding 9) for preterm birth following the RSV vaccine. This is striking, yet it is not highlighted in the conclusions; one has to look in the appendix (link "supplemental material"), specifically Table S6 p.10, to find it.
Why is there such a significant discrepancy between the PRR (~23) and the Information Component (IC, ~2+)? Even at the lower bound, the PRR remains notably elevated. This likely stems from an inappropriate prior used in the Bayesian model. In fact, the paper serves as a good illustration of why Bayesian methods, particularly in the form of BCPNN, may not be suitable for p...
To the Editor:
I read with great interest the recent article by Rous et al. (BMJ Open 2025;15:e086648), which presents an important modeling analysis of screening intervals for multi-cancer early detection (MCED) tests based on cell-free DNA (cfDNA) methylation profiling. The work underscores the growing utility of cfDNA-based diagnostics in detecting cancer-specific epigenetic signatures with high specificity.
However, I would like to respectfully offer an additional perspective that may have been overlooked, namely, the active immunologic role of methylated DNA in modulating tumor immunity. Based on our group's published work, we have shown that methylated DNA, particularly methylated CpG motifs, can directly stimulate the differentiation of Foxp3+ regulatory T cells (Tregs) (1-4). This immunologic pathway contributes to immune tolerance and may facilitate tumor immune evasion.
While current MCED models consider methylation purely as a passive biomarker of malignancy, it is important to recognize that the same methylated cfDNA fragments detected in plasma may also exert biologic effects on the host immune system. In particular, their capacity to expand Treg populations could help explain why some tumors remain clinically silent or escape immune surveillance, even when detectable at early stages by cfDNA analysis.
This dual role—diagnostic and immunoregulatory—has implications for both the interpretation of MCED test resul...
Responses to comments from a reviewer on an article published in BMJ Open in 2021
(https://www.pubpeer.com/publications/ABB5D808AAF436219EBFB9896A0D05)
Reader’s comment nr 1: Sample Size Calculation
The study reports: “For the estimation of HPV prevalence, a sample size of 267 participants was needed with an anticipated prevalence of HPV of 50%, 6% precision, and a 5% level of significance.”
• While 6% precision is technically possible, this departs from the standard 5% cutoff.
• Since the study employed snowball sampling —a convenience sampling method from a single center—to estimate nationwide HPV prevalence, it failed to consider the design effect and non-response rates inherent in such non-random sampling methods. After accounting for these factors, the required sample size should be approximately 446 participants for 6% precision and 642 participants for 5% precision. Not 267.
Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling Studies. American Journal of Epidemiology, 2006; 163(5): 471–478.
Snowball Sampling: A Review of Key Issues and Methodological Considerations. International Journal of Social Research Methodology, 2013; 16(4): 351-367. (doi:10.1080/13645579.2013.801561)
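The figures in this comment can be checked against the standard formula for estimating a single proportion, n = z²·p(1−p)/d². The sketch below reproduces the unadjusted sample sizes for 6% and 5% precision. The design effect and non-response rate behind the reader's adjusted figures (446 and 642) are not stated in the comment, so the values passed to the adjustment function here are purely hypothetical, and both function names are assumptions.

```python
import math

def sample_size_for_proportion(p, d, z=1.96):
    """Unadjusted sample size to estimate a proportion p with precision d."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

def adjust_sample_size(n, design_effect=1.0, nonresponse=0.0):
    """Inflate n for a design effect and an anticipated non-response rate."""
    return math.ceil(n * design_effect / (1 - nonresponse))

n_6pct = sample_size_for_proportion(p=0.5, d=0.06)   # 267, as reported in the study
n_5pct = sample_size_for_proportion(p=0.5, d=0.05)   # 385 with the 5% precision
print(n_6pct, n_5pct)

# Hypothetical adjustment values, for illustration only
print(adjust_sample_size(100, design_effect=1.5, nonresponse=0.1))
```

Any design effect above 1 or non-zero non-response rate increases the required sample size in proportion, which is the mechanism behind the reader's larger figures.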
RESPONSE to comment nr 1:
Thank you so much for raising this concern.
Design effect (DE) is used for comple...
Response to: Hand acceleration time (HAT) as a diagnostic tool in the assessment of haemodialysis access-induced distal ischaemia (HAIDI): study protocol for a prospective cohort study in the Barcelona south metropolitan area, by Gonzalez et al.
Reshabh Yadav MD PhD 1, Marc R.M. Scheltinga MD PhD
Department of Surgery, Máxima Medical Center, Veldhoven, The Netherlands
To the Editor,
We congratulate Gonzalez et al. on their research protocol on HAT (hand acceleration time) in end-stage renal disease (ESRD) patients requiring a haemodialysis access (1). They propose to conduct a study based on the assumption that HAT assessed by duplex ultrasound (DUS) reflects the vascular status of an arm. The aim is to quantify HAT before and after haemodialysis access construction and to determine whether pre- and postoperative HAT values can predict haemodialysis access-induced distal ischaemia (HAIDI).
Based on our experience with HAIDI, some aspects of the protocol are worth commenting on:
HAIDI in relation to ‘steal’.
Earlier studies suggested that HAIDI is caused by reversal of blood flow (‘steal’) that is shunted away from the hand (‘stolen from the hand’) due to the presence of an arteriovenous connection as in a vascular access for haemodialysis. On the contrary, steal is a phenomenon that has no pathophysiological significance related to HAIDI (2). The authors justifiably conclude that ESRD patients who often have diabetes melli...
Following the publication of the original article, it has come to the authors’ attention that the timing of the analysis was still based on the wording of the original funding application, and had not been updated prior to publication of the trial protocol paper.
In the original funding application, both the short- and long-term outcomes were deemed co-primary outcomes and as such would have been analysed at study end. At the point of funding by the NIHR, we were requested to consider only the short-term outcome to be the primary outcome, and as such the timing of the analysis should have been changed so that short-term outcomes were analysed first and longer-term outcomes after 2 years post-partum. The analysis plan was adjusted at that time, according to the NIHR's request.
The article currently states (under the Main analysis section) that: “All analyses will be undertaken after database lock following data collection at 2 years.”
However, the wording in this section should read: “Analysis of the short-term outcomes will be carried out after database lock following data collection at birth. The longer-term outcomes will be analysed after database lock following data collection at 2 years post-partum.”
To ensure that knowledge of the short-term outcomes will not impact the scientific integrity of the longer-term outcomes, we will continue to adhere to strict retention protocols to follow up the mothers at 2 years for the longer-term health...
The study protocol by Zaman et al describes PAK-SEHAT as the research initiative for investigating premature atherosclerotic cardiovascular disease (ASCVD) in Pakistan. The research targets an important knowledge gap in cardiovascular healthcare research for South Asia because its CVD prevalence continues to increase in populations with low-to-middle income status.
Several comments arise from our reading of the research. The research identifies “young adults” as very mature males who are younger than 60 and very mature females who are younger than 65 years, but this definition goes beyond the typical age range of 18–44 years [1]. The expanded population inclusion might blur the line separating early and conventional ASCVD manifestation.
CCTA along with CIMT serves as an effective method to detect subclinical plaques in patients [2]. These expensive diagnostic tests represent a major barrier that affects the system-wide implementation of public health screening and intervention programs in Pakistan.
Excluding participants with BMI higher than 40 kg/m² or eGFR lower than 60 ml/min/1.73m² may unintentionally exclude persons at high risk from the study. People with South Asian origins who have metabolic syndrome or renal impairment tend to develop ASCVD at an earlier stage according to research [3] and their removal from the study might diminish the application of study findings to wider populations.
The protocol states it will recruit nationally in P...
Comments related to the findings reported by Warrington and Holm regarding the use of Artificial Intelligence (AI) by UK General Medical Council (GMC) registered doctors (Warrington DJ, Holm S. BMJ Open 2024;14:e089090. doi:10.1136/bmjopen-2024-089090).
A key observation regarding the study is its apparent lack of distinction between participants' use of regulated AI products (classified as medical devices) and non-regulated AI tools (such as general-purpose LLMs). The wide range of respondent specialties reported further highlights this potential issue; for instance, clinicians in radiology or pathology are more likely to encounter regulated, task-specific AI, whereas those in public health or psychiatry might be more likely to experiment with non-regulated, general-purpose models.
During the review process, I used Gemini Advanced (specifically, the model designated by the user as 2.5 Pro Experimental) to assist with processing screenshots of Table 1, Table 2 and Figure 1 into spreadsheet-processable data. The same Large Language Model (LLM) was also prompted to categorize the clinical risks associated with the AI uses listed in Figure 1 of the original paper. The author is of the opinion that the LLM’s risk categorisation (column 2 in table A below) adopted a patient-centric perspective.
However, the author is of the opinion that a "composite" clinical risk assessment, which considers both the nature of the specific usage instance and the pot...
In breast cancer screening, the term "overdiagnosis" is a misnomer. It would be more accurate to state that the natural history of screen-detected cancer has not been adequately verified.
Overdiagnosis is typically defined as the diagnosis of a lesion as cancer that will not cause symptoms or result in death. This definition assumes that cancer detection is appropriate and occurs in individuals for whom the diagnosis would be clinically relevant. For instance, in elderly patients, cancer may not lead to symptoms or death within their life expectancy. However, the issue in breast cancer screening is not related to the duration of observation but rather to the diagnosis itself.
Cancer is generally diagnosed when a mass is detected through imaging or endoscopy and its malignant nature is confirmed histopathologically. However, early-stage breast cancer is an unusual case. These lesions may not form a detectable mass and are diagnosed as cancer based solely on histopathological findings. There is no scientific or clinical verification that cancers identified in this manner are biologically cancerous. Consequently, most clinical studies on breast cancer screening are essentially uncontrolled case series, lacking rigorous controls and, as such, are not scientifically interpretable or statistically reliable.
A systematic review by the U.S. Preventive Services Task Force (USPSTF) did not provide clear evidence that breast cancer screening reduces cancer...
RJ Forestier, FB Erol Forestier, I Santos, A Muela Garcia, A Françon.
Centre de Recherches Rhumatologiques et Thermales d’Aix-les-Bains, Aix Les Bains, France.
This meta-analysis approaches spa therapy as if it were a pharmaceutical intervention, which we believe does not fully reflect the complex and multifaceted nature of such treatments.
We have been conducting clinical trials and systematic reviews in this field for over 30 years. In our experience, spa therapy is a complex intervention traditionally based on the use of thermal mineral water, often combined with massages, baths, showers, mud applications, and supervised pool-based exercises, each of which may have therapeutic effects of its own.
We were surprised by the conclusions of this meta-analysis regarding both the therapeutic effect and the risk of bias, as they differ markedly from our own findings and appear to stem from several questionable methodological choices.
Bibliographic Incompleteness
The limited scope of the literature search is particularly problematic. In 2020, we identified 122 comparative trials on balneotherapy, whereas this meta-analysis included only 42 randomized controlled trials. Our complementary search updated to 2025 identified 42 trials focused solely on knee osteoarthritis, and a total of 141 trials after removing duplicates related to multiple conditions. The highly selective inclusion criteria adopted in this meta-analysis substantially reduced t...