We thank Héctor David Meza-Comparán1 for their interest in our work and their insightful comments on our study published in the Journal of Investigative Medicine.2 Here we address the issues raised in those comments.
As correctly pointed out, ‘clinical gestalt’ refers to ‘a physician’s unstructured estimate’ or an ‘overall clinical impression’. No formal definition was provided because it is a widespread term with a consistent connotation in the literature. On the other hand, as was emphasized in our introduction and in our discussion section, the main objective of our work was to make the point that the current validity of all mortality scores is likely impaired by the change in the pandemic context. Therefore, being thorough and including all Mexican COVID-19 mortality scores was beyond the purpose of our work.
Regarding the differences in years of experience between residents, we respectfully disagree. While it is true that senior residents are more likely to be confident than junior residents, this confidence most likely applies to the late clinical scenarios in which senior residents have more experience (eg, nosocomial pneumonia, acute respiratory distress syndrome or pulmonary embolism) and not to hospital admission (before these complications occur). Moreover, because the admission process to the Internal Medicine residency program in our hospital is the most competitive in the country, each resident cohort is usually made up of students from the top percentiles of their generation, and their graduation years tend to be quite homogeneous.
With regard to the issues raised about our sample size, we would like to point out that, as specified in our Methods section, we did not use the default input parameters of easyROC but those necessary for Obuchowski’s method, which considers the allocation ratio and levels of observer variability.3 Arguably, and without intending to fall into a semantic discussion about what it means to be non-inferior in terms of area under the curve (AUC), for any diagnostic or classification tool it is reasonable to deem relevant any discrepancy beyond the original CI. Since the originally documented AUC of the LOW-HARM score was 0.96 (95% CI 0.94 to 0.98),4 as detailed in our Methods section, detecting a 0.05 AUC difference with a case allocation ratio of 0.7 (because the mortality at our center is ~0.3), a power of 0.8 and a significance cut-off level of 0.05 would require 159 patients. Since we included 166 patients, and since the discrepancies with the original AUC were so large for all scores, we think it is unlikely that our results are due to low statistical power.
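As a rough illustration of this reasoning, the sketch below estimates the standard error of an AUC of 0.96 in a sample of 166 patients. It is not the calculation reported in our Methods section: it swaps in the simpler Hanley and McNeil approximation instead of Obuchowski's method, and it assumes that a mortality of ~0.3 translates into 50 deaths and 116 survivors. The function and variable names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Approximate standard error of an AUC (Hanley and McNeil approximation)."""
    # Q1: probability that two random cases both outrank one random control.
    q1 = auc / (2.0 - auc)
    # Q2: probability that one random case outranks two random controls.
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


# Assumption: 166 patients with ~0.3 mortality, rounded to whole patients.
n_total = 166
n_deaths = round(n_total * 0.3)
n_survivors = n_total - n_deaths

# Originally documented AUC of the LOW-HARM score.
se = hanley_mcneil_se(0.96, n_deaths, n_survivors)

# How many standard errors a 0.05 AUC difference represents.
ratio = 0.05 / se
print(f"deaths={n_deaths}, survivors={n_survivors}, SE={se:.4f}, ratio={ratio:.2f}")
```

Under these assumptions the standard error is roughly 0.02, so a 0.05 difference in AUC spans more than two standard errors, which is consistent with the argument that a sample of this size can detect discrepancies of that magnitude.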
In summary, we think that, despite its inherent limitations, our work strongly suggests that the clinical utility and predictive performance of most COVID-19 mortality scores (and of scores for many other clinical scenarios) demand regular reassessment. In contrast, the inherently Bayesian nature of clinical gestalt makes it sensitive and quick to adapt to highly dynamic contexts (eg, hospitalization strain), and it keeps improving as new information updates clinical practice (eg, novel therapies such as dexamethasone and vaccines).
Patient consent for publication
This study does not involve human participants.
Contributors Both authors contributed to the writing of this text and approve the current version.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.