Differences in scores between GPT-4 and the doctor groups
Comparison | Mean difference (95% CI) | P value |
A vs C: random doctor vs GPT-4 | 1.6 (0.9 to 2.2) | <0.001 |
B vs C: top-tier doctor vs GPT-4 | 2.7 (2.2 to 3.3) | <0.001 |
B vs A: top-tier doctor vs random doctor | 1.2 (0.7 to 1.7) | <0.001 |