Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation

September 23, 2020 steven

Summary

Background

Prompt identification of patients suspected to have COVID-19 is crucial for disease control. We aimed to develop a deep learning algorithm on the basis of chest CT for rapid triaging in fever clinics.

Methods

We trained a U-Net-based model on unenhanced chest CT scans obtained from 2447 patients admitted to Tongji Hospital (Wuhan, China) between Feb 1, 2020, and March 3, 2020 (1647 patients with RT-PCR-confirmed COVID-19 and 800 patients without COVID-19) to segment lung opacities and flag cases with COVID-19 imaging manifestations. The ability of artificial intelligence (AI) to triage patients suspected to have COVID-19 was assessed in a large external validation set, which included 2120 retrospectively collected consecutive cases from three fever clinics inside and outside the epidemic centre of Wuhan (Tianyou Hospital [Wuhan, China; area of high COVID-19 prevalence], Xianning Central Hospital [Xianning, China; area of medium COVID-19 prevalence], and The Second Xiangya Hospital [Changsha, China; area of low COVID-19 prevalence]) between Jan 22, 2020, and Feb 14, 2020. To validate the sensitivity of the algorithm in a larger sample of patients with COVID-19, we also included 761 chest CT scans from 722 patients with RT-PCR-confirmed COVID-19 treated in a makeshift hospital (Guanggu Fangcang Hospital, Wuhan, China) between Feb 21, 2020, and March 6, 2020. Additionally, the accuracy of AI was compared with that of a radiologist panel for the identification of lesion burden increase on pairs of CT scans obtained from 100 patients with COVID-19.
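The summary does not define how lesion burden is derived from the segmentation output. As a minimal sketch, assuming burden is the fraction of lung voxels labelled as opacity and that binary opacity and lung masks are already available as NumPy arrays, the comparison of a pair of scans could look like the following. The function names, the mask inputs, and the `min_delta` threshold are hypothetical and are not taken from the study.

```python
import numpy as np


def lesion_burden(opacity_mask: np.ndarray, lung_mask: np.ndarray) -> float:
    """Fraction of lung voxels labelled as opacity (0.0 to 1.0).

    Assumption: both masks are binary arrays of the same shape, e.g. the
    thresholded output of a segmentation model. This definition of burden
    is illustrative, not the one used in the study.
    """
    lung = lung_mask.astype(bool)
    lung_voxels = int(lung.sum())
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    # Count only opacity voxels that fall inside the lung mask.
    opacity_in_lung = np.logical_and(opacity_mask.astype(bool), lung)
    return float(opacity_in_lung.sum()) / lung_voxels


def burden_increased(baseline: float, follow_up: float, min_delta: float = 0.0) -> bool:
    """True if burden rose by more than min_delta between two scans.

    min_delta is a hypothetical tolerance for segmentation noise.
    """
    return (follow_up - baseline) > min_delta


# Toy example with synthetic 4x4x4 volumes (not patient data).
lung = np.ones((4, 4, 4), dtype=np.uint8)
scan_a = np.zeros_like(lung)
scan_a[0, :, :] = 1          # 16 of 64 lung voxels are opacity
scan_b = np.zeros_like(lung)
scan_b[0:2, :, :] = 1        # 32 of 64 lung voxels are opacity

burden_a = lesion_burden(scan_a, lung)
burden_b = lesion_burden(scan_b, lung)
```

In practice the two scans would need to be segmented consistently (same model, comparable reconstruction settings) before a difference in burden is interpreted as progression.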

Findings

In the external validation set, using radiological reports as the reference standard, AI-aided triage achieved an area under the curve of 0·953 (95% CI 0·949–0·959), with a sensitivity of 0·923 (95% CI 0·914–0·932), specificity of 0·851 (0·842–0·860), a positive predictive value of 0·790 (0·777–0·803), and a negative predictive value of 0·948 (0·941–0·954). AI took a median of 0·55 min (IQR 0·43–0·63) to flag a positive case, whereas radiologists took a median of 16·21 min (11·67–25·71) to draft a report and 23·06 min (15·67–39·20) to release a report. With regard to the identification of increases in lesion burden, AI achieved a sensitivity of 0·962 (95% CI 0·947–1·000) and a specificity of 0·875 (95% CI 0·833–0·923). The agreement between AI and the radiologist panel was high (Cohen’s kappa coefficient 0·839, 95% CI 0·718–0·940).
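For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, positive and negative predictive value, and Cohen's kappa follow from a 2×2 table of AI output against a reference standard. The counts in the example are invented for illustration and are not the study's data; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table of AI output versus the reference standard."""
    tp: int  # AI positive, reference positive
    fp: int  # AI positive, reference negative
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative


def triage_metrics(c: Confusion) -> dict:
    """Standard diagnostic accuracy measures.

    Assumes every row and column of the table is non-empty.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def cohens_kappa(c: Confusion) -> float:
    """Agreement between two binary raters, corrected for chance."""
    n = c.tp + c.fp + c.fn + c.tn
    observed = (c.tp + c.tn) / n
    # Chance agreement from the marginal totals of each rater.
    expected = ((c.tp + c.fp) * (c.tp + c.fn)
                + (c.fn + c.tn) * (c.fp + c.tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented counts, for illustration only.
example = Confusion(tp=40, fp=10, fn=5, tn=45)
metrics = triage_metrics(example)
kappa = cohens_kappa(example)
```

The confidence intervals quoted in the summary would require an additional step, such as bootstrapping over cases, which is not shown here.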

Interpretation

A deep learning algorithm for triaging patients with suspected COVID-19 at fever clinics was developed and externally validated. Given its high accuracy across populations with varied COVID-19 prevalence, integration of this system into the standard clinical workflow could expedite identification of chest CT scans with imaging indications of COVID-19.
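Because the validation sites differed in COVID-19 prevalence, it may help to recall that predictive values depend on prevalence even when sensitivity and specificity are fixed. The sketch below applies Bayes' theorem to the pooled sensitivity (0·923) and specificity (0·851) reported above. Treating these two values as constant across sites is an assumption made here for illustration, and the prevalence values in the loop are hypothetical rather than site-specific figures from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Pooled values from the summary; prevalences below are hypothetical.
SENSITIVITY = 0.923
SPECIFICITY = 0.851

for prevalence in (0.05, 0.20, 0.50):
    p = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    n = npv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={p:.3f}  NPV={n:.3f}")
```

With fixed sensitivity and specificity, the positive predictive value rises and the negative predictive value falls as prevalence increases, which is why a triage tool's predictive values should be read alongside the prevalence of the setting in which it is deployed.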