Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation
Summary
Background
Prompt identification of patients suspected to have COVID-19 is crucial for disease control. We aimed to develop a deep learning algorithm on the basis of chest CT for rapid triaging in fever clinics.
Methods
We trained a U-Net-based model on unenhanced chest CT scans obtained from 2447 patients admitted to Tongji Hospital (Wuhan, China) between Feb 1, 2020, and March 3, 2020 (1647 patients with RT-PCR-confirmed COVID-19 and 800 patients without COVID-19) to segment lung opacities and alert cases with COVID-19 imaging manifestations. The ability of artificial intelligence (AI) to triage patients suspected to have COVID-19 was assessed in a large external validation set, which included 2120 retrospectively collected consecutive cases from three fever clinics inside and outside the epidemic centre of Wuhan (Tianyou Hospital [Wuhan, China; area of high COVID-19 prevalence], Xianning Central Hospital [Xianning, China; area of medium COVID-19 prevalence], and The Second Xiangya Hospital [Changsha, China; area of low COVID-19 prevalence]) between Jan 22, 2020, and Feb 14, 2020. To validate the sensitivity of the algorithm in a larger sample of patients with COVID-19, we also included 761 chest CT scans from 722 patients with RT-PCR-confirmed COVID-19 treated in a makeshift hospital (Guanggu Fangcang Hospital, Wuhan, China) between Feb 21, 2020, and March 6, 2020. Additionally, the accuracy of AI was compared with a radiologist panel for the identification of lesion burden increase on pairs of CT scans obtained from 100 patients with COVID-19.
Findings
In the external validation set, using radiological reports as the reference standard, AI-aided triage achieved an area under the curve of 0·953 (95% CI 0·949–0·959), with a sensitivity of 0·923 (95% CI 0·914–0·932), specificity of 0·851 (0·842–0·860), a positive predictive value of 0·790 (0·777–0·803), and a negative predictive value of 0·948 (0·941–0·954). AI took a median of 0·55 min (IQR 0·43–0·63) to flag a positive case, whereas radiologists took a median of 16·21 min (11·67–25·71) to draft a report and 23·06 min (15·67–39·20) to release a report. With regard to the identification of increases in lesion burden, AI achieved a sensitivity of 0·962 (95% CI 0·947–1·000) and a specificity of 0·875 (95% CI 0·833–0·923). The agreement between AI and the radiologist panel was high (Cohen’s kappa coefficient 0·839, 95% CI 0·718–0·940).
Interpretation
A deep learning algorithm for triaging patients with suspected COVID-19 at fever clinics was developed and externally validated. Given its high accuracy across populations with varied COVID-19 prevalence, integration of this system into the standard clinical workflow could expedite identification of chest CT scans with imaging indications of COVID-19.
Funding
Special Project for Emergency of the Science and Technology Department of Hubei Province, China.
Introduction
Early identification of patients with COVID-19 has been recommended by WHO to control transmission and to prevent depletion of hospital resources.
However, not all countries have sufficient RT-PCR testing capacity. Additionally, due to technological constraints, even in developed countries, RT-PCR can take up to 3 days to provide a result.
Furthermore, studies have found that RT-PCR testing can produce false-negative results, which could result in patients with COVID-19 remaining unidentified in the community, enabling the epidemic to continue to spread despite aggressive interventions such as regional lockdowns.
Chest CT has been used to supplement RT-PCR testing of patients with suspected COVID-19.
In the guidance issued by the British Society of Thoracic Imaging, chest CT is used as a radiological decision tool, but is limited to seriously ill patients with suspected COVID-19, for whom chest x-ray results are uncertain or normal.
Evidence before this study
We searched Google Scholar for deep learning studies on the triage of patients with suspected COVID-19 on the basis of chest CT published between Dec 1, 2019, and March 22, 2020, using the search terms “COVID-19” OR “2019-nCoV” OR “Coronavirus Disease 2019” OR “Novel Coronavirus” AND “chest CT” AND “Triage” AND “Deep learning”. Our search yielded no studies that developed and validated deep learning algorithms to triage patients with suspected COVID-19. After removal of the search term “Triage”, we identified seven studies (one peer-reviewed publication and six preprint articles) that developed and validated deep learning algorithms for differential diagnosis associated with COVID-19 on the basis of assembled CT datasets that contained a small number of real-time PCR confirmed COVID-19 cases. Assembled datasets that combined COVID-19 cases with other types of pneumonia might not represent the distribution of COVID-19 in real-world settings. Few studies applied external validation to test the performance of algorithms and therefore could not rule out the possibility of model overfitting (ie, whereby an algorithm can perform well on patients from the same data source used for algorithm training, but poorly on data obtained from different sources).
Added value of this study
We developed a deep learning algorithm for triaging patients with suspected COVID-19 and analysing lesion burden of patients with confirmed COVID-19 on the basis of chest CT. We trained the algorithm on the largest available set of confirmed COVID-19 cases and validated the algorithm on multiple datasets, which indicated the algorithm was robust with high clinical efficacy. We compared two proposed AI triage pathways (scan-to-second-reader triage and scan-to-fever-clinician triage) with standard of care in a fever clinic, and showed that both workflows increased the efficiency of suspected case identification. We also considered the potential value of deep learning in COVID-19 clinical management across different health-care systems, which showed that the developed AI system might also assist radiologists to precisely assess how lesion burden changed over time on CT imaging since AI’s performance was satisfactory with 96·2% sensitivity and 87·5% specificity.
Implications of all the available evidence
The robust and satisfactory performance of our deep learning algorithm indicates its potential clinical use for screening patients with suspected COVID-19 in fever clinics and monitoring disease progression among patients with confirmed COVID-19. Shortening the time to diagnosis would enable earlier isolation and treatment of affected patients, which is crucial to curb the pandemic.
CT imaging is important for monitoring changes in disease burden.
According to the Chinese COVID-19 clinical guidance, patients with lung opacities on CT that increase by 50% within 24–48 h require immediate clinical intervention.
At fever clinics, CT reports are usually required within 1 h of the chest scan. At treatment hospitals, radiologists need to carefully compare opacities on CT scans across time to alert cases suspected of deterioration.
Time pressure, heavy workload, and a shortage of experienced radiologists resulted in challenges for imaging-based management of COVID-19.
[…] breast cancer detection, and cerebral haemorrhage triage.
To expedite chest CT-based triage in fever clinics, we aimed to develop a fully automated deep learning algorithm to flag suspected COVID-19 cases and analyse lesion burdens. By validating the algorithm on fever clinic cases across regions with variable COVID-19 prevalence, we aimed to assess the clinical value of the developed algorithm in real-world scenarios.
Methods
Study design
We did a retrospective diagnostic study, using CT images obtained from Tongji Hospital (Wuhan, China), and CT images and radiological reports obtained from three fever clinics (Tianyou Hospital [Wuhan, China], Xianning Central Hospital [Xianning, China], and The Second Xiangya Hospital [Changsha, China]). We also obtained unenhanced chest CT scans from patients with RT-PCR-confirmed COVID-19 treated in a single makeshift hospital (Guanggu Fangcang Hospital, Wuhan, China) to validate the sensitivity of the algorithm.
We trained a U-Net-based model to segment lung opacities on chest CT. Opacity segmentation could automatically analyse lung lesion volumes and flag positive CT scans to expedite patient triage. Full details of algorithm development are in the appendix (pp 1–3). The triage cutoff threshold was determined on the basis of the receiver operating characteristic curve of an internal validation set. The accuracy and efficiency of AI triage were then assessed on an external validation dataset. Figure 1A shows the research pipeline of the study.
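The text does not state which point on the receiver operating characteristic curve was used as the triage cutoff. As a minimal sketch only, the Python below assumes one common choice, Youden's J statistic (sensitivity + specificity - 1). The function names, scores, and labels are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for every distinct score.

    A case is called positive when its score is >= the cutoff.
    labels: 1 = positive reference standard, 0 = negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the cutoff that maximises sensitivity + specificity - 1 (an assumed criterion)."""
    best = max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
    return best[0]


# Hypothetical internal-validation scores (for example, a per-scan model output).
example_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
example_labels = [0, 0, 1, 0, 1, 1, 1]
chosen_cutoff = youden_cutoff(example_scores, example_labels)
```

A study with explicit performance targets might instead choose the cutoff that first meets a minimum sensitivity; the same `roc_points` output could be filtered for that purpose.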
CT image datasets and algorithm development
Positive cases in the development set were annotated by radiologists (appendix p 4). 105 cases were excluded due to difficulty with annotation. After data annotation, the development set was randomly split into a training set (1318 patients with COVID-19; 640 patients without COVID-19) and a testing set (329 patients with COVID-19; 160 patients without COVID-19) with a ratio of 8:2.
Full details of data inclusion and exclusion criteria and data partition are shown in figure 2A.
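The reported training and testing counts are consistent with an 8:2 random split applied separately to patients with and without COVID-19. The sketch below illustrates such a per-class split; the patient identifiers, the seed, and the assumption that the split was done within each class are illustrative and not confirmed by the text.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(ids: Sequence[str], train_fraction: float = 0.8, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle patient identifiers and cut them into training and testing lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers; only the group sizes come from the article.
covid_ids = [f"covid-{i}" for i in range(1647)]
control_ids = [f"control-{i}" for i in range(800)]

covid_train, covid_test = split_ids(covid_ids)
control_train, control_test = split_ids(control_ids)
```

Splitting at the patient level, rather than the scan level, keeps all scans from one patient on the same side of the partition.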
Data collection for external validation of triage performance
Triage performance was assessed for accuracy and efficiency; the original radiological reports were used as the reference standard. Two radiologists, who were independent of those who wrote or approved the original reports, classified cases into four categories: 1, clear mention of suspected COVID-19 in the radiological impression section; 2, ambiguous radiological impression description, but presence of COVID-19 imaging features in the radiological findings section; 3, ambiguous radiological impression description, but absence of COVID-19 imaging features in the radiological findings section; 4, negative radiological findings. The first radiologist (HQ) rated all reports and marked cases for which categorisation was unclear. The second radiologist (ZD) subsequently reviewed the unclear cases and made final decisions. The two radiologists were masked to the results of AI-aided triage. Cases with scores of 1 or 2 were categorised as COVID-19-positive, and cases with scores of 3 or 4 were defined as COVID-19-negative.
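The labelling rule described above can be written as a small function. This is a sketch of the stated rule only; the function and argument names are hypothetical, and the handling of a missing second rating is an assumption added for safety.

```python
from typing import Optional

VALID_CATEGORIES = {1, 2, 3, 4}
POSITIVE_CATEGORIES = {1, 2}  # scores 1 or 2 count as COVID-19-positive


def reference_label(first_rating: int, unclear: bool = False,
                    second_rating: Optional[int] = None) -> bool:
    """Return True when the report is COVID-19-positive under the reference standard.

    The first rater scores every report; when the case was marked unclear,
    the second rater's score is taken as final.
    """
    if unclear:
        if second_rating is None:
            raise ValueError("an unclear case needs a rating from the second radiologist")
        final = second_rating
    else:
        final = first_rating
    if final not in VALID_CATEGORIES:
        raise ValueError(f"category must be 1-4, got {final}")
    return final in POSITIVE_CATEGORIES
```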
Figure 1B shows a typical workflow in a Chinese fever clinic; scan-to-second-reader triage and scan-to-fever-clinician triage were proposed to expedite this workflow. To measure AI triage time, we recorded the time AI took to flag each true positive case. To measure the triage time of standard of care, we calculated the time intervals between CT exam completion and initial draft report and between CT exam completion and senior radiologists’ report approval based on timestamps recorded by the Radiology Information System of each hospital or fever clinic.
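The interval calculation described above can be sketched as follows. The timestamp format (ISO 8601 strings) and the quartile convention are assumptions; the actual export format of each Radiology Information System is not described in the text.

```python
from datetime import datetime
from statistics import median, quantiles
from typing import Sequence, Tuple


def minutes_between(start_iso: str, end_iso: str) -> float:
    """Elapsed minutes between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 60


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile).

    Uses linear interpolation between data points (the "inclusive" method),
    which is one of several quartile conventions.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical timestamps: CT exam completion and initial draft report.
draft_delay = minutes_between("2020-02-01T10:00:00", "2020-02-01T10:16:30")
```

The same two functions apply to the scan-to-approval interval by substituting the report approval timestamp.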
To test the specificity of the algorithm on non-COVID-19 cases, we included 686 scans from 651 patients who visited Tianyou Hospital or The Third People’s Hospital of Shenzhen for respiratory diseases between Oct 1, 2019, and Oct 31, 2019, before the COVID-19 outbreak (405 scans from 385 patients from Tianyou Hospital; 281 scans from 266 patients from The Third People’s Hospital of Shenzhen). 652 (95%) of 686 chest CT scans had positive findings such as ground glass opacities, pulmonary fibrosis, consolidations, interstitial thickening, pleural effusion, emphysema, and nodules or masses.
Data collection for assessment of change in lesion burden
The developed model could automatically calculate lung lesion burden volumes; thus, we collected pairs of CT scans from the same patients with COVID-19 to assess the accuracy of AI for the identification of lesion burden increase in comparison to radiologists. Since radiologists are at a disadvantage compared with AI when assessing lung lesion burden volumes (in cm³ or percentages) using the naked eye, we developed a qualitative task in which radiologists judged whether a new scan showed an increase in lung lesion burden volume when compared with a previous scan. This task was also more clinically relevant than estimating the lung lesion burden volume in cm³ or percentages because in real-world settings, radiologists are responsible for reporting disease progression, such as increases in lesion volume or size. We consecutively included 100 patients with RT-PCR-confirmed COVID-19 who were admitted to Tongji Hospital between Dec 26, 2019, and Jan 31, 2020 (no cases overlapped with algorithm development or internal validation) and had undergone at least two CT scans. For patients with more than two scans, the first two scans were selected. A panel of three radiologists (CW and others) served as the reference standard. Each radiologist independently classified each second scan as either increase (increase in lesion burden volume) or no increase (no change or decrease in lesion burden volume) and marked cases that were difficult to categorise. Consensus was reached through majority vote among the three radiologists. For the AI algorithm, cases with lesion burden volumes segmented in the second scan that were larger than those in the first scan were categorised as increase, and cases with no changes were categorised as no increase.
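The two decision rules in this paragraph, majority vote for the radiologist panel and a volume comparison for AI, can be sketched as below. Function names and the example volumes are hypothetical. The text specifies the AI category only for larger and unchanged volumes; treating a decrease as no increase is an assumption that mirrors the radiologists' definition.

```python
from typing import Sequence


def panel_consensus(votes: Sequence[bool]) -> bool:
    """Majority vote of three radiologists; True means 'increase'."""
    if len(votes) != 3:
        raise ValueError("the panel is defined for exactly three radiologists")
    return sum(votes) >= 2


def ai_increase(first_volume_cm3: float, second_volume_cm3: float) -> bool:
    """AI rule: 'increase' when the second scan's segmented volume is larger.

    Assumption: a smaller second volume is treated as 'no increase'.
    """
    return second_volume_cm3 > first_volume_cm3


# Hypothetical case: two of three readers call an increase; AI volumes rise.
reference = panel_consensus([True, True, False])
prediction = ai_increase(first_volume_cm3=120.0, second_volume_cm3=185.5)
```

With three voters and two categories a tie cannot occur, so the majority rule always yields a consensus label.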
Statistical analysis
We used Cohen’s kappa coefficient to assess the agreement between AI and the radiologist panel with regard to increase in lesion burden. The interrater agreement of the radiologist panel was calculated using Fleiss’ kappa.
Continuous variables were reported as median and IQR. Categorical variables were reported as frequencies and percentages. A two-sided p value of less than 0·05 was considered statistically significant. All statistical analyses were done using R (version 3.6.2).
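For readers unfamiliar with Fleiss' kappa, the statistic used for interrater agreement within the radiologist panel, the following is a minimal Python sketch of the standard formula. The study itself used R; this code and the example counts are illustrative only and are not the authors' implementation.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")

    # Observed agreement: proportion of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(counts[0])
    shares = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 4 scans, 3 raters, columns are (increase, no increase).
example_kappa = fleiss_kappa([[3, 0], [2, 1], [1, 2], [0, 3]])
```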
Role of the funding source
The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
Table 1: Characteristics of the Tongji dataset used for algorithm development and internal validation
Data are n (%) or median (IQR).
Table 2: Characteristics of the external validation dataset and accuracy of AI-aided triage
Data are n (%), median (IQR), or n/N (%), unless otherwise stated. AUC=area under the receiver operating characteristic curve.
Table 3: Triage efficiency for the external validation set
Table 4: Performance of AI and radiologists for the identification of changes in lesion burden between two CT scans
Data are n, unless stated otherwise. 52 patients had an increase in lesion burden volume and were defined as positive. 48 patients did not have any increase in lesion burden volume and were defined as negative. We presented the complete information to show interrater variability. AI=artificial intelligence.
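As a worked check, the sensitivity (0·962), specificity (0·875), and Cohen's kappa (0·839) reported in the Summary can be reproduced from a 2×2 table. The cell counts used below (50 true positives, 2 false negatives, 6 false positives, 42 true negatives) are inferred from the 52 positive and 48 negative cases and the reported rates; they are an assumption, because the table body itself is not available here.

```python
from typing import Tuple


def two_by_two_summary(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Inferred counts (assumption): AI versus radiologist panel consensus.
sens, spec, kappa = two_by_two_summary(tp=50, fn=2, fp=6, tn=42)
print(round(sens, 3), round(spec, 3), round(kappa, 3))  # prints 0.962 0.875 0.839
```

That a single set of counts matches all three reported figures suggests the inferred table is plausible, although it does not prove that these were the exact counts.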
Discussion
To the best of our knowledge, this study was the first to develop and validate an AI algorithm for triaging suspected COVID-19 cases on the basis of chest CT in fever clinics. A large sample of chest CT scans from RT-PCR-confirmed COVID-19 cases was obtained to develop the deep learning algorithm, and consecutive cases were collected from regions of varying COVID-19 prevalence to assess the accuracy and efficiency of AI triage, using radiological reports as the reference standard.
The 2-week imaging workload was higher in high-prevalence regions (Tianyou Hospital and Xianning Central Hospital) than in low-prevalence regions (The Second Xiangya Hospital). Additionally, the requirement for radiological responsiveness inside the epidemic centre was higher than that outside the epidemic centre. Regarding the accuracy of AI-aided triage, the overall sensitivity and specificity were above the performance targets of 90% and 80%. The general pattern across patient populations showed that COVID-19 prevalence could influence AI’s performance, with highest performance in the Tianyou Hospital dataset (high prevalence) and the lowest performance in The Second Xiangya Hospital dataset (low prevalence). One explanation could be that the algorithm was trained entirely on a dataset collected in Wuhan (a high-prevalence region) and testing the algorithm on an external population with low disease prevalence could decrease its performance. Considering the efficiency of AI-guided triage, the proposed scan-to-second-reader triage and scan-to-fever-clinician triage workflows reduced time to triage compared with standard clinical workflow across different fever clinics. Although the accuracy of AI-aided triage was lower in the Second Xiangya Hospital dataset than the Tianyou Hospital and Xianning Central Hospital datasets, of the ten patients with RT-PCR-confirmed COVID-19 who had positive CT scans, AI successfully flagged all ten cases and shortened the time from scan-to-fever-clinician. Moreover, considering that AI-aided triage will be of greater importance in medical contexts where workload is high and medical resources are scarce, the guarantee of reliable performance in populations with high disease prevalence is important.
Since RT-PCR can produce false negatives, negative results could not rule out virus infection. Therefore, selection bias and potential false negatives among these patients with negative RT-PCR test results could have compromised the specificity of AI-aided triage.
To further validate the performance of AI-aided triage in patients with and without COVID-19, we collected CT scans from patients with RT-PCR-confirmed COVID-19 who were asymptomatic or had mild symptoms and were admitted to a fangcang hospital in Wuhan, and CT scans from patients with various respiratory diseases who were admitted to Tianyou Hospital or The Third People’s Hospital of Shenzhen before the COVID-19 outbreak. AI-aided triage was found to be reliable in these patients, which substantiates its efficacy in assisting COVID-19 identification.
In these scenarios, when chest CT is used as a surrogate tool to identify suspected COVID-19 cases, AI-aided triage could facilitate timely isolation of patients with suspected COVID-19 and alleviate pressure on medical staff, especially in regions with high disease prevalence. Additionally, in countries where RT-PCR testing is available with timely results, AI-aided triage might help to notify incidental findings. According to the report of a US doctor on March 25, 2020, patients who visited the emergency department for reasons other than COVID-19, such as a traffic accident, were found to have SARS-CoV-2 infection.
In this scenario, AI could notify incidental COVID-19 findings on CT and alert medical staff so that timely measures can be taken to prevent nosocomial infection. Lung lesion burden assessment could also potentially be automated by AI to inform therapeutic management.
Previous studies have applied deep learning to differentiate COVID-19 from other chest diseases including influenza A and community-acquired pneumonia, and in one previous study an algorithm was developed to segment and quantify COVID-19 opacities on chest CT. However, some of these studies were not validated externally or were tested on non-consecutively collected clinical cases, whereas others did not specify the clinical context in which their algorithms could be applied, or focused on a narrow set of differential diagnoses, which would not cover the full disease spectrum in real-world clinical contexts.
The current study has several limitations. First, the algorithm was trained on data from Tongji Hospital only, which could compromise the robustness of the algorithm, as indicated by the results for the Second Xiangya Hospital dataset. Future studies could use multiple data sources to train models to improve generalisability. Second, we adopted the U-Net model structure to assess the feasibility of AI-aided triage for COVID-19. More methodologically rigorous algorithms could be developed to improve case classification. Third, the comparison of efficiency between AI-aided triage and standard of care with regard to time taken to triage was estimated in an ideal scenario where clinicians would respond instantly to AI notifications. However, in real-world clinical settings, this might not be realistic. Therefore, a prospective randomised controlled trial is needed to more accurately estimate the reduction in time to triage gained from AI. Fourth, since the main purpose of the study was to develop an AI algorithm for chest CT triage, clinical and laboratory information, with the exception of radiological findings and RT-PCR results, was not collected. Fifth, we did not directly compare the accuracy of AI for lesion burden analysis with individual radiologists. Future studies could systematically compare the accuracy of quantitative and qualitative lesion burden analysis of AI and radiologists.
The current study showed the efficacy of a deep learning algorithm for the triage of patients with suspected COVID-19 in fever clinics. The integration of AI into the standard clinical workflow has the potential to relieve burden on clinicians and expedite the isolation of suspected cases and disease control.
Contributors
WW, SW, ZL, and LX conceived and designed the study. WW, ZL, and LX managed the study. MW, CX, LH, SX, CQ, JW, GD, JL, XC, and CZ collected the data. KC, PY, and RZ developed and validated the algorithm. MW, CQ, SX, LH, HZ, CW, and TZ collected, anonymised, and prepared CT scans and reports from LX, JW, and GD, respectively. LX arranged for radiologists to rate the CT scans and reports. YC and RZ did the statistical analysis. CX, WW, MW, and ZL wrote the initial draft. All authors critically reviewed the report, and all read and approved the final version.
Declaration of interests
SW, RZ, CX, YC, PY, and KC are employees of Beijing Infervision Technology. All other authors declare no competing interests.
Data sharing
Acknowledgments
We thank Xiaoxiang Zhang and Jin Li (Department of Computer Centre, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China) for providing support with the data collection. We thank Qiufeng Huang, Dayong Zhang, Tian Tian, and Wei Cai for their assistance with validation dataset annotation for lesion burden analysis. We thank Ben Yarbrough for proofreading and providing language suggestions for this manuscript.
Supplementary Materials
References
- 1.
Coronavirus disease (COVID-19) situation report–181.
- 2.
WHO Director-General’s opening remarks at the media briefing on COVID-19 - 16 March 2020.
- 3.
Operational considerations for case management of COVID-19 in health facility and community.
- 4.
Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases. Interim guidance.
- 5.
Chinese clinical guidance for COVID-19 pneumonia diagnosis and treatment (7th edition).
- 6.
South Korea pioneers coronavirus drive-through testing station.
- 7.
Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure.
Ann Intern Med. 2020
- 8.
Report of the WHO–China Joint Mission on Coronavirus Disease 2019 (COVID-19).
- 9.
Essentials for radiologists on COVID-19: an update—radiology scientific expert panel.
Radiology. 2020; 296: e113-e114
- 10.
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study.
Lancet Infect Dis. 2020; 20: 425-434
- 11.
Correlation of chest CT and RT-PCR testing in Coronavirus Disease 2019 (COVID-19) in China: a report of 1014 cases.
Radiology. 2020; 296: e32-e40
- 12.
Coronavirus Disease 2019 (COVID-19): a perspective from China.
Radiology. 2020; 296: e15-e25
- 13.
Assessing risk factors for SARS-CoV-2 infection in patients presenting with symptoms in Shanghai, China: a multicentre, observational cohort study.
Lancet Digital Health. 2020; 2: e323-e330
- 14.
Radiology decision tool for suspected COVID-19.
- 15.
The role of CT in case ascertainment and management of COVID-19 pneumonia in the UK: insights from high-incidence regions.
Lancet Respir Med. 2020; 8: 438-440
- 16.
Chinese clinical guidance for COVID-19 pneumonia diagnosis and treatment (5th edition).
- 17.
Temporal changes of CT findings in 90 patients with COVID-19 pneumonia: a longitudinal study.
Radiology. 2020; 296: e55-e64
- 18.
Rapid AI development cycle for the coronavirus (COVID-19) pandemic: initial results for automated detection & patient monitoring using deep learning CT image analysis.
arXiv. 2020
- 19.
Radiologists’ variation of time to read across different procedure types.
J Digit Imaging. 2017; 30: 86-94
- 20.
Dermatologist-level classification of skin cancer with deep neural networks.
Nature. 2017; 542: 115-118
- 21.
International evaluation of an AI system for breast cancer screening.
Nature. 2020; 577: 89-94
- 22.
Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study.
Lancet. 2018; 392: 2388-2396
- 23.
U-Net: convolutional networks for biomedical image segmentation.
MICCAI. 2015; 9351: 234-241
- 24.
Use of chest imaging in COVID-19.
- 25.
13 deaths in a day: an ‘apocalyptic’ coronavirus surge at an NYC hospital. New York Times.
- 26.
Artificial Intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
Radiology. 2020
- 27.
Clinically applicable AI system for accurate diagnosis, quantitative measurements and prognosis of COVID-19 pneumonia using computed tomography.
Cell. 2020; 181: 1423-1433
- 28.
Lung infection quantification of COVID-19 in CT images with deep learning.
arXiv. 2020
Copyright
© 2020 The Author(s). Published by Elsevier Ltd.
User License
Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)