AI for Breast Cancer Screening and Detection — The Future is Already Here
Recent WHO analyses of global breast cancer incidence estimate 2.26 million new cases and 685,000 deaths from breast cancer in 2020 (WHO, 2020). Clinical breast exams and, where possible, screening by mammography and ultrasound are the first line of defense in catching primary breast cancer early. But screening is an imperfect endeavor. The efficacy of mammographic screening is tempered by reader variability, missed diagnoses in 20% of cases, high false-positive rates, false negatives in women with dense breasts, and lag time for recalls. Furthermore, in digital mammogram screenings, detection of a suspicious lesion has poor positive predictive value, meaning that biopsies of those lesions are often negative (Tagliafico et al, 2020). These factors, combined with the tremendous disease burden of breast cancer, underscore the need to improve screening and diagnosis. Artificial intelligence applications have the potential, and indeed some are already in clinical use, to improve the efficacy of medical imaging in the screening and diagnosis of primary breast cancer.
The application of AI to breast mammography and imaging is not new. The first commercial computer-assisted detection or diagnosis (CAD) solution was approved by the FDA in 1998. These technologies became widespread in the U.S., far more so than in Europe, largely because of insurance reimbursement. Unfortunately, they also had high false positive rates. In 2015, a definitive study concluded that CAD-assisted readings provided no improvement in accuracy compared to readings without CAD. What, then, has changed in the intervening years since 1998, and why does the application of AI to breast cancer detection look more promising now (Bennani-Baiti & Baltzer, 2020)?
In the past decade, the application of deep learning and transfer learning to breast imaging algorithms has improved their quality. Greater computing power has increased the quantity of data that can be analyzed and, crucially, enabled deeper networks with a greater number of hidden layers. Access to breast image databases has improved training data, although larger and more diverse training sets are still desirable (Bennani-Baiti & Baltzer, 2020).
The application of AI to medical imaging relies on what is known as radiomics, defined as the “high throughput extraction and analysis of quantitative features from imaging data. Radiomic features provide information on the gray-scale patterns, inter-pixel relationships, as well as shape and spectral properties of radiological images.” Those features can then be used to develop AI applications. Radiomics, and thus the application of AI to breast cancer screening and diagnosis, is based on the idea that the extracted features from images reflect genetic and molecular mechanisms that, in turn, are linked to tissue phenotypes (Lee et al, 2020).
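The texture features that radiomics extracts can be made concrete with a toy example. The sketch below computes a gray-level co-occurrence matrix (one way to capture "inter-pixel relationships") and a few classic Haralick-style statistics with NumPy; the 4×4 patch and four gray levels are illustrative, not drawn from any cited pipeline.

```python
import numpy as np

def glcm(image, levels=4):
    """Gray-level co-occurrence matrix for horizontally adjacent pixels.

    Counts how often gray level i appears immediately to the left of
    level j, then normalizes the counts to joint probabilities.
    """
    m = np.zeros((levels, levels))
    for row in image:
        for a, b in zip(row[:-1], row[1:]):
            m[a, b] += 1
    return m / m.sum()

def texture_features(p):
    """Classic texture statistics computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)           # local intensity variation
    homogeneity = np.sum(p / (1 + np.abs(i - j)))  # closeness to the diagonal
    energy = np.sum(p ** 2)                        # uniformity of the pattern
    return {"contrast": contrast, "homogeneity": homogeneity, "energy": energy}

# Toy 4-level "image" patch; real pipelines quantize mammogram gray scales similarly
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
features = texture_features(glcm(patch))
```

Feature vectors like this one, computed per region of interest, are what downstream classifiers are trained on.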
Currently, AI research is being applied to several imaging modalities to enhance the characterization of breast tissue (normal vs. abnormal) and the detection of cancer (benign vs. malignant). Most research and commercial application focuses on 2D and 3D mammography, since mammograms remain the gold standard for screening in older women. But researchers are also exploring the ability of AI to improve the performance of ultrasound, thermography and MR imaging.
Digital Mammography and Digital Breast Tomosynthesis
In mammography, machine learning, and, in particular, convolutional neural networks (CNNs), is being used to aid radiologists in cancer screenings. There are four main applications of AI to mammography for screening and cancer detection: classification of images or patches as normal or abnormal; characterization of findings (masses or diffuse or clustered microcalcifications) as benign or malignant; localization of masses; and characterization of breast density as an indicator of breast cancer risk. Applications either perform a single-stage classification as normal/benign/malignant or a two-stage classification as abnormal/normal and then benign/malignant (Zhang et al, 2020; Agnes et al, 2019; Wong et al, 2020; Fanizzi et al, 2020).
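The single-stage vs. two-stage designs described above amount to different decision logic. A minimal sketch, with hypothetical score functions and thresholds standing in for trained CNN outputs:

```python
# Hypothetical two-stage screening pipeline: stage 1 flags abnormal exams,
# stage 2 characterizes flagged findings as benign vs. malignant.
# The score lookups stand in for trained CNNs; thresholds are made up.

def stage1_abnormality_score(exam):
    # placeholder for a CNN returning P(abnormal)
    return exam["abnormal_score"]

def stage2_malignancy_score(exam):
    # placeholder for a CNN returning P(malignant | abnormal)
    return exam["malignant_score"]

def two_stage_classify(exam, t1=0.5, t2=0.5):
    if stage1_abnormality_score(exam) < t1:
        return "normal"
    return "malignant" if stage2_malignancy_score(exam) >= t2 else "benign"

exams = [
    {"abnormal_score": 0.1, "malignant_score": 0.0},
    {"abnormal_score": 0.8, "malignant_score": 0.2},
    {"abnormal_score": 0.9, "malignant_score": 0.7},
]
labels = [two_stage_classify(e) for e in exams]
```

A single-stage design would instead emit normal/benign/malignant from one classifier; the two-stage split lets each model specialize.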
Unlike conventional CAD technologies, the deep learning CNNs being applied to 2D digital mammography (DM) and 3D digital breast tomosynthesis (DBT) mammograms learn from labeled training data which features in the images are indicative of lesions. The algorithms are not “told” by developers what to look for. Rather, they learn to distinguish normal tissue from abnormal markings. However, because of the often limited size of training datasets, two algorithms are frequently used — one to detect calcifications and one to detect soft tissue lesions — with results combined in the final analysis. In both cases, the algorithms need to identify areas of suspicion, which goes beyond simple image classification. Training characterization algorithms, though, requires annotated images, which are difficult to come by given the time-consuming nature of annotation (Sechopoulos et al, 2020).
As a result, algorithms frequently are not trained solely on mammographic images. Rather, transfer learning is used to address the often relatively small datasets (Arora et al, 2020; Chougrad et al, 2018; Samala et al, 2020). This approach has yielded promising algorithms. Similarly, a CNN based on DM images can be fine-tuned using transfer learning to develop a CNN for DBT mammogram images, which is particularly important given that less image data exists for DBT than for DM (Sechopoulos et al, 2020). Because lesions take up only a fraction of the space in a mammogram, pooling becomes necessary to home in on the areas most likely to be malignant (Shu et al, 2020).
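Since a lesion occupies only a small region of a mammogram, a common pattern is to score patches and pool the results into an image-level score. A minimal sketch, using the patch mean as a stand-in for a per-patch CNN malignancy probability (the image and scores are toy values):

```python
import numpy as np

def patch_scores(image, patch=2):
    """Score each non-overlapping patch. Here the 'model' is just the
    patch mean, standing in for a CNN's per-patch malignancy probability."""
    h, w = image.shape
    scores = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            scores.append(image[r:r + patch, c:c + patch].mean())
    return np.array(scores)

def image_score(image, patch=2):
    # Max-pooling: one suspicious patch is enough to flag the exam, which
    # is how pooling focuses attention on the most malignant-looking region.
    return patch_scores(image, patch).max()

img = np.array([[0.0, 0.1, 0.0, 0.0],
                [0.1, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.9, 0.8],
                [0.0, 0.0, 0.7, 0.9]])
score = image_score(img)  # driven by the suspicious lower-right patch
```

Variants replace the max with top-k averaging so a single noisy patch cannot flag an exam on its own.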
DBT screenings generally improve cancer detection and recall rates compared to DM screenings (Lowry et al, 2020). However, DBT exams take about twice as long to read as 2D images, which has hampered DBT adoption in large-scale screening programs (Sechopoulos et al, 2020). Thus, commercial AI solutions, such as the FDA-approved ProFound AI for DBT and Genius AI, aim to increase the efficiency of radiologists while enhancing reader performance.
An analysis of three commercially available AI algorithms for the diagnosis of breast cancer from screening mammograms determined that one algorithm performed well enough to merit further clinical study and assessment as an independent reader. Combining first-reader radiologists with the best-performing algorithm outperformed a first- and second-reader radiologist combination in identifying positive cases (Salim et al, 2020). Studies have shown, though, that these technologies are not quite on par with human readers when used as stand-alone tools (Sasaki et al, 2020). In fact, commercial solutions, such as Transpara, are marketed as clinical decision support tools, not as stand-alone screeners or diagnostics. Another analysis of published algorithms for mammogram classification shows that generalizability remains an issue: high-performing algorithms saw performance drop significantly on external datasets, underscoring the need for further assessment before adopting algorithms into clinical practice (Wang et al, 2020). As independent clinical studies of AI-based mammogram solutions become available, their performance across sites and populations will become clearer. CNNs do appear quite promising, but further investigation is needed before these technologies are likely to be incorporated into routine mammography (Wong et al, 2020).
Anecdotally, the bulk of research on AI-based screening seems to focus on mammograms, yet important research is emerging on other imaging modalities. MRI is not a typical screening modality, but dynamic contrast-enhanced MRI screening is recommended for women at high risk for breast cancer or who have very dense breasts or implants. Abbreviated MRI protocols are improving the cost-effectiveness and efficiency of MRI screenings, which will likely lead to an increased use of breast MRI in the future. Increased use means increased burden on radiologists and increased costs of interpretations. AI solutions that can efficiently and accurately detect and diagnose breast lesions will become critical to managing any increases in costs and workload. In one study of such a solution, the AI system used as a stand-alone tool showed better performance than human readers alone. When used as an aid, the AI system significantly improved the performance of human readers. The pilot study did result in some missed diagnoses and misdiagnoses, indicating a need to expand the variety of training data to address those errors (Adachi et al, 2020).
Breast ultrasound, while not typically used for initial breast screenings, is sometimes used for detection in particular circumstances: in pregnant women, in women with very dense tissue, or to examine a finding from a clinical breast exam. With ultrasound, the same issues as with mammograms come into play — reader variability, sensitivity and specificity. A study of the commercial technology Koios DS for Breast showed that this AI-based decision support for breast ultrasound imaging improves the accuracy of readers’ breast lesion assessments while reducing variability among readers (Mango et al, 2020).
Thermography is a lesser-used tool in breast cancer screening and for good reason. Although the FDA has approved the use of thermography in conjunction with another screening or diagnostic tool, it has warned that there is “no valid scientific data to demonstrate that thermography devices, when used on their own or with another diagnostic test, are an effective screening tool for any medical condition including the early detection of breast cancer” (FDA, 2019).
However, AI is potentially changing the utility of infrared thermography and renewing interest in this modality for the screening and diagnosis of breast cancer. In the past, one of the challenges with thermography was that the manual interpretation of images was subject to variability and error. Previous studies have demonstrated that infrared thermography can be useful in detecting the increased vascularization, vasodilation and heat production caused by non-palpable breast cancer. Researchers argue that thermography, unlike mammography, does not compress the breast tissue, thereby avoiding potential rupture of encapsulated tumors. It also involves no exposure to ionizing radiation (Yousefi et al, 2020).
One recent study evaluated the Thermalytix technology, which applies deep learning models to detect thermal heterogeneity, and thus potential malignancy, in normal vs. symptomatic breast tissues. The technology generates a quantitative risk score based on features extracted from the images, and it annotates the thermal hot spots and the vascularity around those hot spots. In this observational study, the model achieved an overall sensitivity of 91% and a specificity of 82.4%. It was also able to detect T1 lesions ≤ 2 cm (Kakileti et al, 2020). While it may be difficult to envision thermography becoming a mainstream screening tool, it offers two significant advantages: it is low-cost and portable. If AI can prove its worth in thermography, then it may well become a useful screening tool in resource-constrained settings.
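Reported sensitivity and specificity figures like those above are simple ratios from a study's confusion matrix. A quick sketch of the arithmetic, with illustrative counts chosen to reproduce 91% and 82.4% (these are not the actual counts from the cited study):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): fraction of cancers correctly flagged.
    Specificity = TN / (TN + FP): fraction of healthy exams correctly cleared."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative counts only, not data from the Thermalytix study
sens, spec = sensitivity_specificity(tp=91, fn=9, tn=824, fp=176)
```

The trade-off between the two is what threshold tuning on a risk score controls: lowering the malignancy threshold raises sensitivity at the cost of specificity.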
The Future of AI in Breast Screening
Whether AI-based readings will someday replace radiologists is as much an ethical and legal question as it is a medical and technological one. It is also a question generated more by AI hype than by experts familiar with the technological and medical aspects of radiological imaging. To date, there are no algorithms that can outperform double readings by two breast radiologists (Bennani-Baiti & Baltzer, 2020). Most medical experts and ethicists would argue for human-in-the-loop solutions anyway.
Even if AI does not completely take over the task of reading scans or images, there are several areas of potential utility in breast cancer screening: replacing the second reader (second readers are more common in Europe than in the U.S.); sifting out only the abnormal exams for human review and thereby reducing the workload of an already short-staffed specialty; measuring breast density; and improving diagnostic accuracy. These last two are the focus of current commercial solutions. AI may also enable automatic measurement and mapping of lesions. A fruitful area of AI development would be the integration of data from multiple imaging modalities, and, ideally, from past mammograms as well, to improve the positive predictive value of screening. Perhaps even more exciting is the possibility that radiological screening may one day be combined with clinical and biomarker data not only to screen but also to assess risk more accurately (Bennani-Baiti & Baltzer, 2020; Chiwome et al, 2020).
For the time being, several relevant questions remain. Should AI be deployed stand-alone, as a second opinion alongside the first reader, as an aid to the first reader, or for pre-selection of cases? Should AI interpretations be made known to radiologists before or after their own interpretations? Will AI-indicated markings prevent identification of other lesions? Stand-alone AI breast cancer screening seems unlikely in the near term, unless algorithms can one day analyze the imaging data and integrate the indication, clinical history and clinical findings with that image data, as radiologists currently do. They would also need to screen for rare lesions as well as common ones (Bennani-Baiti & Baltzer, 2020). One thing seems certain, though: AI-assisted breast cancer screening is here to stay. Whether it achieves widespread adoption and demonstrates the intended benefits across clinical settings and diverse populations will be determined in the next decade.