A breast-level artificial intelligence score may be able to estimate the risk of future breast cancer, allowing patients to take preventive measures.
Early detection of breast cancer through screening is important for reducing morbidity and mortality, but screening methods are not always accurate. Artificial intelligence (AI) systems and algorithms have been developed to help mark potential areas of concern and to provide scores indicating cancer risk or malignancy. In a study published in JAMA Network Open, investigators linked AI scores from consecutive screening rounds with long-term cancer outcomes to examine whether a regulatory-cleared, commercial AI algorithm for breast cancer detection could predict future breast cancers diagnosed in subsequent screening rounds.
For this population-based, retrospective cohort study, data were gathered from BreastScreen Norway, a national breast cancer screening program that offers 680,000 Norwegian women aged 50 to 69 years 2-view digital mammography every 2 years (24 ± 6 months). Radiologist assessments and breast cancer outcomes are prospectively recorded for all screening examinations. According to the program's database, from 2017 to 2021 the screening attendance rate was approximately 76%, the recall rate was 3.3%, and the screening-detected and interval cancer rates were 6.2 and 1.8 per 1000 screening examinations, respectively.
Radiologists rated each screening examination on a 1 to 5 scale: 1 indicated normal findings, 2 probably benign findings, 3 intermediate suspicion, 4 probable malignant neoplasm, and 5 high suspicion of malignant neoplasm. When a score of 2 or higher was given, experts reviewed and discussed the case to decide whether the woman should be recalled for diagnostic evaluation.
The AI algorithm, INSIGHT MMG, was applied to all screening mammograms and provided a continuous cancer detection score from 0 to 100 for each examination, with higher values indicating a greater likelihood that cancer was present on the current mammogram.
A total of 246,472 women underwent 671,828 screening examinations during the study period. The study sample included women with no history of breast cancer who had at least 3 consecutive biennial screening rounds, of which at least the first 2 were not associated with a breast cancer diagnosis. The following examinations were excluded: those among women with fewer than 3 screening examinations (n = 194,525); those performed before or after the 3 consecutive examinations included in the study (n = 98,410); those without AI scores (n = 23,868); those performed after a breast cancer diagnosis (n = 2597); those among women who reported a palpable lump at screening (n = 2259); technically inadequate examinations (n = 605); those among women whose breast cancer was detected more than 24 months after the third consecutive screening (n = 70); and those in which the cancer was located in the axilla (n = 9).
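The exclusion counts can be cross-checked with simple arithmetic. The sketch below subtracts the listed exclusions from the total number of examinations and, assuming each woman in the final sample contributed exactly 3 consecutive examinations (as the exclusion of examinations before or after the 3 included rounds implies), recovers the 116,495 women reported in the study.

```python
# Examination counts as reported in the study description.
total_exams = 671_828

exclusions = {
    "women with fewer than 3 screening examinations": 194_525,
    "examinations before or after the 3 included rounds": 98_410,
    "examinations without AI scores": 23_868,
    "examinations after a breast cancer diagnosis": 2_597,
    "palpable lump reported at screening": 2_259,
    "technically inadequate examinations": 605,
    "cancer detected >24 months after the third round": 70,
    "cancer located in the axilla": 9,
}

excluded = sum(exclusions.values())
included_exams = total_exams - excluded

# Assumption: every included woman contributes exactly 3 examinations.
women, remainder = divmod(included_exams, 3)

print(f"excluded examinations: {excluded:,}")        # 322,343
print(f"included examinations: {included_exams:,}")  # 349,485
print(f"implied women: {women:,} (remainder {remainder})")  # 116,495 (remainder 0)
```

The zero remainder is consistent with the design: after the exclusions, only the 3 study rounds per woman remain.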
The final analysis included 116,495 women who underwent at least 3 consecutive screening rounds, among whom there were 1265 screening-detected cancers and 342 interval cancers. The mean ages were 58.5 years for women with screening-detected cancer, 57.4 years for women with interval cancer, and 56.4 years for women without breast cancer.
Among women with screening-detected cancer, the mean AI scores for the breast that developed cancer were 19.2 at the first study round, 30.8 at the second, and 82.7 at the third. In comparison, the mean AI scores for the contralateral breast, which did not develop cancer, were 9.5, 8.2, and 5.0 at the first, second, and third study rounds, respectively. Among women with interval cancers, the mean AI scores for the breast that developed cancer were 17.8, 20.1, and 33.1 across the 3 study rounds, compared with 10.5, 10.1, and 8.4 for the contralateral breast. Women who were not diagnosed with breast cancer during the study period had mean AI scores of 7.1, 6.7, and 6.4 at the first, second, and third study rounds, respectively.
For women who developed screening-detected cancer, the mean absolute differences in AI score between the 2 breasts were 21.3 at the first study round, 30.7 at the second, and 79.0 at the third. For women with interval cancers, the corresponding differences were 19.7, 21.0, and 34.0, and for women who did not develop breast cancer, they were 9.9, 9.6, and 9.3.
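The between-breast measure can be read as the absolute value of the difference between a woman's left- and right-breast AI scores, averaged across women at a given round. The minimal sketch below illustrates that reading; the scores are made up for illustration (not study data), and the function name `breast_asymmetry` is hypothetical.

```python
from statistics import mean

# Hypothetical per-breast AI scores (0 to 100) for 3 women at one round.
# Illustrative values only; these are not data from the study.
scores = [
    {"left": 85.0, "right": 4.0},
    {"left": 12.0, "right": 40.0},
    {"left": 6.0, "right": 7.5},
]

def breast_asymmetry(left: float, right: float) -> float:
    """Absolute difference in AI score between a woman's two breasts."""
    return abs(left - right)

diffs = [breast_asymmetry(s["left"], s["right"]) for s in scores]
print(diffs)                   # [81.0, 28.0, 1.5]
print(round(mean(diffs), 2))   # 36.83
```

Because the measure ignores which breast scores higher, a large value flags asymmetry even when the overall score level is moderate.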
Using the examination-level AI score, the areas under the receiver operating characteristic curve (AUCs) for discriminating between women who developed screening-detected cancer and women without cancer were 0.64 (95% CI, 0.62-0.65) at the first study round, 0.73 (95% CI, 0.71-0.74) at the second, and 0.97 (95% CI, 0.96-0.97) at the third. The AUC for interval cancers vs no cancer increased from 0.66 to 0.78 across the 3 study rounds, and the AUC for all cancers combined vs no cancer increased from 0.64 to 0.93. Using the absolute difference in AI score between breasts, the AUCs were 0.63 (95% CI, 0.61-0.65), 0.72 (95% CI, 0.71-0.74), and 0.96 (95% CI, 0.95-0.96) at the first, second, and third study rounds for screening-detected cancer, and 0.64 (95% CI, 0.61-0.67), 0.65 (95% CI, 0.62-0.68), and 0.77 (95% CI, 0.74-0.79) for interval cancers.
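An AUC can be interpreted as the probability that a randomly chosen woman who developed cancer has a higher AI score than a randomly chosen woman without cancer, where 0.5 corresponds to chance and 1.0 to perfect separation. The sketch below computes this empirical AUC by comparing every case-control pair and counting ties as half, which is equivalent to the Mann-Whitney U statistic divided by the number of pairs. It is an illustration of the metric only, not necessarily how the investigators computed their AUCs or confidence intervals, and the scores are hypothetical.

```python
from itertools import product

def auc(case_scores, control_scores):
    """Empirical AUC: share of (case, control) pairs in which the case
    scores higher, with ties counted as half a win."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical examination-level AI scores, for illustration only.
cancer = [82.0, 35.0, 12.0, 60.0]
no_cancer = [5.0, 12.0, 8.0, 40.0, 3.0]

print(auc(cancer, no_cancer))  # 0.875
```

The pairwise loop is quadratic in the number of women; for cohorts the size of this one, a rank-based formula or a library routine would be the practical choice.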
Limitations of the study include its retrospective design, the evaluation of only 1 AI system for cancer detection, and a study population consisting primarily of White women. The authors suggested that future research evaluate how accurately other AI cancer detection tools predict future cancer risk in more diverse patient populations.