DrABC: deep learning accurately predicts germline pathogenic mutation status in breast cancer patients based on phenotype data.
BACKGROUND: Identifying breast cancer patients with DNA repair pathway-related germline pathogenic variants (GPVs) is important for effectively employing systemic treatment strategies and risk-reducing interventions. However, current criteria and risk prediction models for prioritizing genetic testing among breast cancer patients are not accurate enough to meet the demands of clinical practice. METHODS: The study population comprised 3041 breast cancer patients enrolled from seven hospitals between October 2017 and 11 August 2019, who underwent germline genetic testing of 50 cancer predisposition genes (CPGs). Associations between GPVs in different CPGs and endophenotypes were evaluated using a case-control analysis. A phenotype-based GPV risk prediction model named DNA-repair Associated Breast Cancer (DrABC) was developed based on a hierarchical neural network architecture and validated in an independent multicenter cohort. The predictive performance of DrABC was compared with that of currently used models, including BRCAPRO, BOADICEA, Myriad, and PENN II, and with the NCCN criteria. RESULTS: In total, 332 (11.3%) patients harbored GPVs in CPGs, including 134 (4.6%) in BRCA2, 131 (4.5%) in BRCA1, 33 (1.1%) in PALB2, and 37 (1.3%) in other CPGs. GPVs in CPGs were associated with distinct endophenotypes, including age at diagnosis, cancer history, family cancer history, and pathological characteristics. We developed the DrABC model to predict the risk of GPV carrier status in BRCA1/2 and other important CPGs. In predicting GPVs in BRCA1/2, the performance of DrABC (AUC = 0.79 [95% CI, 0.74-0.85], sensitivity = 82.1%, specificity = 63.1% in the independent validation cohort) was better than that of previous models (AUC range = 0.57-0.70). In predicting GPVs in any CPG, DrABC (AUC = 0.74 [95% CI, 0.69-0.79], sensitivity = 83.8%, specificity = 51.3% in the independent validation cohort) was also superior to previous models in their current versions (AUC range = 0.55-0.65).
After these previous models were retrained on the Chinese-specific dataset, DrABC still outperformed all of them except BOADICEA, the only previous model that includes pathological features. The DrABC model also showed higher sensitivity and specificity than the NCCN criteria in the multicenter validation cohort (83.8% and 51.3% vs. 78.8% and 31.2%, respectively, in predicting GPVs in any CPG). The DrABC model implementation is available online at http://gifts.bio-data.cn/. CONCLUSIONS: By considering the distinct endophenotypes associated with different CPGs in breast cancer patients, a phenotype-driven prediction model based on a hierarchical neural network architecture was created for the identification of hereditary breast cancer. The model achieved superior performance in identifying GPV carriers among Chinese breast cancer patients.
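
The performance metrics reported above (sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. This is not the DrABC implementation or the authors' evaluation code; the function names, the threshold, and the toy labels and scores are hypothetical and chosen only to show how such metrics are typically computed from binary carrier labels and predicted risk scores. The AUC here uses the pairwise (Mann-Whitney) formulation, which may differ from the estimator used in the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # carriers flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # carriers missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # non-carriers cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # non-carriers flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (carrier, non-carrier) pairs ranked correctly.

    Ties between a carrier score and a non-carrier score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data): 1 = GPV carrier, 0 = non-carrier.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.35)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
    print(f"AUC={auc(toy_labels, toy_scores):.3f}")
```

Sensitivity and specificity depend on the chosen risk threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of figures.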