Translating promise into practice: a review of machine learning in suicide research and prevention.
Kirtley Olivia J,van Mens Kasper,Hoogendoorn Mark,Kapur Navneet,de Beurs Derek
The lancet. Psychiatry
In ever more pressured health-care systems, technological solutions offering scalability of care and better resource targeting are appealing. Research on machine learning as a technique for identifying individuals at risk of suicidal ideation, suicide attempts, and death has grown rapidly. This research often places great emphasis on the promise of machine learning for preventing suicide, but overlooks the practical, clinical implementation issues that might preclude delivering on such a promise. In this Review, we synthesise the broad empirical and review literature on electronic health record-based machine learning in suicide research, and focus on matters of crucial importance for implementation of machine learning in clinical practice. The challenge of preventing statistically rare outcomes is well known; progress requires tackling data quality, transparency, and ethical issues. In the future, machine learning models might be explored as methods to enable targeting of interventions to specific individuals depending upon their level of need (ie, for precision medicine). Primarily, however, the promise of machine learning for suicide prevention is limited by the scarcity of high-quality scalable interventions available to individuals identified by machine learning as being at risk of suicide.
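The rare-outcome problem mentioned in the abstract above can be made concrete with Bayes' rule. The following is a minimal sketch; the sensitivity, specificity, and prevalence values are hypothetical and are not taken from the Review.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged individuals who truly experience the outcome."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical values chosen only for illustration.
common = positive_predictive_value(0.80, 0.95, 0.10)   # outcome in 10% of people
rare = positive_predictive_value(0.80, 0.95, 0.001)    # outcome in 0.1% of people

print(f"PPV at 10% prevalence:  {common:.3f}")
print(f"PPV at 0.1% prevalence: {rare:.3f}")
```

Under these assumed inputs, the same classifier goes from roughly two in three flagged individuals being true positives to fewer than one in fifty, which illustrates why low base rates constrain any screening model regardless of the learning method.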
Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data.
Gould Michael K,Huang Brian Z,Tammemagi Martin C,Kinar Yaron,Shiff Ron
American journal of respiratory and critical care medicine
Most lung cancers are diagnosed at an advanced stage. Presymptomatic identification of high-risk individuals can prompt earlier intervention and improve long-term outcomes. We aimed to develop a machine learning model that predicts a future diagnosis of lung cancer from routine clinical and laboratory data. We assembled data from 6,505 case patients with non-small cell lung cancer (NSCLC) and 189,597 contemporaneous control subjects and compared the accuracy of a novel machine learning model with a modified version of the well-validated 2012 Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial risk model (mPLCOm2012), by using the area under the receiver operating characteristic curve (AUC), sensitivity, and diagnostic odds ratio (OR) as measures of model performance. Among ever-smokers in the test set, a machine learning model was more accurate than the mPLCOm2012 for identifying NSCLC 9-12 months before clinical diagnosis (P < 0.00001) and demonstrated an AUC of 0.86, a diagnostic OR of 12.3, and a sensitivity of 40.1% at a predefined specificity of 95%. In comparison, the mPLCOm2012 demonstrated an AUC of 0.79, an OR of 7.4, and a sensitivity of 27.9% at the same specificity. The machine learning model was more accurate than standard eligibility criteria for lung cancer screening and more accurate than the mPLCOm2012 when applied to a screening-eligible population. Influential model variables included known risk factors and novel predictors such as white blood cell and platelet counts. A machine learning model was more accurate for early diagnosis of NSCLC than either standard eligibility criteria for screening or the mPLCOm2012, demonstrating the potential to help prevent lung cancer deaths through early detection.
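The diagnostic OR quoted in the abstract above relates directly to sensitivity and specificity. The sketch below applies the standard definition to the rounded values given in the abstract; the function and variable names are illustrative, and the authors' exact estimation procedure is not described in the abstract.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive result among cases divided by the odds among controls."""
    odds_in_cases = sensitivity / (1.0 - sensitivity)
    odds_in_controls = (1.0 - specificity) / specificity
    return odds_in_cases / odds_in_controls


# Rounded sensitivities at the predefined 95% specificity, as stated in the abstract.
ml_dor = diagnostic_odds_ratio(0.401, 0.95)
plco_dor = diagnostic_odds_ratio(0.279, 0.95)

print(f"machine learning model: {ml_dor:.1f}")
print(f"mPLCOm2012:             {plco_dor:.1f}")
```

Recomputing from rounded inputs gives about 7.4 for the mPLCOm2012, matching the reported value, and about 12.7 for the machine learning model, close to the reported 12.3. The small gap may reflect rounding or estimation details that the abstract does not give.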
DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data.
Multi-omics data are good resources for prognosis and survival prediction; however, these are difficult to integrate computationally. We introduce DeepProg, a novel ensemble framework of deep-learning and machine-learning approaches that robustly predicts patient survival subtypes using multi-omics data. It identifies two optimal survival subtypes in most cancers and yields significantly better risk-stratification than other multi-omics integration methods. DeepProg is highly predictive, exemplified by two liver cancer (C-index 0.73-0.80) and five breast cancer datasets (C-index 0.68-0.73). Pan-cancer analysis associates common genomic signatures in poor survival subtypes with extracellular matrix modeling, immune deregulation, and mitosis processes. DeepProg is freely available at https://github.com/lanagarmire/DeepProg.
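The C-index values quoted above measure how well predicted risk orders patients by survival time. As a minimal sketch, the code below implements Harrell's concordance index, one common definition; the abstract does not say which estimator DeepProg uses, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose risk ordering matches survival."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `first` has the shorter follow-up time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the earlier subject had an event.
        if times[first] == times[second] or not events[first]:
            continue
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0   # higher risk, earlier event: concordant
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: event flag 1 = death observed, 0 = censored.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.2, 0.7]
c_index = harrell_c_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported ranges of 0.68-0.80 indicate useful but imperfect discrimination.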
Use of Machine Learning Models to Predict Death After Acute Myocardial Infarction.
Khera Rohan,Haimovich Julian,Hurley Nathan C,McNamara Robert,Spertus John A,Desai Nihar,Rumsfeld John S,Masoudi Frederick A,Huang Chenxi,Normand Sharon-Lise,Mortazavi Bobak J,Krumholz Harlan M
Importance: Accurate prediction of adverse outcomes after acute myocardial infarction (AMI) can guide the triage of care services and shared decision-making, and novel methods hold promise for using existing data to generate additional insights. Objective: To evaluate whether contemporary machine learning methods can facilitate risk prediction by including a larger number of variables and identifying complex relationships between predictors and outcomes. Design, Setting, and Participants: This cohort study used the American College of Cardiology Chest Pain-MI Registry to identify all AMI hospitalizations between January 1, 2011, and December 31, 2016. Data analysis was performed from February 1, 2018, to October 22, 2020. Main Outcomes and Measures: Three machine learning models were developed and validated to predict in-hospital mortality based on patient comorbidities, medical history, presentation characteristics, and initial laboratory values. Models were developed based on extreme gradient descent boosting (XGBoost, an interpretable model), a neural network, and a meta-classifier model. Their accuracy was compared against the current standard developed using a logistic regression model in a validation sample. Results: A total of 755 402 patients (mean [SD] age, 65 years; 495 202 [65.5%] male) were identified during the study period. In independent validation, 2 machine learning models, gradient descent boosting and meta-classifier (combination including inputs from gradient descent boosting and a neural network), marginally improved discrimination compared with logistic regression (C statistic, 0.90 for best performing machine learning model vs 0.89 for logistic regression). Nearly perfect calibration in independent validation data was found in the XGBoost (slope of predicted to observed events, 1.01; 95% CI, 0.99-1.04) and the meta-classifier model (slope of predicted-to-observed events, 1.01; 95% CI, 0.99-1.02), with more precise classification across the risk spectrum. The XGBoost model reclassified 32 393 of 121 839 individuals (27%) and the meta-classifier model reclassified 30 836 of 121 839 individuals (25%) deemed at moderate to high risk for death in logistic regression as low risk, which were more consistent with the observed event rates. Conclusions and Relevance: In this cohort study using a large national registry, none of the tested machine learning models were associated with substantive improvement in the discrimination of in-hospital mortality after AMI, limiting their clinical utility. However, compared with logistic regression, XGBoost and meta-classifier models, but not the neural network, offered improved resolution of risk for high-risk individuals.
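As a quick arithmetic check on the reclassification figures reported above, the sketch below recomputes the percentages from the stated counts. The variable names are illustrative; the counts are those given in the abstract.

```python
# Individuals deemed at moderate to high risk by logistic regression.
moderate_to_high_risk = 121_839

# Of those, the number each model reclassified as low risk.
reclassified = {"XGBoost": 32_393, "meta-classifier": 30_836}

shares = {
    name: 100 * count / moderate_to_high_risk
    for name, count in reclassified.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% reclassified as low risk")
```

The results round to the 27% and 25% stated in the abstract.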