Machine learning algorithms have been shown to make medical diagnoses as accurately as doctors in the fields of cardiology, dermatology, and oncology. By using machine learning approaches, diagnostic success may be increased while the rate of human error is reduced. The 60th American Society of Hematology (ASH) Annual Meeting was held in San Diego, California, from 1–4 December 2018. On Saturday 1 December 2018, an oral abstract session was held entitled: 722. Clinical Allogeneic Transplantation: Acute and Chronic GvHD, Immune Reconstitution: GvHD Grading and Outcomes and Management. During this session, Abstract #68, entitled: Prediction of Acute Graft-Versus-Host Disease Following Allogeneic Hematopoietic Stem Cell Transplantation Using a Machine Learning Algorithm, was presented by Yasuyuki Arai from Kyoto University, Kyoto, Japan.
Doctor Arai, on behalf of colleagues, presented data from a registry-based retrospective cohort study whose primary objective was to establish and validate an index for predicting acute graft-versus-host disease (aGvHD). An alternating decision tree (ADTree) machine learning algorithm was used to develop the predictive model. Adult patients undergoing allogeneic hematopoietic stem cell transplantation (HSCT) were included in the analysis and randomly divided into a training cohort (70% of patients) and a validation cohort (30% of patients). The ADTree was then tested in the validation cohort using a competing risk hazard model. Study endpoints included the 1-year incidence of aGvHD and 2-year overall survival (OS).
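For readers unfamiliar with ADTrees, the sketch below illustrates how such a model typically turns patient characteristics into a single score: every prediction node that a patient reaches contributes its value, and the score is the sum of those values. This is a minimal illustration only, not the model presented in the abstract. The variable names echo those reported in the study, but the tree shape, the donor age threshold, and all node values are hypothetical assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Patient = Dict[str, object]


@dataclass
class PredictionNode:
    """Holds a score contribution and any further decision nodes below it."""
    value: float
    splitters: List["SplitterNode"] = field(default_factory=list)


@dataclass
class SplitterNode:
    """A yes/no condition leading to one of two prediction nodes."""
    condition: Callable[[Patient], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adtree_score(node: PredictionNode, patient: Patient) -> float:
    """Sum the values of every prediction node the patient reaches."""
    total = node.value
    # Unlike an ordinary decision tree, ALL splitters under a reached
    # prediction node are evaluated, so several paths can contribute.
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(patient) else splitter.if_false
        total += adtree_score(branch, patient)
    return total


# Hypothetical tree: values and thresholds are invented for illustration.
hypothetical_tree = PredictionNode(0.125, [
    SplitterNode(
        lambda p: p["donor_source"] == "unrelated",
        PredictionNode(0.5, [
            SplitterNode(
                lambda p: p["donor_age"] >= 40,
                PredictionNode(0.25),
                PredictionNode(-0.125),
            ),
        ]),
        PredictionNode(-0.25),
    ),
    SplitterNode(
        lambda p: bool(p["sex_mismatch"]),
        PredictionNode(0.25),
        PredictionNode(-0.125),
    ),
])

patient_a = {"donor_source": "unrelated", "donor_age": 45, "sex_mismatch": False}
patient_b = {"donor_source": "related", "donor_age": 30, "sex_mismatch": True}

score_a = adtree_score(hypothetical_tree, patient_a)
score_b = adtree_score(hypothetical_tree, patient_b)
print(score_a, score_b)
```

In this toy tree, a higher score stands for a higher predicted risk. Scores of this kind can then be grouped into risk categories, although the cutoffs used in the study are not given in this summary.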
Patients and methods
- N = 26,695 patients
- Training group: n = 18,645 patients
- Validation group: n = 8,050 patients
- Grades 2-4 and 3-4 aGvHD: 42.8% (95% CI, 42.2–43.4) and 17.1% (95% CI, 16.6–17.5)
- Predictive ADTree models were used
- Variables such as underlying disease, donor source, HLA donor type, sex mismatch, conditioning regimen, GvHD prophylaxis, and donor age were incorporated into each model for aGvHD prediction
- Models were tested in the validation cohort
- Incidence of aGvHD was clearly stratified according to the categorized ADTree scores
- Cumulative incidence of grade 2-4 aGvHD: 29.0% in low risk patients, 35.3% in low-intermediate risk patients, 41.8% in intermediate risk patients, 48.7% in high-intermediate risk patients, and 58.7% in high risk patients
- Cumulative incidence of grade 3-4 aGvHD: 8.6% for low risk, 12.5% for low-intermediate risk, 14.9% for intermediate risk, 21.1% for high-intermediate risk, and 28.6% for high risk patients
- The two aGvHD scores were also associated with inferior overall survival after HSCT
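The reported figures can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only numbers quoted in this summary. The variable and function names are illustrative, and the check confirms only that the cohort sizes add up and that incidence rises with each risk category. It says nothing about statistical significance.

```python
# Cohort sizes as reported in the summary.
total_patients = 26_695
training = 18_645
validation = 8_050

cohorts_add_up = (training + validation == total_patients)
training_fraction = training / total_patients
validation_fraction = validation / total_patients

# Risk categories in ascending order, with reported cumulative incidence (%).
risk_groups = ["low", "low-intermediate", "intermediate", "high-intermediate", "high"]
grade_2_4 = {
    "low": 29.0,
    "low-intermediate": 35.3,
    "intermediate": 41.8,
    "high-intermediate": 48.7,
    "high": 58.7,
}
grade_3_4 = {
    "low": 8.6,
    "low-intermediate": 12.5,
    "intermediate": 14.9,
    "high-intermediate": 21.1,
    "high": 28.6,
}


def is_strictly_increasing(incidence):
    """True if incidence rises with every step up in risk category."""
    values = [incidence[group] for group in risk_groups]
    return all(lower < higher for lower, higher in zip(values, values[1:]))


print(f"Cohort sizes add up: {cohorts_add_up}")
print(f"Training fraction: {training_fraction:.1%}")
print(f"Validation fraction: {validation_fraction:.1%}")
print(f"Grade 2-4 incidence rises with risk: {is_strictly_increasing(grade_2_4)}")
print(f"Grade 3-4 incidence rises with risk: {is_strictly_increasing(grade_3_4)}")
```

The split works out to roughly 69.8% and 30.2%, consistent with the stated 70/30 division, and both incidence series increase across all five categories.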
Doctor Arai concluded that the variables were extracted automatically by the machine learning algorithm (ADTree), without any bias from researchers. The models were clinically reasonable, and the algorithm provided sensible risk stratification scores for the incidence of aGvHD. These data should be further validated in future studies.