Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Last quarter, for my Advanced AI class, I ran some machine learning experiments on the Surveillance, Epidemiology, and End Results (SEER) database. It was my first in-depth study using machine learning, and I was particularly primed for the topic, having just read The Signal and the Noise by Nate Silver. While Nate does not specifically address machine learning, he is a clear supporter of Bayesian statistics, so the topic was apropos.

My biggest takeaway was perhaps that applying a complex machine-learning algorithm is the easy part, thanks to established software libraries and toolkits. Preparing the data for analysis and understanding the results are the most time-consuming and complicated aspects of the task. I worked closely with an oncologist, who did most of the heavy lifting with the clinical analysis.