Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Last quarter, for my Advanced AI class, I ran some machine learning experiments on the Surveillance, Epidemiology, and End Results (SEER) database. It was my first in-depth study using machine learning, and I was particularly primed for the topic, having just read The Signal and the Noise by Nate Silver. While Nate does not specifically address machine learning, he is a clear supporter of Bayesian statistics, so the topic was apropos.

My biggest takeaway was perhaps that applying a complex machine-learning algorithm is the easy part, thanks to established software libraries and toolkits. Preparing the data for analysis and understanding the results are the most time-consuming and complicated aspects of the task. I worked closely with an oncologist, who did most of the heavy lifting on the clinical analysis.

Supporting Open Access research

I’ve started a small graduate research project for my AI class that’s been stealing my attention lately. I’ll be mining a large data set with the machine learning software Weka, training it to predict the prognosis (estimated survivability from diagnosis) of stage IV breast cancer patients. Weka seems to have an impressive array of machine learning tools, but most of my time is being spent converting data from one format to another. It feels a lot like moving sand from one pile to another with tweezers.
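To give a flavor of that conversion work, here is a minimal sketch of turning a CSV file into ARFF, the plain-text format Weka reads. It is not the script I actually used: the function name csv_to_arff and the column names in the sample are made up for illustration and are not real SEER fields. It also assumes simple values with no spaces or commas, which would otherwise need quoting.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (with a header row) into ARFF text for Weka.

    Assumption: a column is numeric if every non-empty value parses
    as a float; otherwise it is treated as a nominal attribute.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data if row[col] != ""]
        if values and all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attributes list their allowed values in braces.
            categories = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{categories}}}")

    lines += ["", "@data"]
    for row in data:
        # ARFF marks missing values with "?".
        lines.append(",".join(v if v != "" else "?" for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample; these are not actual SEER column names.
sample = "age,grade,status\n54,II,alive\n61,,dead\n"
print(csv_to_arff(sample, "demo"))
```

Real records are messier than this (coded fields, sentinel values, inconsistent encodings), which is where the tweezers come in.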

This research, like all research, is incremental. Several researchers have done similar studies, and fortunately their papers are available here, here, here, here and here. Having ready and open access to these papers is crucial for learning past techniques and building upon them. I’m not expecting to cure cancer here, only to maybe add a little piece of information to the puzzle, if I’m lucky.

Now imagine an environment where those papers were blocked or were cost-prohibitive to the point of being inaccessible.