I’ve started a small graduate research project for my AI class that’s been stealing my attention lately. I’ll be mining a large data set with the machine learning software Weka, training it to predict the prognosis (estimated survivability from diagnosis) of stage IV breast cancer patients. Weka seems to have an impressive array of machine learning tools, but most of my time is being spent converting data from one format to another. It feels a lot like moving sand from one pile to another with tweezers.
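For anyone curious what that sand-moving looks like, here is a minimal sketch of one such conversion: turning CSV text into ARFF, the plain-text format Weka reads. This is not my actual pipeline or data set. The function names (`csv_to_arff`, `is_number`), the column names, and the two sample rows are all made up for illustration, and the sketch assumes attribute names and values contain no spaces or commas (real ARFF would need quoting for those).

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (with a header row) into ARFF text.

    Columns whose non-empty values all parse as numbers become numeric
    attributes; every other column becomes a nominal attribute listing
    its distinct values. Empty cells are written as '?', which ARFF
    uses to mark a missing value.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        values = [row[i] for row in data if row[i] != ""]
        if values and all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            distinct = sorted(set(values))
            lines.append(f"@attribute {name} {{{','.join(distinct)}}}")

    lines += ["", "@data"]
    for row in data:
        lines.append(",".join(cell if cell != "" else "?" for cell in row))
    return "\n".join(lines) + "\n"


# Toy input invented for this example; not real patient data.
sample = "age,tumor_size_mm,grade,survived_5y\n52,31,III,no\n47,,II,yes\n"
print(csv_to_arff(sample, "toy_example"))
```

Guessing the attribute type from the values keeps the sketch short, but for a real data set it is probably safer to declare each column's type by hand.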

This research, like all research, is incremental. Several researchers have done similar studies, and fortunately their papers are available here, here, here, here and here. Having ready and open access to these papers is crucial for me to learn past techniques and build upon them. I’m not expecting to cure cancer here, only to maybe add a little piece of information to the puzzle, if I’m lucky.

Now imagine an environment where those papers were blocked or cost-prohibitive to the point of being inaccessible. Unfortunately, that’s a large part of the research world today. Recently, Aaron Swartz’s death has brought access to academic research to the forefront (along with ridiculously aggressive prosecutors and outdated computer crime laws). But there is also an Open Access movement that seeks to advance research without restricting who can view the data. In academic research, this just makes sense. I particularly like this summary over at the EFF targeting grad students. If anything, there are more students than professors and journal staff, so we have the numerical advantage 😉

What’s encouraging about Open Access is that there are journals using this model right now, like PLOS One, a peer-reviewed, open, online journal supporting any scientific discipline. It used to be cool to be published in Nature, where it only costs $199.00 a year for access to the articles, but how about we make it cooler and more useful to be published in an open-access journal?


4 thoughts on “Supporting Open Access research”

  1. This is especially relevant in the wake of what happened to Aaron Swartz.
    The big issue with most of these journals, as I understand it, is how good their rejection policy is. Nature, for example, has a bad reputation among many in the medical field because it prints most articles it receives, whether or not they present valid results.
    Compatible data formats, though… that would be a godsend.

    1. Just like the normal journals, I imagine it would take some time for an Open Access journal to gain a reputation. I’m picturing more of an “open source” style distributed review process, but I’m not sure what PLOS does…

      Isn’t LaTeX enough of a standard data format 😛 ? What I find annoying, especially in the CS world, is that the paper is available but the code that produced the results is not. This is almost as important as the paper to try to reproduce / validate results!

