Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Last quarter for my Advanced AI class, I performed some machine learning experiments on the Surveillance, Epidemiology, and End Results (SEER) database.  It was my first in-depth study using machine learning, and I was particularly primed for the topic, having just read The Signal and the Noise by Nate Silver.  While Nate does not specifically address machine learning, he is a clear supporter of Bayesian-based statistics, so the topic was apropos.

My biggest takeaway was perhaps that applying the complex machine-learning algorithm is the easy part, thanks to established software libraries and toolkits.  Preparing the data for analysis and understanding the results are the most time-consuming and complicated aspects of the task.  I worked closely with an oncologist, who did most of the heavy lifting with the clinical analysis.


Supporting Open Access research

I’ve started a small, graduate research project for my AI class that’s been stealing my attention lately. I’ll be data mining a large data set with the machine learning software Weka, to train the software how to predict prognosis (estimated survivability from diagnosis) of stage IV breast cancer patients. Weka seems to have an impressive array of machine learning tools, but most of my time is being spent converting data from one format to the other.  It feels a lot like moving sand from one pile to the other with tweezers.

This research, like all research, is incremental. Several researchers have done a similar study and fortunately their papers are available here, here, here, here and here. Having ready and open access to these papers is crucial for me to be able to learn past techniques and build upon them. I’m not expecting to cure cancer here, only to maybe add a little piece of information to the puzzle, if I’m lucky.

Now imagine an environment where those papers were blocked or were cost prohibitive to the point of being inaccessible.

AI Winter has returned

About two weeks ago, AI Winter set upon us again, although not the same AI Winter that froze over most AI research in the 80s and 90s.  This AI Winter is an “inter-city, bi-weekly, programming study group” that will study seminal Artificial Intelligence papers over the next three months, for fun.  Philadelphia is ground zero for the event, but there are cities around the world following the program.

In the first meeting, we discussed Alan Turing’s Computing Machinery and Intelligence, which among other things, describes the Turing test (although he doesn’t call it that).  Remarkably, everybody who arrived (over 30 people) had read the paper, which I think beats any graduate class I’ve attended thus far.  Each week a volunteer facilitates the discussion and another volunteer takes notes.  The discourse was lively and with a healthy mix of backgrounds from all in attendance, there were many insights.  It’s like a book club for readers who like breakthrough research papers in AI.  Oh and afterwards, we drink some beer.

Apparently NoSQL Summer, a “reading club for databases, distributed systems & NoSQL-related papers,” was the spark which launched the Philly Lambda group to host Functional Fall, both of which were a success and hence, AI Winter.  The idea is based on “studying the masters, not their pupils,” i.e., reading the first-hand source rather than a summary (ironically, that link cites Wikipedia for the quote…)

The AI winter papers should complement my Advanced AI class pretty well.  Speaking of that, I’ll be performing some data mining with a popular machine learning tool called WEKA, but more on that later.

Zen Mind, Beginner’s Mind, Hacker’s Mind

I recently finished Zen Mind, Beginner’s Mind (ZMBM).  I know I told everybody that I was supposed to be reading Liars and Outliers and taking a break from fiction, but L&O was a bit too dry, so I picked up ZMBM and subsequently, I’m reading The Handmaid’s Tale.

Zen Mind, Beginner’s Mind (Photo credit: Wikipedia)

Admittedly, I have no idea how I choose books.  I’ve been using GoodReads, mainly to record when I finish a book, but a secondary motive is to deduce what kind of search I’m doing through bookspace.  It must be some sort of A* Search, but I don’t know what I’m using for a heuristic.  Hmm, it would be interesting to see the Markov chains on my book completions.  Bad AI jokes aside, this meta reasoning is a good segue back to ZMBM.

In the spirit of Zen, I freely admit I know nothing about it 😉  This book, a collection of talks from Shunryu Suzuki, is very powerful.  Powerful in that I discovered that I share a lot of the core ideas of Zen.  These ideas are difficult to describe, which is why I think this book is written in a very simple, Socratic style.  So, I’ll just list some of the key topics that resonated with me.

  • Beginner’s Mind.  This is the aspect of Zen I like the most, probably best explained by the following kōan.  To me, this is the way of constantly being a student and constantly learning, which is even easier now that Coursera is expanding!

Nan-in, a Japanese master during the Meiji era (1868-1912), received a university professor who came to inquire about Zen.

Nan-in served tea. He poured his visitor’s cup full, and then kept on pouring.

The professor watched the overflow until he no longer could restrain himself. “It is overfull. No more will go in!”

“Like this cup,” Nan-in said, “you are full of your own opinions and speculations. How can I show you Zen unless you first empty your cup?”

  • Imperfection.  “We should find perfect existence through imperfect existence.” -ZMBM.  This concept should be readily accepted among mathematicians, computer scientists and fans of Kurt Gödel and his incompleteness theorem.  Very loosely stated, the theorem says that there are true statements in a system (number theory specifically) that are unprovable within that system.  Like: “this sentence is not provable.”  It’s hard to overstate the significance of Gödel’s work, as it proved that number theory is incomplete, which was pretty much like Copernicus saying the Earth orbits the Sun.  But I think Gödel’s theorem is a very Zen idea and I see no contradiction in perfect imperfection.
  • Monkey Mind vs Simple Mindedness.  I find this funny expression of reverse personification very apt.  Again a quote from ZMBM, “You are just wandering around the goal with your monkey mind.  You are always looking for something without knowing what you are doing.”  With more and more demands for our attention, it’s nice to simply focus on one thing.  Whether it be writing, reading, sitting, eating or coding, that is the activity to focus on, right then.  It’s a sharp contrast to our multitasked society and again, this is consistent with the Unix way.

A Unix novice came to Master Foo and said: “I am confused. Is it not the Unix way that every program should concentrate on one thing and do it well?”

Master Foo nodded.

The novice continued: “Isn’t it also the Unix way that the wheel should not be reinvented?”

Master Foo nodded again.

“Why, then, are there several tools with similar capabilities in text processing: sed, awk and Perl? With which one can I best practice the Unix way?”

Master Foo asked the novice: “If you have a text file, what tool would you use to produce a copy with a few words in it replaced by strings of your choosing?”

The novice frowned and said: “Perl’s regexps would be excessive for so simple a task. I do not know awk, and I have been writing sed scripts in the last few weeks. As I have some experience with sed, at the moment I would prefer it. But if the job only needed to be done once rather than repeatedly, a text editor would suffice.”

Master Foo nodded and replied: “When you are hungry, eat; when you are thirsty, drink; when you are tired, sleep.”

Upon hearing this, the novice was enlightened.

Zen, and especially the use of kōans, has always been a part of hacker culture, but until ZMBM I had never read anything devoted to the topic.  I don’t think that you have to be a hacker to enjoy the book, although it certainly helps.

Hacker Emblem made with Zen Brush. I think Zen is probably the most marketable religious moniker. Not surprisingly, there are no products called “Christian Brush” or even “Brush of Zoroastrianism”. The new Gods & Kings expansion of Civ5 has been increasing my religious exposure.

AI Sportswriters brewing coffee 1890s style

This month’s WIRED magazine, which I insist on receiving by mail, had some great articles.  I also read books made out of paper, so if you are Generation Y or later you may just want to go to the WIRED website and read these articles for free.

Anyway, on to my WIRED roundup with: Fewer Voters, Better Elections by Joshua Davis.  Scrap the one vote per person system and run it like clinical trials where 100,000 people are randomly selected to vote.  This is certainly one way to implement voting reform… Personally, I think it would be interesting to have a different representative system.  Currently, congressional representatives in the U.S. are elected based on a geographical area, with the idea being that a particular elected official accurately represents his or her constituents based on location.  But what if we had representatives based on profession?  I feel that I agree with more software engineers than I do my neighbors.  These are passing thought experiments for sure, as I doubt any reform is up-and-coming in the voting arena.

Almost All the Wired Magazines Ever Published
I don’t consider myself a hoarder, but I do keep my WIREDs.  This is not my collection, but maybe one day…(Photo credit from flickr: outtacontext)

In a short product review, apparently the Bodum Bistro 11001 Coffeemaker is the thing to get these days.  Me, I’ve switched to a French press, mainly out of necessity, since in my current living arrangement I do not have a counter.  Essentially, coffee makers are expensive heating elements.  They look nice, but basically they drip water and then keep it hot.  So, $250 seems a bit steep to me when there are cheaper ways to heat water.  I also use a burr grinder and keep my coffee in a mason jar.  I’m suddenly realizing I’m living in the 1890s, or in Portland.

Lastly, Steven Levy, of Hackers fame, writes about the rise of AI in sports reporting in The Rise of the Robot Reporter.  As I learned from my Game AI class last quarter, there is a lot of active research in AI-generated narrative (stories).  In the game world, this allows games like Skyrim to have unlimited quests and to be never-ending (story! sorry… couldn’t resist. Where are the actors in that movie now?!).  The idea with the robo-reporter is that for sports stories, which are very data-centric, the AI would generate the post-game article.  Once the AI is aware of the rules of the game, it would then know what plays were pivotal and be able to detect the turning point of the game.  The story would then be written before the teams have even shaken hands.

Narrative generation is not yet human-quality, so there is no near-term fear that robots will take over sports journalist jobs.  However, it provides a great starting point for writing the article.  But what I find more interesting is its applicability to video games.  Imagine an online game, I’m thinking an MMORPG type, where battles won and lost are documented by in-game newspapers, written by AIs.  Did you just make the leaderboard?  You can read a detailed article about it in the Daily Paper.  This could even be provided as paid downloadable content.  Everybody has a newspaper from the day they were born, but how about a copy of the paper from Skyrim on that day?

Now, if I could only find a way to get my hands on the new German WIRED. Maybe when I go to Germany in June I’ll have to hunt down a copy…

Now AIs have all the fun: they play and create the game!

A new AI system, called Angelina, is extending procedural content generation to create an entire video game. Michael Cook developed Angelina as part of his PhD at Imperial College London. It randomly creates the level design, the enemies, the enemy movements and combat tactics, and the power-ups.

Ok, not everything is generated right now. The music and graphics are human-made, but procedural techniques for generating music and graphics do exist. As the New Scientist article hints, what’s to stop an artist from using Angelina to push out a new game every 12 hours and post it to the App Store… A game generated from Angelina is available online to play.  It’s pretty impressive.  It’s no Half-Life, but remember this was automatically generated!  Now, if there were a video game that created video games, we’d have a practical example of a self-reproducing machine besides Conway’s Game of Life.

Video Games beget Video Games via Wikipedia

And then there is this video by Quantic Dream that primarily shows the improvements in near-human CG animation. It’s stunning visually, but it’s also a gripping vignette, showing the singularity moment when AIs become self-aware. When this happens, I think they will make more than scrolling 8-bit games!

Lastly, I found an interesting paper on Automatic Quest Generation. In this paper, Jonathon Doran and Ian Parberry survey 3000 quests from various online games like World of Warcraft and categorize the type of quest. They then go on to create a set of rules (a grammar for those CS-types reading) to produce the quest procedurally. Those quests can get boring fast, and I’m not surprised to find out that most have the form:

    • 〈 goto 〉 kill | i.e. Goto Place X, kill thing Y
    • 〈 goto 〉 〈 get 〉 give | i.e. Goto Place X, get magic potion Y, give it to NPC Z.
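
As a rough illustration of how a grammar like this can drive quest generation, here is a minimal Python sketch. Only the two quest shapes come from the list above. The trivial expansions of 〈goto〉 and 〈get〉, the place, monster, item and NPC names, and the function names `expand` and `generate_quest` are all hypothetical stand-ins, not Doran and Parberry’s actual grammar, which is considerably richer.

```python
import random

# Toy grammar: keys are nonterminals, values are lists of alternative
# productions. Only the two <quest> shapes come from the post; the
# one-step expansions of <goto> and <get> are simplifying assumptions.
GRAMMAR = {
    "<quest>": [["<goto>", "kill"], ["<goto>", "<get>", "give"]],
    "<goto>": [["goto"]],
    "<get>": [["get"]],
}

# Hypothetical sentence templates and vocabulary for each terminal action.
TEMPLATES = {
    "goto": "Go to {place}",
    "kill": "kill {monster}",
    "get": "get {item}",
    "give": "give it to {npc}",
}
VOCAB = {
    "place": ["the Old Mill", "Raven Pass"],
    "monster": ["a wolf", "a bandit"],
    "item": ["a magic potion", "a rusty key"],
    "npc": ["the innkeeper", "the blacksmith"],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal actions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: nothing left to expand
    production = rng.choice(GRAMMAR[symbol])
    actions = []
    for part in production:
        actions.extend(expand(part, rng))
    return actions


def generate_quest(rng=None):
    """Return (list of actions, human-readable quest text)."""
    if rng is None:
        rng = random.Random()
    actions = expand("<quest>", rng)
    steps = []
    for action in actions:
        # Pick a random filler for every slot; unused slots are ignored.
        slots = {slot: rng.choice(words) for slot, words in VOCAB.items()}
        steps.append(TEMPLATES[action].format(**slots))
    return actions, ", ".join(steps) + "."


if __name__ == "__main__":
    actions, text = generate_quest()
    print(actions)
    print(text)
```

Each call picks one of the two quest shapes at random and fills in the blanks, which also makes it easy to see why quests built this way start to feel repetitive: only the nouns change.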

At some point while playing WoW (a few years ago…), I stopped reading the actual quest description (i.e. the story) and just looked at the list of tasks I had to accomplish.  It was at that point that I also stopped finding the game fun and stopped playing.  So if designers focus on a good main story, they can offload small side quests to the AI.  After reading this paper and watching the associated video, I think I’m going to incorporate a subset of their grammar into my game project, and combine it with some player modeling. I can’t give away too much to my potential test subjects; after all, there will be cake.

Game AI as Storytelling

I created a short screencast based on a very interesting paper on Game AI as Storytelling, which was produced by Georgia Tech’s Intelligent Narrative Computing group.  Imagine if you lived in the world of Little Red Riding Hood (or Rotkäppchen for purists), where you can kill random characters.  How many variations of the story can there be?  Well, if you limit yourself to 5 killings, there are over 1300 branches.  I won’t give too much away, but some stories involve a fairy and a new character called Grendel…  My short presentation hardly does the paper justice, but serves as a very high-level overview.

The screencast is in two parts: part 1 and part 2.  My mic sensitivity was set too high for part 1, so please make sure your volume isn’t too loud!  Here are my slides, which I created with the LaTeX Beamer class.  This was my first presentation made with Beamer, and it was a great experience.  Forget PowerPoint!  Beamer allows me to draft a presentation from pure text, in Emacs, and use LaTeX.  Brilliant!

So, I tormented some friends with playing an interactive fiction game with a drama manager based off of this published work.  It’s a subset of what the authors researched.  I decided to model player frustration, and as you can see, this is a frustrating game, even with hints! 🙂  To be fair to the player, the cause of most of the frustration was in the input system (natural language processing), which I didn’t try to resolve.

1200 cycles is about 1 minute; the player played the game for about 1 hour

The next step is to extend this with some procedurally generated subplots, à la Skyrim.  If anybody wants to be a test subject (think Aperture Science), let me know and I’ll try to package up the game and instructions for returning the data.  There will be cake.