Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Last quarter for my Advanced AI class, I performed some machine learning experiments on the Surveillance, Epidemiology, and End Results (SEER) database.  It was my first in-depth study using machine learning, and I was particularly primed for the topic, having just read The Signal and the Noise by Nate Silver.  While Nate does not specifically address machine learning, he is a clear supporter of Bayesian statistics, so the topic was apropos.

My biggest takeaway was perhaps that applying a complex machine-learning algorithm is the easy part, thanks to established software libraries and toolkits.  Preparing the data for analysis and understanding the results are the most time-consuming and complicated aspects of the task.  I worked closely with an oncologist, who did most of the heavy lifting with the clinical analysis.

Supporting Open Access research

I’ve started a small graduate research project for my AI class that’s been stealing my attention lately. I’ll be data mining a large data set with the machine learning software Weka, training it to predict the prognosis (estimated survivability from diagnosis) of stage IV breast cancer patients. Weka seems to have an impressive array of machine learning tools, but most of my time is being spent converting data from one format to another.  It feels a lot like moving sand from one pile to another with tweezers.

This research, like all research, is incremental. Several researchers have done similar studies, and fortunately their papers are available here, here, here, here and here. Having ready and open access to these papers is crucial for me to be able to learn past techniques and build upon them. I’m not expecting to cure cancer here, only to maybe add a little piece of information to the puzzle, if I’m lucky.

Now imagine an environment where those papers were blocked or were cost-prohibitive to the point of being inaccessible.

AI Winter has returned

About two weeks ago, AI Winter set upon us again, although not the same AI Winter that froze over most of the AI research in the 80s and 90s.  This AI Winter is an “inter-city, bi-weekly, programming study group” that will study seminal Artificial Intelligence papers over the next three months, for fun.  Philadelphia is ground zero for the event, but there are cities around the world following the program.

In the first meeting, we discussed Alan Turing’s Computing Machinery and Intelligence, which, among other things, describes the Turing test (although he doesn’t call it that).  Remarkably, everybody who arrived (over 30 people) had read the paper, which I think beats any graduate class I’ve attended thus far.  Each week a volunteer facilitates the discussion and another volunteer takes notes.  The discourse was lively, and with a healthy mix of backgrounds among those in attendance, there were many insights.  It’s like a book club for readers who like breakthrough research papers in AI.  Oh, and afterwards, we drink some beer.

Apparently NoSQL Summer, a “reading club for databases, distributed systems & NoSQL-related papers,” was the spark that led the Philly Lambda group to host Functional Fall.  Both were a success, and hence, AI Winter.  The idea is based on “studying the masters, not their pupils,” i.e. reading the firsthand source rather than a summary (ironically, that link cites Wikipedia for the quote…)

The AI winter papers should complement my Advanced AI class pretty well.  Speaking of that, I’ll be performing some data mining with a popular machine learning tool called WEKA, but more on that later.

Zen Mind, Beginner’s Mind, Hacker’s Mind

I recently finished Zen Mind, Beginner’s Mind (ZMBM).  I know I told everybody that I was supposed to be reading Liars and Outliers and taking a break from fiction, but L&O was a bit too dry, so I picked up ZMBM and subsequently, I’m reading The Handmaid’s Tale.

Zen Mind, Beginner’s Mind (Photo credit: Wikipedia)

Admittedly, I have no idea how I choose books.  I’ve been using GoodReads, mainly to record when I finish a book, but a secondary motive is to deduce what kind of search I’m doing through bookspace.  It must be some sort of A* Search, but I don’t know what I’m using for a heuristic.  Hmm, it would be interesting to see the Markov chains on my book completions.  Bad AI jokes aside, this meta reasoning is a good segue back to ZMBM.

In the spirit of Zen, I freely admit I know nothing about it 😉  This book, which is a collection of talks from Shunryu Suzuki, is very powerful.  Powerful in that I discovered that I share a lot of the core ideas of Zen.  These ideas are difficult to describe, which is why I think this book is written in a very simple, Socratic style.  So, I’ll just list some of the key topics that resonated with me.

  • Beginner’s Mind.  This is the aspect of Zen I like the most, and it is probably best explained with the following kōan.  To me, this is the way of constantly being a student and constantly learning, which is even easier now that Coursera is expanding!

Nan-in, a Japanese master during the Meiji era (1868-1912), received a university professor who came to inquire about Zen.

Nan-in served tea. He poured his visitor’s cup full, and then kept on pouring.

The professor watched the overflow until he no longer could restrain himself. “It is overfull. No more will go in!”

“Like this cup,” Nan-in said, “you are full of your own opinions and speculations. How can I show you Zen unless you first empty your cup?”

  • Imperfection.  “We should find perfect existence through imperfect existence.” -ZMBM.  This concept should be readily accepted among mathematicians, computer scientists and fans of Kurt Gödel and his incompleteness theorem.  Very loosely stated, the theorem says that there are true statements in a system (number theory specifically) that are unprovable within that system.  Like: “this sentence is not provable.”  It’s hard to overstate the significance of Gödel’s work, as it proved that number theory is incomplete, which was pretty much like Copernicus saying the Earth revolves around the Sun.  But I think Gödel’s theorem is a very Zen idea and I see no contradiction in perfect imperfection.
  •  Monkey Mind vs Simple Mindedness.  I find this funny expression of reverse personification very apt.  Again a quote from ZMBM, “You are just wandering around the goal with your monkey mind.  You are always looking for something without knowing what you are doing.”  With more and more demands for our attention, it’s nice to simply focus on one thing.  Whether it be writing, reading, sitting, eating or coding, that is the activity to focus on, right then.  It’s a sharp contrast to our multitasked society and again, this is consistent with the Unix way.

A Unix novice came to Master Foo and said: “I am confused. Is it not the Unix way that every program should concentrate on one thing and do it well?”

Master Foo nodded.

The novice continued: “Isn’t it also the Unix way that the wheel should not be reinvented?”

Master Foo nodded again.

“Why, then, are there several tools with similar capabilities in text processing: sed, awk and Perl? With which one can I best practice the Unix way?”

Master Foo asked the novice: “If you have a text file, what tool would you use to produce a copy with a few words in it replaced by strings of your choosing?”

The novice frowned and said: “Perl’s regexps would be excessive for so simple a task. I do not know awk, and I have been writing sed scripts in the last few weeks. As I have some experience with sed, at the moment I would prefer it. But if the job only needed to be done once rather than repeatedly, a text editor would suffice.”

Master Foo nodded and replied: “When you are hungry, eat; when you are thirsty, drink; when you are tired, sleep.”

Upon hearing this, the novice was enlightened.

Zen has always been a part of hacker culture, especially the use of kōans, but until ZMBM I had never read anything primarily on the topic.  However, I don’t think that you have to be a hacker to enjoy the book, although it certainly helps.

Hacker Emblem made with Zen Brush. I think Zen is probably the most marketable religious moniker. Not surprisingly, there are no products called “Christian Brush” or even “Brush of Zoroastrianism.” The new Gods & Kings expansion of Civ5 has been increasing my religious exposure.

AI Sportswriters brewing coffee 1890s style

This month’s WIRED magazine, which I insist on receiving by mail, had some great articles.  I also read books made out of paper, so if you are Generation Y or later you may just want to go to the WIRED website and read these articles for free.

Anyway, onto my WIRED roundup with: Fewer Voters, Better Elections by Joshua Davis.  Scrap the one-vote-per-person system and run elections like clinical trials, where 100,000 people are randomly selected to vote.  This is certainly one way to implement voting reform… Personally, I think it would be interesting to have a different representative system.  Currently, congressional representatives in the U.S. are elected based on a geographical area, with the idea being that a particular elected official accurately represents his or her constituents based on location.  But what if we had representatives based on profession?  I feel that I agree with more software engineers than I do my neighbors.  These are passing thought experiments for sure, as I doubt any reform is up-and-coming in the voting arena.

Almost All the Wired Magazines Ever Published
I don’t consider myself a hoarder, but I do keep my WIREDs.  This is not my collection, but maybe one day… (Photo credit from flickr: outtacontext)

According to a short product review, the Bodum Bistro 11001 Coffeemaker is apparently the thing to get these days.  Me, I’ve switched to a French press, mainly out of necessity, since in my current living arrangement I do not have a counter.  Essentially, coffee makers are expensive heating elements.  They look nice, but basically they drip water and then keep it hot.  So, $250 seems a bit steep to me when there are cheaper ways to heat water.  I also use a burr grinder and keep my coffee in a mason jar.  I’m suddenly realizing I’m living in the 1890s, or in Portland.

Lastly, Steven Levy, of Hackers fame, writes about the rise of AI in sports reporting in The Rise of the Robot Reporter.  As I learned from my Game AI class last quarter, there is a lot of active research in AI-generated narrative (stories).  In the game world, this allows games like Skyrim to have unlimited quests and to be never-ending (story! sorry… couldn’t resist. Where are the actors in that movie now?!).  The idea with the robo-reporter is that for sports stories, which are very data-centric, the AI would generate the post-game article.  Once the AI is aware of the rules of the game, it would then know what plays were pivotal and be able to detect the turning point of the game.  The story would then be written before the teams shake hands.

Narrative generation is not yet human-quality, so there is no near-term fear that robots will take over sports journalists’ jobs.  However, it provides a great starting point for writing the article.  But what I find more interesting is its applicability to video games.  Imagine an online game, I’m thinking an MMORPG type, where battles won and lost are documented by in-game newspapers, written by AIs.  Did you just make the leaderboard?  You can read a detailed article about it in the Daily Paper.  This could even be provided as paid downloadable content.  Everybody has a newspaper from the day they were born, but how about a copy of the paper from Skyrim on that day?

Now, if I could only find a way to get my hands on the new German WIRED. Maybe when I go to Germany in June I’ll have to hunt down a copy…

Now AIs have all the fun: they play and create the game!

A new AI system called Angelina is extending procedural content generation to create an entire video game. Michael Cook developed Angelina as part of his PhD at Imperial College London. It randomly creates the level design, the enemies, the enemy movements and combat tactics, and the power-ups.

Ok, not everything is generated right now. The music and graphics are human-made, but procedural techniques for generating music and graphics do exist. As the New Scientist article hints, what’s to stop an artist from using Angelina to push out a new game every 12 hours and post it to the App Store… A game generated from Angelina is available online to play.  It’s pretty impressive.  It’s no Half-Life, but remember this was automatically generated!  Now, if there were a video game that created video games, we’d have a practical example of a self-reproducing machine besides Conway’s Game of Life.

Video Games beget Video Games via Wikipedia

And then there is this video by Quantic Dream, which primarily shows the improvements in near-human CG animation. It’s stunning visually, but it’s also a gripping vignette showing the singularity moment when AIs become self-aware. When this happens, I think they will make more than scrolling 8-bit games!

Lastly, I found an interesting paper on Automatic Quest Generation. In this paper, Jonathon Doran and Ian Parberry survey 3000 quests from various online games like World of Warcraft and categorize them by type. They then go on to create a set of rules (a grammar for those CS-types reading) to produce quests procedurally. Those quests can get boring fast, and I’m not surprised to find out that most have the form:

    • 〈 goto 〉 kill | i.e. Goto Place X, kill thing Y
    • 〈 goto 〉 〈 get 〉 give | i.e. Goto Place X, get magic potion Y, give it to NPC Z.
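
The two quest forms above can be written down as a tiny grammar and expanded at random. The sketch below is only an illustration of the idea: the rule table `GRAMMAR` and the helper names `expand` and `generate_quest` are hypothetical, and the grammar in the Doran and Parberry paper is considerably richer than this.

```python
import random

# Hypothetical toy grammar covering only the two quest forms listed above.
# Angle-bracketed symbols are nonterminals; everything else is a terminal action.
GRAMMAR = {
    "<quest>": [["<goto>", "kill"], ["<goto>", "<get>", "give"]],
    "<goto>": [["goto"]],
    "<get>": [["get"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal actions."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals expand to themselves
    production = rng.choice(GRAMMAR[symbol])
    actions = []
    for part in production:
        actions.extend(expand(part, rng))
    return actions


def generate_quest(seed=None):
    """Generate one quest as a list of actions, e.g. goto, get, give."""
    rng = random.Random(seed)
    return expand("<quest>", rng)
```

Adding more productions under `<goto>` or `<get>` (for example, a `<get>` that itself requires a `<goto>` and a kill) is how a grammar like this would grow more varied quests.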

At some point while playing WoW (a few years ago…), I stopped reading the actual quest description (i.e. the story) and just looked at the list of tasks I had to accomplish.  It was at that point that I also stopped finding the game fun and stopped playing.  So if designers focus on a good main story, they can offload small side quests to the AI.  After reading this paper and watching the associated video, I think I’m going to incorporate a subset of their grammar into my game project, and combine it with some player modeling. I can’t give away too much to my potential test subjects; after all, there will be cake.

Game AI as Storytelling

I created a short screencast based on a very interesting paper on Game AI as Storytelling, which was produced by Georgia Tech’s Intelligent Narrative Computing group.  Imagine if you lived in the world of Little Red Riding Hood (or Rotkäppchen for purists), where you can kill random characters.  How many variations of the story can there be?  Well, if you limit yourself to 5 killings, there are over 1300 branches.  I won’t give too much away, but some stories involve a fairy and a new character called Grendel…  My short presentation hardly does the paper justice, but serves as a very high-level overview.

The screencast is in two parts: part 1 and part 2.  My mic sensitivity was set too high for part 1, so please make sure your volume isn’t too loud!  Here are my slides, which I created with the LaTeX Beamer class.  This was my first presentation made with Beamer, and it was a great experience.  Forget PowerPoint!  Beamer allows me to draft a presentation from pure text, in Emacs, and use LaTeX.  Brilliant!

So, I tormented some friends by having them play an interactive fiction game with a drama manager based on this published work.  It implements a subset of what the authors researched.  I decided to model player frustration, and as you can see, this is a frustrating game, even with hints! 🙂  To be fair to the player, the cause of most of the frustration was the input system (natural language processing), which I didn’t try to resolve.

1200 cycles is about 1 minute; the player played the game for about 1 hour

The next step is to extend this with some procedurally generated subplots, à la Skyrim.  If anybody wants to be a test subject (think Aperture Science), let me know and I’ll try to package up the game and instructions for returning the data.  There will be cake.

If only engineering was like nethack…

I recently read two excellent books on working in engineering teams. Before you shrug them off, they actually are very well written; in fact, one of them was awarded the Pulitzer Prize. The books are The Soul of a New Machine and Dreaming in Code.

Dreaming in Code is an exposé that shows why software is hard. It describes the Chandler project and how they set out to create an outstanding piece of software, and how things went so terribly wrong. As a software engineer, I found the book both painful and inspiring. But if you are wondering why, even today, you have parts of your digital life on a work computer, parts at home, and parts on the go, this would be a good book for you.

The other book, The Soul of a New Machine, is at its core about what motivates an engineering team to create something. In this book, the team was trying to create the best computer available circa 1980. It is a bit more hardware focused, but there are insights for any team of people who set out to create something new. What drives somebody to work endless hours without extra pay and to the detriment of health and family?

So, since I’m on a non-fiction technology kick, this is what my reading queue looks like:

I have a queue for fiction as well. I find that when I have too much going on, I can’t really get into the story so I switched to non-fiction.

Lastly, we are currently studying Procedural Content Generation in my Game AI class, which is basically the ability for the game to create its own stories / content. Skyrim, the new game in The Elder Scrolls series, is doing this such that the game is “endless.” It also uses a technique to generate the foliage, since that would take too much time for a single human designer.

This is nothing new, of course; nethack has done this for years. And while I have known about nethack and played it once or twice before, I picked it up again and realized it is very good. While it looks rudimentary, it is quite rich with features, rules and player iteration. Each game is randomly generated, and it is challenging. A modern equivalent that one can play on the iPhone is 100 Rogues, which I talked about in my last post.  Somehow I find myself playing more and more of it, but it could just be because I have a problem set due…

nethack in progress
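
To give a feel for what "each game is randomly generated" can mean in practice, here is a minimal sketch of a random level generator. It is not nethack's actual algorithm; the function `generate_level`, its parameters, and the room sizes are all assumptions chosen for illustration. It carves a few rectangular rooms out of solid rock and joins them with L-shaped corridors.

```python
import random


def generate_level(width=40, height=12, rooms=4, seed=None):
    """Return the level as a list of strings: '#' is rock, '.' is floor."""
    rng = random.Random(seed)
    grid = [["#"] * width for _ in range(height)]
    centers = []
    for _ in range(rooms):
        # Pick a room size and a position that leaves a one-tile rock border.
        w, h = rng.randint(4, 8), rng.randint(3, 4)
        x = rng.randint(1, width - w - 1)
        y = rng.randint(1, height - h - 1)
        for row in range(y, y + h):
            for col in range(x, x + w):
                grid[row][col] = "."
        centers.append((x + w // 2, y + h // 2))
    # Join consecutive room centers with L-shaped corridors.
    for (x1, y1), (x2, y2) in zip(centers, centers[1:]):
        for col in range(min(x1, x2), max(x1, x2) + 1):
            grid[y1][col] = "."
        for row in range(min(y1, y2), max(y1, y2) + 1):
            grid[row][x2] = "."
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    print("\n".join(generate_level(seed=42)))
```

Passing the same `seed` reproduces the same level, which is handy for debugging; leaving it as `None` gives a fresh layout every run.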

Forget Skynet. How about an AI that helps you win video games!

This is how I saved my game in the 1980s:

  1. Pause the game, by hitting one of the five buttons on a controller.
  2. Turn off the TV.
  3. Do NOT, I repeat, DO NOT power off the console under any circumstances.

Video games back then were continuous story arcs.  “Saving” the game (as described in the function save80s above) risked losing hours of “work” if somebody bumped the Nintendo.  But compared to today’s video games, there was, for me, a much stronger sense of accomplishment in beating Super Mario Brothers, or Zelda, or Zork.  Keith Burgun, designer of the mobile game 100 Rogues, agrees in this podcast.

Zork I cover art
Image provided by a Grue at Wikipedia

100 Rogues is “an arcade-style dungeon crawling adventure,” but it also defies modern games in a crucial way: in any given game, the player may not win.  Most of the time, the player will die and must restart the game; there is no saving.  In fact, saving in 100 Rogues is very 80s-like in that the game can only be paused by switching tasks, as provided by iOS.  I’m sure that this is much to the chagrin of the author, who would rather see the game finished by death or victory.

While Keith ideologically stands by a gameplay theory that the player’s skill, not the player character’s skill, must increase for forward game progress, the commercial world designs games to be fun for most users.  Unfortunately for Keith, I believe that most gamers would not find his games fun, mainly because his games are designed to be difficult and require skill.  In this case, skill is how well one can play the game.

So, when I read this Ars Technica article, I was not surprised by the demand for in-game hint systems.  After all, this would have made Myst a lot easier, especially because I can remember clicking each pixel trying to see if there was an interactable object I was missing… UGH!  However, I was surprised in my game AI course to discover that there is significant academic research into solving this very problem.

One such idea involves the concept of the game constructing an emotional player model to detect when the player is frustrated / lost / stuck.  Once detected, the AI offers a hint.  The technically interesting piece here is how to detect when the player is frustrated and how to learn to detect when the player is frustrated through machine learning.  The creatively interesting piece is how to design subtle hints, since some players may not like the fact that the game has declared them “hint-worthy.”
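
As a rough sketch of the detection half of that idea, the class below tracks how many of the player's recent commands failed and offers a hint once the failure rate crosses a threshold. Everything here is an assumption made for illustration: the class name `FrustrationModel`, the use of failed commands as the only signal, and the hand-picked `window` and `threshold` values. A learned model, as described above, would replace the fixed threshold with something trained on player data.

```python
from collections import deque


class FrustrationModel:
    """Toy player model: frustration is the share of recent commands that failed."""

    def __init__(self, window=10, threshold=0.6):
        self.recent = deque(maxlen=window)  # True means the command failed
        self.threshold = threshold

    def observe(self, command_failed):
        """Record the outcome of the player's latest command."""
        self.recent.append(bool(command_failed))

    def frustration(self):
        """Return the failure rate over the window, from 0.0 to 1.0."""
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)

    def should_hint(self):
        # Wait for a full window so one early typo does not trigger a hint.
        return (len(self.recent) == self.recent.maxlen
                and self.frustration() >= self.threshold)
```

In a game loop, the parser would call `observe(True)` whenever it could not understand or execute a command, and the hint system would check `should_hint()` before each prompt.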

I think this keeps Keith Burgun up at night.  About to die in Halo?  All of a sudden, all sorts of weapons drop in front of you.  This is game adaptation and it is the next step up from hints.  Does this make the game more fun?  For most, it would seem so.  For those that insist on solving crosswords without glancing at the answers, probably not.  In my current game AI project I’m trying to develop a hint-AI for a (very limited) interactive fiction game.  I’m hoping the players will find it fun, but I hope to design it in such a way that it’s still enjoyable and won’t cause the Keith Burguns of the world to send a terminator to assassinate me 🙂

Should peasants think for themselves?

One of the things I find interesting about artificial intelligence (A.I.) is its intersection with philosophy.  I read that Blizzard uses some sort of “think” routine for each of the units in its Real Time Strategy (RTS) games (Warcraft, Warcraft II, Starcraft, etc…)  So, while I was sitting here trying to design different A.I. models for a project, I found myself considering whether I should incorporate a similar think routine for the units in the game.  The idea is that the main A.I. manager would ask the units “what do you think you should be doing?”

It's not that I don't trust you...

What would a peasant want to do?  Well, in the game there are buildings to build, wood to harvest and gold to collect, but wouldn’t he rather just sit there?  So, I concluded that peasants shouldn’t think for themselves and perhaps there should be a manager thinking for the peasants.

Another philosophical observation occurred when I was working on a Pac-man game.  In a certain implementation, Pac-man would always work to maximize his score.  So, if Pac-man realized, through heuristic search, that he was trapped and would inevitably die, he would dive-bomb a ghost, as each second alive decreased his score, like a good utilitarian.
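
The dive-bomb behavior falls out of simple arithmetic once the score has a per-step time penalty. The sketch below makes that explicit under assumed scoring constants (1 point lost per step, 10 per pellet, 500 lost on death); the function names `final_score` and `best_doomed_plan` and the example plans are hypothetical, not the implementation described above.

```python
def final_score(current_score, steps_until_death, pellets_eaten=0,
                time_penalty=1, pellet_value=10, death_penalty=500):
    """Score at game over if Pac-man survives `steps_until_death` more steps."""
    return (current_score
            - time_penalty * steps_until_death
            + pellet_value * pellets_eaten
            - death_penalty)


def best_doomed_plan(current_score, plans):
    """Pick the plan with the highest final score when death is inevitable.

    `plans` maps a plan name to a (steps_until_death, pellets_eaten) tuple.
    """
    return max(plans, key=lambda name: final_score(current_score, *plans[name]))


if __name__ == "__main__":
    # Trapped with nothing left to eat: stalling only costs points.
    plans = {"dive_bomb": (2, 0), "stall": (15, 0)}
    print(best_doomed_plan(1000, plans))
```

With these constants, stalling only wins if the extra steps let Pac-man eat enough pellets to outweigh the time penalty, which is exactly the trade-off a score-maximizing agent weighs.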