Markov Chain

A while back, Hackers at Berkeley hosted a small session on artificial intelligence that introduced, among other things, the Markov chain. I decided to implement my own for random word and name generation, using single letters as states and training it on about 22k English words (wlist_match10 at http://www.keithv.com/software/wlist/). This gave me some decent results:

  • rinamars
  • reoroly
  • alatostic
  • miglly
  • lind
  • meral

And some less decent results:

  • swr
  • shhepulin
  • ananesigludive
  • oniprderitoreredeen
  • sbrbsctris
  • sifaralerivimasststisondizagres
  • atequlyngekeds
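A first-order letter chain like the one described above can be sketched in a few lines of Python. This is not the original code: the `^`/`$` start and end markers, the `max_len` cap, and the function names `train` and `generate` are assumptions made for illustration. Each state is a single letter, and the next letter is sampled in proportion to how often it followed the current letter in the training words.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers so the chain learns how words start and end.
START, END = "^", "$"


def train(words):
    """Count letter-to-letter transitions, including word start and end."""
    counts = defaultdict(Counter)
    for word in words:
        chars = [START] + list(word.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng=random, max_len=30):
    """Walk the chain from START until END is drawn or max_len letters are emitted."""
    state, out = START, []
    while len(out) < max_len:
        followers = counts[state]
        # Sample the next letter weighted by how often it followed `state`.
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        state = nxt
    return "".join(out)


if __name__ == "__main__":
    # Stand-in list; the post trained on ~22k words from wlist_match10.
    words = ["marina", "rally", "static", "mineral", "lind", "meral"]
    chain = train(words)
    rng = random.Random(42)
    print([generate(chain, rng) for _ in range(5)])
```

Because each step looks only at the previous letter, the chain has no memory of the word so far. That is one plausible reason for outputs like "sbrbsctris": each adjacent pair may be common enough on its own even when the whole string is unpronounceable.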

I also wanted to visualize these connections in a way that conveys information at a glance, so I took my old friend the Particle System and connected the letters to one another, with stronger links for higher transition probabilities.



See the sketch here.

 

It’s a bit messy, but you can see several interesting features of the English language. The most striking is that the most common vowels (a, e, o) always drift toward the center of the system. You can shake up the mass, but the sheer number of relatively balanced connections means the vowels are only stable near the center.
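The letter graph and the vowel clustering can be approximated with a basic force-directed layout. The sketch below is an assumption about how such a particle system might work, not the original sketch: every pair of letters repels, each linked pair is joined by a spring whose stiffness is `P(b|a) + P(a|b)`, and positions are updated with simple overdamped Euler steps. The function names and parameter values (`spring`, `repulsion`, `dt`, `steps`) are hypothetical choices.

```python
import math
import random
from collections import Counter, defaultdict


def transition_probs(words):
    """P(next letter | current letter) from adjacent letter pairs."""
    counts = defaultdict(Counter)
    for word in words:
        word = word.lower()
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        for b, n in row.items():
            probs[(a, b)] = n / total
    return probs


def layout(probs, steps=300, spring=0.5, repulsion=0.01, dt=0.05, seed=0):
    """Overdamped spring layout: springs pull linked letters together,
    all letters repel each other. Returns {letter: (x, y)}."""
    rng = random.Random(seed)
    letters = sorted({c for pair in probs for c in pair})
    pos = {c: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for c in letters}

    # Symmetric spring stiffness per unordered pair (self-loops like "ll" skipped).
    weight = defaultdict(float)
    for (a, b), p in probs.items():
        if a != b:
            weight[(min(a, b), max(a, b))] += p

    for _ in range(steps):
        force = {c: [0.0, 0.0] for c in letters}
        # Inverse-square repulsion between every pair, softened near zero distance.
        for i, a in enumerate(letters):
            for b in letters[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 0.01
                f = repulsion / (d2 * math.sqrt(d2))
                force[a][0] += f * dx
                force[a][1] += f * dy
                force[b][0] -= f * dx
                force[b][1] -= f * dy
        # Hooke springs (rest length 0), stiffer for more probable transitions.
        for (a, b), w in weight.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * w * dx
            force[a][1] += spring * w * dy
            force[b][0] -= spring * w * dx
            force[b][1] -= spring * w * dy
        # Overdamped update: velocity proportional to force.
        for c in letters:
            pos[c][0] += dt * force[c][0]
            pos[c][1] += dt * force[c][1]
    return {c: (x, y) for c, (x, y) in pos.items()}


def distance_from_center(pos):
    """(distance, letter) pairs sorted by distance from the layout's centroid."""
    cx = sum(x for x, _ in pos.values()) / len(pos)
    cy = sum(y for _, y in pos.values()) / len(pos)
    return sorted((math.hypot(x - cx, y - cy), c) for c, (x, y) in pos.items())


if __name__ == "__main__":
    words = ["marina", "rally", "static", "mineral", "lind", "meral"]
    for dist, letter in distance_from_center(layout(transition_probs(words))):
        print(f"{letter}: {dist:.2f}")
```

Sorting letters by `distance_from_center` after training on a full word list is one way to test the vowel observation numerically rather than by eye.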