DM 101: Intro to Data Mining

There are lots of books and kits about doing science experiments with kids: things like making a stink bomb out of a ping pong ball or a papier-mâché volcano. I also noticed my kids (2nd and 4th graders) are already bringing home math homework with probability and statistics, and I’ve, umm, been learning quite a bit.

So I’m starting this “DM 101” series (for “Data Mining 101” in case it’s not obvious) to be kind of like a “365 Chemistry Experiments” except it’s with data, spreadsheets, and computers.

Kids love computers – laptops, Wiis, DSis, cell phones. My son wants a cell phone not to call people but so he can check the radar. And to love computers is to love data science. Watch your kids the next time they’re playing Pokémon – they’re analyzing strength and weakness ratings, and running thousands of epochs on their Personal Neural Nets (PNN) analyzing attack probabilities. Reminds me of the countless hours I spent playing Dungeons and Dragons [wiping away tear].

So my goals in DM 101 are:

  • Use real data – sports, astronomy, weather
  • Depending on who you talk to, data mining is anywhere from 50-90% data preparation.  I’ll have details about how to get the data you’ll be mining.
  • Make decisions and predictions with this data and see if they come true
  • Explain terminology – for example, the “epoch” mentioned above is one complete pass through all of the training examples, and you usually need a lot of epochs to train a neural net (artificial or personal).
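To make the difference between a training example and an epoch concrete, here is a minimal sketch in Python, one of the languages this series will use. It is a hypothetical toy, not a real neural net: a single number (`weight`) learned by plain stochastic gradient descent so that `weight * x` matches made-up data where the right answer is y = 2 × x. The `train` function, the data, and the learning rate are all illustrative assumptions. Each (x, y) pair is one training example; one trip through all four of them is one epoch.

```python
def train(training_examples, num_epochs, learning_rate=0.01):
    """Hypothetical toy: fit y ≈ weight * x with stochastic gradient descent."""
    weight = 0.0
    for _epoch in range(num_epochs):             # one epoch = one full pass over ALL examples
        for x, y in training_examples:           # each (x, y) pair is ONE training example
            error = weight * x - y               # how far off the current guess is
            weight -= learning_rate * error * x  # nudge the weight to shrink the error
    return weight

# Made-up data where the "right answer" is y = 2 * x
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

print(f"after 1 epoch:   {train(examples, 1):.3f}")   # → 0.547
print(f"after 50 epochs: {train(examples, 50):.3f}")  # → 2.000
```

One pass through the data gets the weight only partway to 2; it takes many epochs of seeing the same examples over and over before it settles on the answer.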

Through marriage I am blessed with many non-technical friends. They are copywriters, book editors, marketing folks, and theater types. They generally use Macs and kick my butt in games like Scrabble. They are also my target audience: if they can understand it, anyone can….

Which brings me to one final note – most of the software I use will be open source. I say “most” because I’m probably going to use Microsoft Excel. I’ve used OpenOffice’s spreadsheet and it’s great – use that if you don’t have or want Excel. Other than that, I’ll be using programming languages like R, Perl, and Python.

Good luck…

