GLM, GLM, GLM

Love this interview. I didn't realize Jeremy Howard, Chief Scientist at Kaggle, comes from an insurance background. He shares some nuggets of info on predictive modeling in insurance. He makes a point I've been shouting from the rooftops to any actuary who will listen – predictive modeling of tough problems is so much more than GLM! The best overall model is usually a combination of techniques. As I talk to people in the actuarial world, the general feeling seems to be that predictive modeling starts and ends with GLM, but it's just one piece of the puzzle.
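To make that concrete, here's a minimal sketch of what "a combination of techniques" can look like – blending a Poisson GLM with a gradient-boosted model. This is my own illustration, not anything from the interview: the synthetic claim-count data and the 60/40 blend weights are made up for the example.

```python
# Blend a Poisson GLM with a tree ensemble on synthetic claim-count data.
# The blend weights here are illustrative, not tuned.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
# True frequency has a nonlinear term a plain GLM won't capture.
lam = np.exp(0.3 * X[:, 0] + 0.5 * np.sin(X[:, 1]))
y = rng.poisson(lam)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

glm = PoissonRegressor().fit(X_tr, y_tr)
gbm = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Simple weighted average; clip the GBM so counts stay non-negative.
gbm_pred = np.clip(gbm.predict(X_te), 0, None)
blend = 0.6 * glm.predict(X_te) + 0.4 * gbm_pred

for name, pred in [("GLM", glm.predict(X_te)),
                   ("GBM", gbm_pred),
                   ("Blend", blend)]:
    print(name, "RMSE:", np.sqrt(np.mean((pred - y_te) ** 2)))
```

In real work you'd pick the weights by cross-validation, or go further and stack the models with a meta-learner instead of hand-weighting them.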

Another piece I often see missing in the actuarial world is unsupervised techniques – clustering, principal component analysis (PCA), self-organizing maps (SOM) – used to create new variables that feed downstream models. I'm often surprised how some crappy-seeming cluster or principal component gets identified as an important variable by a downstream algorithm.
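Here's a minimal sketch of that pattern – again my own illustration on made-up data: cluster labels and principal components become extra columns for a supervised learner, and you can check where they land in the feature importances.

```python
# Derive unsupervised features (cluster labels, principal components)
# and feed them to a downstream supervised model. Data is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 8))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=2000)

# Unsupervised step: cluster IDs and the top two principal components.
cluster_id = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(X)
pcs = PCA(n_components=2).fit_transform(X)

# Stack original and derived columns, then train downstream.
X_aug = np.column_stack([X, cluster_id, pcs])
rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_aug, y)

# Even a crappy-seeming derived column can rank surprisingly high.
names = [f"x{i}" for i in range(8)] + ["cluster", "pc1", "pc2"]
print(dict(zip(names, rf.feature_importances_.round(3))))
```

One caveat: to avoid leakage, fit the KMeans and PCA transforms on training data only and apply them to holdout data, rather than fitting on everything as this toy example does.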

Finally, he makes a distinction between Big Data and Analytics. So often it's implied that they go hand-in-hand – just look at job postings. Show me a Hadoop listing that doesn't also say you must know statistics or machine learning (OK, maybe there are some, but there aren't many). Big Data is about scattering your data across multiple machines; it's an engineering problem. It's like saying I have to be a DBA (and know about backups, replication, security, data governance and blah blah blah) just to write some SQL.
