Analytics in the work place

The other day I had a demo from our research group on Attensity.  After some successful text analytics projects using Perl and R (and one that just didn’t work because of large volumes of claim notes), we’re looking for something that can handle higher volumes.

We were told that due to its complexity and cost of licenses, we would not be allowed to use Attensity ourselves but rather would have to work thru the research unit.  Obviously, this pained me, but if that is the most cost effective approach then I’m all for it.

But I don’t think this is cost effective at all and here’s why.

Analytics today is still a youngster in the corporate world and the way we do it now is comparable to the way IT was done years ago.  It used to be that if you wanted any kind of programming done you had to go to the IT group, your work was prioritized (i.e. you made and presented some kind of cost benefit analysis), and if there were enough resources your project slogged forward.  By the time it was done, the resulting software may or may not have met your needs (which might have changed since you started).  Hence the Agile craze, but I digress.

Compare IT of yesterday to today.  I can do 99% of my job with the tools on my desktop – as much as possible on the back-end with Oracle and SQL Server, and then whatever else needs to be done with Excel/VBA, SAS (technically not on my desktop, but whatevs), Perl, and R.

Imagine if I had to go to IT every time I needed a new Excel macro.  The time (cost) to get work done would be outrageous.  Sure there ends up being a lot of customer created “applications” out there, some of which get turned over to IT (much to their horror).  But what’s often forgotten is that only the top, must useful, processes ever make it to “the show”.  IT may have to figure out how to turn Access SQL into Oracle SQL, but so what – think of all the savings – the business analyst’s job is practically done.  And IT never had to spend one second on the 80-90% of processes that either serve their purpose for a short time, or are perhaps more experimental in nature.

So that brings us back to analytics today.  At some point some kind of analytics will be a tool included for free with Windows, just like VB Script is today.  And every reason you can give that this won’t happen has had a parallel reason in IT:

  • It’s too complicated (writing windows apps used to complicated, now its easy)
  • They’ll make mistakes – yup, they will.  But some of it will make it to The Show.
  • The software is too expensive – there are already free tools out there.

I’m not saying Enterprise Miner is going to be included in Windows 10.  But how about a simple decision tree?  While not the most powerful technique, I like to do a quick CART tree just to visualize some of the relationships (linear or not) in a dataset.  Really, you could say that Microsoft is already including data mining for very little additional cost in SQL Server.

The reason I know this to be true is innovation.  There’s no way you can innovative with analytics by having to go thru some research unit.  The nature of innovation is such that 90% of what you try is going to suck.  As you get better maybe you can bring that down to 85% (ok, maybe 88%).  Nobody is going to fund a project that has a 90% chancing of sucking, thus the whole thing of having to go to a research unit to innovate will never last – either the practice or the company will end.

Luckily, our company is also carefully opening up on its use for Free and Open Source Software (FOSS).  Which is why we’re looking at using GATE for our large project.