Some Python to extract NCAA Team stats from

It’s that time of year. In prep for an NCAA bracket clustering project I’ve had in the back of my head for the past year, here’s a snippet of Python to extract the team stats from  Thank you BeautifulSoup!

<pre>import urllib2
import csv
from bs4 import BeautifulSoup

# conference name -> URL of that conference's standings page
sites = {
    "BigTen": "",
    "BigEast": "",
    "ACC": "",
    "Pac-12": "",
    "Big-12": "",
    "MWC": "",
    "SEC": "",
    "Atlantic-10": "",
    "MVC": "",
    "WCC": "",
    "CUSA": "",
    "WAC": "",
    "Horizon": "",
    "BigWest": "",
    "MAC": "",
    "MAAC": "",
    "Sun-Belt": "",
    "Patriot": "",
    "Colonial": "",
    "Ivy": "",
    "OVC": "",
    "America-East": "",
    "Summit": "",
    "Northeast": "",
    "Southern": "",
    "Atlantic-Sun": "",
    "Southland": "",
    "Big-Sky": "",
    "Big-South": "",
    "MEAC": "",
    "Great-West": "",
    #"Independent": "",
    "SWAC": ""
}

f = open('ncaa_data.csv', 'wb')
writer = csv.writer(f)
writer.writerow(["Conf", "Rk", "School", "Conf_W", "Conf_L", "Conf_Pct",
                 "Over_W", "Over_L", "Over_Pct", "PPG_Own", "PPG_Opp", "SRS", "SOS"])

for item in sites.keys():
    soup = BeautifulSoup(urllib2.urlopen(sites[item]).read())
    print "Processing ", item, "..."
    for row in soup('table', {'class': 'sortable stats_table'})[0].tbody('tr'):
        tds = row('td')
        # some conferences like Sun-Belt have two tables and an extra header - this skips those rows
        if len(tds) >= 12:
            fields = [item, tds[0].string, tds[1].find('a').string]  # need to extract anchor text
            fields.extend(tds[x].string for x in range(2, 12))
            # .string is None for an empty cell; write that as an empty field
            writer.writerow([(s or u"").encode("utf-8") for s in fields])

f.close()
</pre>
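
The check on the number of cells per row is what keeps the extra header and divider rows out of the CSV. If you want to try that idea without hitting the network, here is a rough Python 3 sketch that uses the standard library's `html.parser` in place of BeautifulSoup, purely so it runs with no extra installs. The sample markup, the `RowCollector` class, and the `extract_stat_rows` function are all made up for illustration and are not part of the script above.

```python
from html.parser import HTMLParser

# Made-up sample markup (an assumption, not real site output): a header row
# with only <th> cells, a one-cell divider row, and one full 12-cell data row.
SAMPLE_HTML = """
<table class="sortable stats_table"><tbody>
<tr><th>Rk</th><th>School</th></tr>
<tr><td>East Division</td></tr>
<tr><td>1</td><td><a href="#">Example U</a></td><td>10</td><td>2</td>
<td>.833</td><td>25</td><td>5</td><td>.833</td><td>75.0</td><td>62.0</td>
<td>15.5</td><td>6.2</td></tr>
</tbody></table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the current <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text nested inside a cell (such as anchor text) lands here too.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_stat_rows(html, min_cells=12):
    """Return only the rows that have at least min_cells <td> cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if len(row) >= min_cells]


rows = extract_stat_rows(SAMPLE_HTML)
print(rows)
```

Only the full data row survives the filter; the header row contributes no `td` cells at all, and the divider row has just one.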


