Some Python to extract NCAA Team stats from Sports-reference.com
2013/03/14
It’s that time of year. In prep for an NCAA bracket clustering project I’ve had in the back of my head for the past year, here’s a snippet of Python to extract the team stats from sports-reference.com. Thank you, BeautifulSoup!
</span> <pre>import urllib2
import csv
from bs4 import BeautifulSoup

# Conference name -> 2013 standings page on sports-reference.com
sites = {
    "BigTen" : "http://www.sports-reference.com/cbb/conferences/big-ten/2013.html#standings::none",
    "BigEast": "http://www.sports-reference.com/cbb/conferences/big-east/2013.html#standings::none",
    "ACC": "http://www.sports-reference.com/cbb/conferences/acc/2013.html#standings::none",
    "Pac-12": "http://www.sports-reference.com/cbb/conferences/pac-12/2013.html#standings::none",
    "Big-12": "http://www.sports-reference.com/cbb/conferences/big-12/2013.html#standings::none",
    "MWC": "http://www.sports-reference.com/cbb/conferences/mwc/2013.html#standings::none",
    "SEC": "http://www.sports-reference.com/cbb/conferences/sec/2013.html#standings::none",
    "Atlantic-10": "http://www.sports-reference.com/cbb/conferences/atlantic-10/2013.html#standings::none",
    "MVC": "http://www.sports-reference.com/cbb/conferences/mvc/2013.html#standings::none",
    "WCC": "http://www.sports-reference.com/cbb/conferences/wcc/2013.html#standings::none",
    "CUSA": "http://www.sports-reference.com/cbb/conferences/cusa/2013.html#standings::none",
    "WAC": "http://www.sports-reference.com/cbb/conferences/wac/2013.html#standings::none",
    "Horizon": "http://www.sports-reference.com/cbb/conferences/horizon/2013.html#standings::none",
    "BigWest": "http://www.sports-reference.com/cbb/conferences/big-west/2013.html#standings::none",
    "MAC": "http://www.sports-reference.com/cbb/conferences/mac/2013.html#standings::none",
    "MAAC": "http://www.sports-reference.com/cbb/conferences/maac/2013.html#standings::none",
    "Sun-Belt": "http://www.sports-reference.com/cbb/conferences/sun-belt/2013.html#standings::none",
    "Patriot": "http://www.sports-reference.com/cbb/conferences/patriot/2013.html#standings::none",
    "Colonial": "http://www.sports-reference.com/cbb/conferences/colonial/2013.html#standings::none",
    "Ivy": "http://www.sports-reference.com/cbb/conferences/ivy/2013.html#standings::none",
    "OVC": "http://www.sports-reference.com/cbb/conferences/ovc/2013.html#standings::none",
    "America-East": "http://www.sports-reference.com/cbb/conferences/america-east/2013.html#standings::none",
    "Summit": "http://www.sports-reference.com/cbb/conferences/summit/2013.html#standings::none",
    "Northeast": "http://www.sports-reference.com/cbb/conferences/northeast/2013.html#standings::none",
    "Southern": "http://www.sports-reference.com/cbb/conferences/southern/2013.html#standings::none",
    "Atlantic-Sun": "http://www.sports-reference.com/cbb/conferences/atlantic-sun/2013.html#standings::none",
    "Southland": "http://www.sports-reference.com/cbb/conferences/southland/2013.html#standings::none",
    "Big-Sky": "http://www.sports-reference.com/cbb/conferences/big-sky/2013.html#standings::none",
    "Big-South": "http://www.sports-reference.com/cbb/conferences/big-south/2013.html#standings::none",
    "MEAC": "http://www.sports-reference.com/cbb/conferences/meac/2013.html#standings::none",
    "Great-West": "http://www.sports-reference.com/cbb/conferences/great-west/2013.html#standings::none",
    #"Independent": "http://www.sports-reference.com/cbb/conferences/independent/2013.html#standings::none",
    "SWAC": "http://www.sports-reference.com/cbb/conferences/swac/2013.html#standings::none"
}

f = open('ncaa_data.csv', 'w')
f.write("Conf, Rk, School, Conf_W, Conf_L, Conf_Pct, Over_W, Over_L, Over_Pct, PPG_Own, PPG_Opp, SRS, SOS\n")

for item in sites.keys():
    # Fetch and parse the conference standings page
    soup = BeautifulSoup(urllib2.urlopen(sites[item]).read())
    print "Processing ", item, "..."
    # Only the first standings table on the page is used
    for row in soup('table', {'class' : 'sortable stats_table'})[0].tbody('tr'):
        tds = row('td')
        if len(tds) >= 12: #some conferences like Sun-Belt have two tables and an extra header - this skips those rows
            f.write(item); f.write(",")
            f.write(tds[0].string); f.write(",")
            f.write(tds[1].find('a').string) ; f.write(",")#need to extract anchor text
            for x in range(2,12):
                f.write(tds[x].string); f.write(",")
            f.write("\n")

f.close()  # the parentheses are required; a bare f.close never closes the file</pre>
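The script imports csv but builds each line by hand, which leaves a trailing comma on every row and would raise an error if a cell's .string came back as None (BeautifulSoup returns None for an empty cell or one with several children). Here is a rough sketch of the output step done with csv.writer instead. It assumes Python 3 rather than the Python 2 used above, and write_team_row, HEADER and the sample values are hypothetical names and placeholder data, not part of the original script or real standings.

```python
import csv
import io

# Same columns as the header line in the script, without the padding spaces.
HEADER = ["Conf", "Rk", "School", "Conf_W", "Conf_L", "Conf_Pct",
          "Over_W", "Over_L", "Over_Pct", "PPG_Own", "PPG_Opp", "SRS", "SOS"]


def write_team_row(writer, conference, cells):
    """Write one team row. `cells` holds the 12 cell texts; None is allowed."""
    # A missing cell becomes an empty field instead of raising a TypeError.
    cleaned = ["" if c is None else c.strip() for c in cells]
    writer.writerow([conference] + cleaned)


# In-memory buffer for illustration; a real run would use
# open('ncaa_data.csv', 'w', newline='') instead.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(HEADER)

# Placeholder values only, not real standings data.
sample_cells = ["1", "Example U", "14", "4", ".778", "26", "5", ".839",
                "70.1", "58.3", None, "9.87"]
write_team_row(writer, "BigTen", sample_cells)

output = buf.getvalue()
print(output)
```

In the scraping loop, the list passed as cells would be the anchor text for the school column plus tds[x].string for the others. csv.writer also quotes any field that happens to contain a comma, which the hand-written version does not.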