Sunday, November 10, 2013

Rankings explanation

In the comments on my BYU-Wisconsin blog post, Brian made the point that S Carolina was way overrated purely because of their preseason rankings.  He's right (of course), so I started thinking about how to build rankings based purely on this season's accomplishments and not on some historic or conference bias.  I thought of the RPI, which is used in basketball to some extent, and decided to make my own version.  I don't particularly trust the BCS "computer" rankings because their correlation with the AP rankings is too high, so there HAS to be some sort of bias in their algorithm.

For my job, I do lots of data analysis, so this was right up my alley.  I found a website that has all of the FBS teams and results for the season, pasted it into Excel, wrote a macro to get it into database format, then imported it into SQL Server.

First, I did an initial calculation based on records and margins of victory.  This changes week to week.  For example, after week 1, BYU's loss to Virginia wasn't that bad because Virginia was 1-0, but as Virginia kept losing and losing, it looked worse and worse.  The initial calculation is the driver for strength of schedule (SOS).  Each team was assigned a factor.
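As a rough illustration of this kind of first pass, here is a minimal Python sketch.  It is not the actual SQL behind these rankings: the formula, the 21-point margin cap, the 0.5 weighting, the function name, and the sample scores are all assumptions made up for the example.

```python
from collections import defaultdict

# Illustrative sample results: (winner, loser, winner_points, loser_points).
# These rows are placeholders, not the data set used for the rankings.
GAMES = [
    ("Virginia", "BYU", 19, 16),
    ("BYU", "Texas", 40, 21),
    ("Oregon", "Virginia", 59, 10),
]

MARGIN_CAP = 21  # assumed cap so one blowout does not dominate


def initial_factors(games):
    """Return {team: factor} from win percentage plus a capped average margin."""
    wins = defaultdict(int)
    played = defaultdict(int)
    margin = defaultdict(float)
    for winner, loser, w_pts, l_pts in games:
        diff = min(w_pts - l_pts, MARGIN_CAP)
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
        margin[winner] += diff
        margin[loser] -= diff
    factors = {}
    for team in played:
        win_pct = wins[team] / played[team]
        avg_margin = margin[team] / played[team] / MARGIN_CAP  # between -1 and 1
        factors[team] = win_pct + 0.5 * avg_margin  # assumed weighting
    return factors


after_week_1 = initial_factors(GAMES[:1])
after_all = initial_factors(GAMES)
```

Recomputing the factors as results come in is what makes an early loss look worse over time: in this sample, Virginia's factor drops between `after_week_1` and `after_all`, which drags down the value of having lost to them.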

I then did the same analysis, but this time adding the SOS factor as the key driver.  Each game was assigned a value.  The highest value was FSU's victory over Clemson, and the lowest was South Florida's big loss to McNeese St (FCS team).  Each FCS team was considered equal to the worst FBS team, which I know isn't terribly accurate, but you have to make assumptions when you don't have data.
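To make the per-game value and the FCS assumption concrete, here is a small Python sketch.  The additive formula, the margin cap, the function names, and the factor numbers are hypothetical and chosen only for illustration; the one thing taken from the description above is that a team with no data (an FCS team) is treated like the worst FBS team.

```python
def opponent_factor(opponent, factors):
    """Look up an opponent's SOS factor; teams with no data (FCS) get the worst FBS factor."""
    return factors.get(opponent, min(factors.values()))


def game_value(opponent, points_for, points_against, factors, margin_cap=21):
    """Assumed scoring: opponent factor plus the capped margin scaled to [-1, 1]."""
    diff = max(-margin_cap, min(points_for - points_against, margin_cap))
    return opponent_factor(opponent, factors) + diff / margin_cap


# Made-up factors for a handful of FBS teams (not the real numbers).
factors = {"Florida St": 3.0, "Clemson": 2.5, "South Florida": 0.4, "Idaho": 0.1}

big_win = game_value("Clemson", 51, 14, factors)      # blowout of a strong team
bad_loss = game_value("McNeese St", 21, 53, factors)  # blowout loss to an FCS team
```

With a scheme like this, a lopsided win over a highly rated opponent lands at the top of the scale and a lopsided loss to an FCS team lands at the bottom, which matches the two extremes mentioned above.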

Then I averaged all of these values and got the rankings.  You'll see that my rankings are not too different from this week's BCS rankings.  The most noticeable difference is that FSU is #1, but the biggest gap is LSU, who is 32nd in my rankings.  S Carolina is only off by 3, UCLA is overrated by 7 spots, and ASU is underrated by 8 spots.  Overall, this is the difference by conference for top 25 rankings:


SEC 24
ACC -3
Pac 1
Big 10 -3
NA -2
Big 12 -5

You can see that the SEC, as I suspected, is overrated by an overall 24 spots, with LSU the biggest offender.  Most of the other conferences are pretty accurate, although the Big 12 is slightly underrated.  "NA" is everybody not in those 5 conferences.
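The averaging step and the by-conference comparison can be sketched in Python as follows.  This assumes the conference totals are the sum of (my rank minus poll rank) over the poll's top 25, so a positive total means the poll overrates that conference; that reading, the function names, and the toy data are assumptions for the example, not the actual query.

```python
from collections import defaultdict
from statistics import mean


def rank_by_average(game_values):
    """Map {team: [game values]} to {team: rank}, with 1 as the best average."""
    ordered = sorted(game_values, key=lambda t: mean(game_values[t]), reverse=True)
    return {team: i for i, team in enumerate(ordered, start=1)}


def conference_differences(my_rank, poll_rank, conference, top_n=25):
    """Sum (my rank - poll rank) per conference over the poll's top_n teams."""
    totals = defaultdict(int)
    for team, rank in poll_rank.items():
        if rank <= top_n:
            # Teams without a listed conference fall into the "NA" bucket.
            totals[conference.get(team, "NA")] += my_rank[team] - rank
    return dict(totals)


# Toy data with placeholder team names.
game_values = {"A": [2.0, 1.0], "B": [0.5], "C": [1.0, 1.0]}
poll_rank = {"A": 2, "B": 1, "C": 3}
conference = {"A": "ACC", "B": "SEC"}

my_rank = rank_by_average(game_values)
diffs = conference_differences(my_rank, poll_rank, conference)
```

Because a poll top-25 team can fall outside the top 25 in the computed rankings, the conference totals do not have to sum to zero.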

I was working on the data through week 11 for most of my analysis, but about 2 hours ago the website was updated.  Through week 11, BYU would have been ranked 17th, and Wisconsin 18th.

In full disclosure, I did add *a little* conference bias.  I gave a slight SOS edge to all teams from the major 5 conferences listed above, plus Notre Dame and BYU.  So this actually ends up being advantageous for the teams playing them, not for the boosted teams themselves.  I thought it helped reflect reality, because the worst team in the SEC is much better than the worst team in the smaller conferences.  But I didn't pick and choose which teams got the boost (aside from ND and BYU, since they are independent).  I just did it for all teams in those conferences, and it was all the same level of boost (1 factor point for the SOS - not that it means anything to you).
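For anyone curious what a flat boost like that looks like, here is a minimal Python sketch.  The 1-point boost and the list of boosted teams come from the paragraph above; the conference labels, function name, and sample factor values are assumptions for illustration.

```python
# Conference labels are assumed spellings; the boost size (1 factor point) is from the post.
BOOSTED_CONFERENCES = {"SEC", "ACC", "Pac-12", "Big Ten", "Big 12"}
BOOSTED_INDEPENDENTS = {"Notre Dame", "BYU"}
BOOST = 1.0


def sos_factor(team, conference, base_factor):
    """Return the SOS factor opponents are credited with for playing this team."""
    if conference in BOOSTED_CONFERENCES or team in BOOSTED_INDEPENDENTS:
        return base_factor + BOOST
    return base_factor


# Hypothetical base factors, just to show the effect.
boosted = sos_factor("Kentucky", "SEC", 0.5)
independent = sos_factor("BYU", "Independent", 2.0)
unboosted = sos_factor("Idaho", "Independent", 0.5)
```

The boost only changes the factor used when a team shows up as someone else's opponent, which is why it helps the teams on its schedule rather than the boosted team itself.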

One major thing I have not yet considered is home field advantage, but I'm going to call it quits for this week because I've been doing this for almost 5 hours.  Hendrik's at his Aunt Amber's, Mel and Adelaide are in Texas, and I don't have much to do.  Let me know your thoughts - I'm open to suggestions for improvement, and would even share my code.

5 comments:

kurt said...

Seriously though, this is actually really cool and I really think it is an improvement upon any rankings college football actually uses. As of now I can't think of any improvements, I do know that it is extremely difficult to truly justify just how good certain football teams are. This is precisely why only a playoff can legitimately discern who any year's best team is.

Tyler Hansen said...

I would be interested in seeing the code.

phil said...

can i see the code too? what other data points are there?

Brian said...

That is pretty cool. I've seen some other computer rankings and they're pretty similar; but to have one that my own flesh and blood has come up with is awesome. One thing that is different is that you said you implement margin of victory, which probably accounts for why the Fla. St.-Clemson game was the best victory. They were talking about the most impressive victory during the Stanford-Oregon game. I thought Stanford's victory was more impressive, because Clemson was (as usual) totally overrated. In fact, they still are. The only real reason they're up as high as they are is the fact they beat Georgia, which isn't even ranked.

I wonder how the rankings would look had you not included that bias for teams from the Big 5 conferences (and BYU and Notre Dame)?

Paula said...

So has this become a purely sports blog?