Friday, February 5, 2010

How do you seed the autobids?

Every year I get to the autobids and am mystified. Last year, of the 21 bottom autobids (those that would have been either among the last 4 in or left out entirely, had they not won their conference tourney), I missed by 1 line on 10 and by 2 lines on 4. If I had randomly assigned the teams, I could have done just as well. How do I rank these teams, who have woefully little to show on their resumes?

To answer this question, I went to my trusted ally: statistical methodology. I entered data for the past 2 seasons on the autobids that wouldn't have made the tourney on their own merit. (I did include 07-08 Davidson and 08-09 Utah State, Temple, and VCU, because I'm not convinced the committee would have taken them had they lost.) After running a linear regression with the seed as the dependent variable, here's the Model for Selection Committee Autobid Seeding:

SEED = 12.199 - 0.036 (Wins - Losses) - 0.076 (Conference Wins - Losses) - 0.707 (Elite Wins) + 0.029 (Non-Elite Top 50 Wins) - 0.063 (RPI 50-100 Wins) + 0.011 (Losses outside the Top 100) + 0.007 (RPI Rank) - 0.043 (Finish Within Conference) + 0.112 (Ranking of Conference)

I'd like to take a second to note here that, of all those variables, only the constant and the conference ranking are statistically significant at the 5% level. For every other variable, the data can't rule out (at that level) that it has no effect on seeding at all, or even the opposite effect. This tells me I don't have enough data yet; over the next couple of weeks I will work on adding a few more seasons.
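Keeping that caveat in mind, the model itself is just a weighted sum, so it's easy to evaluate for any team. Here's a minimal Python sketch; the coefficients are copied from the equation above, but the dictionary keys, the `predict_seed` name, and the example team are all hypothetical, made up for illustration.

```python
# Sketch of the autobid seeding model. Coefficients come from the regression;
# the key names are hypothetical labels for each input.
COEFFICIENTS = {
    "win_loss_margin": -0.036,        # Wins - Losses
    "conf_win_loss_margin": -0.076,   # Conference Wins - Losses
    "elite_wins": -0.707,             # Elite Wins
    "non_elite_top50_wins": 0.029,    # Non-Elite Top 50 Wins
    "rpi_50_100_wins": -0.063,        # RPI 50-100 Wins
    "losses_outside_top100": 0.011,   # Losses outside the Top 100
    "rpi_rank": 0.007,                # RPI Rank
    "conf_finish": -0.043,            # Finish Within Conference
    "conf_rank": 0.112,               # Ranking of Conference (Pomeroy)
}
INTERCEPT = 12.199


def predict_seed(team: dict) -> float:
    """Return the predicted seed line (lower is better) for one autobid team."""
    return INTERCEPT + sum(coef * team[name] for name, coef in COEFFICIENTS.items())


# Hypothetical team: 24-8 overall, 14-4 in conference, no elite wins, one
# non-elite top-50 win, two RPI 50-100 wins, three losses outside the top 100,
# RPI rank 80, first in its conference, conference ranked 11th.
example = {
    "win_loss_margin": 24 - 8,
    "conf_win_loss_margin": 14 - 4,
    "elite_wins": 0,
    "non_elite_top50_wins": 1,
    "rpi_50_100_wins": 2,
    "losses_outside_top100": 3,
    "rpi_rank": 80,
    "conf_finish": 1,
    "conf_rank": 11,
}
print(round(predict_seed(example), 1))  # 12.5
```

The output is a continuous number, so it's best read as "somewhere around the 12 or 13 line" rather than as an exact seed, especially given how few of the coefficients are significant.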

I also tried using RPI 100-200 Wins, Road/Neutral Wins - Losses, and Last 12 record, but they all came out backwards ("the better the team did in this category, the worse the seed it received"), which tells me those categories have little to no predictive value.

However, I did spend a lot of time on this, and I'd like to have something to show for it, so I'll go ahead and analyze it anyway. So what does this model tell us? Well... the obvious. A little guy knocking off an elite team (which autobid teams did only 6 times in the last two seasons) moves up about a seed line. A team with an RPI of 110 can be expected to be seeded about a line above a team with an RPI of 220 (note that the RPI coefficient is only so small because the variable takes on such large values). If you win the WAC (the 11th-ranked conference, by Pomeroy rankings), you can expect that to be worth about two seed lines over the SWAC champion.
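Those effects are just coefficient times change, holding every other input fixed. A small Python sketch of that arithmetic (the `seed_shift` helper and constant names are hypothetical; negative results mean a better, lower-numbered seed):

```python
# Per-unit coefficients taken from the autobid seeding model.
ELITE_WIN = -0.707
RPI_RANK = 0.007
CONF_RANK = 0.112


def seed_shift(coef: float, change: float) -> float:
    """Change in predicted seed when one input moves by `change`, all else fixed.

    Negative values mean a better (lower-numbered) seed.
    """
    return coef * change


elite = seed_shift(ELITE_WIN, 1)          # one extra elite win
rpi = seed_shift(RPI_RANK, 110 - 220)     # RPI rank 110 instead of 220
places_per_line = 1 / CONF_RANK           # conference-rank places worth one seed line
places_for_two_lines = 2 / CONF_RANK      # places worth two seed lines

print(f"{elite:.3f} {rpi:.2f} {places_per_line:.1f} {places_for_two_lines:.1f}")
# -0.707 -0.77 8.9 17.9
```

So the RPI example works out to about three-quarters of a line, and a two-line gap between conference champions corresponds to roughly 18 places in the conference rankings.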

Who you lose to means little: even if all of your losses were to teams outside the RPI top 100, you wouldn't expect it to move your seed line. Ten losses to top-100 teams are worse than five losses to teams outside the top 100, simply because those extra losses cost you the chance to pad your win count. This isn't evidence in support of scheduling crappy teams (I'm looking at you, Brey); the RPI hit from scheduling all bad teams negates the benefit of piling up wins.
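To see why, here's a rough Python sketch using only the two loss-related terms of the model (Wins - Losses and Losses outside the Top 100). The teams are hypothetical 30-game schedules, and everything else, including the RPI penalty mentioned above, is deliberately held fixed, so this shows the tradeoff within the model rather than the full picture.

```python
# Only the two loss-related terms of the model; every other input
# (RPI rank, conference record, quality wins, ...) is held fixed.
WIN_LOSS_MARGIN = -0.036     # per unit of (Wins - Losses)
LOSS_OUTSIDE_TOP100 = 0.011  # per loss to a team outside the RPI top 100


def loss_terms(games: int, top100_losses: int, bad_losses: int) -> float:
    """Seed contribution of a team's loss profile for a fixed number of games."""
    losses = top100_losses + bad_losses
    wins = games - losses
    return WIN_LOSS_MARGIN * (wins - losses) + LOSS_OUTSIDE_TOP100 * bad_losses


# Hypothetical 30-game teams, made up for illustration.
team_a = loss_terms(30, top100_losses=10, bad_losses=0)  # 20-10, no bad losses
team_b = loss_terms(30, top100_losses=0, bad_losses=5)   # 25-5, all bad losses
print(f"{team_a - team_b:.3f}")  # 0.305: team A projects about 0.3 of a line worse

# Same 20-10 record, but all ten losses outside the top 100 instead of inside it.
same_record = loss_terms(30, 0, 10) - loss_terms(30, 10, 0)
print(f"{same_record:.2f}")  # 0.11: barely a tenth of a seed line
```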

I'm not sure exactly why non-elite top 50 wins push the seed number up (that is, toward a worse seed). I suspect that when the committee sees teams it is seeding below the 7th line show up as top 50 wins, it discounts those wins (there's a big difference between Kansas and Siena). Instead of counting them as a "half-win" in the 1-50 column, though, it just disregards them altogether. This may be completely false, but it's an interesting theory; I know that when I see one of a team's top 50 wins is against Wichita St., I tend not to give them as much credit.

Summary: To get a better seed, you can A) beat elite teams, B) pile up wins, C) improve (lower) your RPI rank, or D) play in a better conference. Sigh. As I said earlier, the obvious. Side note: D might actually be a case of confusing correlation with causation; it's possible (and indeed probable) that the same factors that make a team more likely to get a good seed also raise its conference's ranking.
