Saturday, April 30, 2016

Statistics Update, and a note on subbing patterns

I really should be working on the cutting tree, but with my coaching future in question and the high school season in full swing I have tabled that to work on some other things.

I have continued collecting data on what I think are some novel statistics.  After a year of tracking these things, I have a somewhat reasonable set of data and I feel like I can start drawing conclusions.

The first one I've talked about before: it is the percentage of possessions that result in an unforced error (%UE).  Conceptually this is "bad," but not necessarily directly correlated with score.  In a windy game the %UE might go very high, even if the teams are really good.  Unlike "breaks" or "offensive efficiency," you can lose a game where you have a better %UE.  Categorizing %UE is a little subjective, but I have been using the rule that if the defense gets a hand on it, then it was "forced." Everything else falls in "unforced" and is counted.  Paideia's average %UE for the season so far is ~47%, which is pretty good.  Basically we give up the frisbee just under half of the time we have it . . . sounds bad, but when compared to individual games where the number is 60+% this is fine.
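To make the counting rule concrete, here is a minimal Python sketch of the %UE calculation. The `Possession` record and its field names are made up for illustration (they are not the layout of my score sheets), and the sample game is invented. It also assumes every possession ends in either a goal or a turnover.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    """One possession. Field names are hypothetical, chosen for this sketch."""
    scored: bool
    defense_touched: bool = False  # only meaningful when scored is False


def unforced_error_pct(possessions):
    """Percentage of possessions that end in a turnover the defense never touched."""
    if not possessions:
        raise ValueError("need at least one possession")
    unforced = sum(
        1 for p in possessions if not p.scored and not p.defense_touched
    )
    return 100.0 * unforced / len(possessions)


# Invented game: 10 possessions = 4 goals, 1 forced turnover, 5 unforced turnovers.
game = (
    [Possession(scored=True)] * 4
    + [Possession(scored=False, defense_touched=True)] * 1
    + [Possession(scored=False)] * 5
)

print(unforced_error_pct(game))  # 5 unforced out of 10 possessions -> 50.0
```

Note that the forced turnover still counts as a possession in the denominator; it just is not counted as an error.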

Now that I have enough data to try it, I have been wondering what %UE does correlate with.  Anecdotally we would assume that it correlates with wins, because the lower your %UE the less you are "screwing up."  But I decided to take it one step further and run a correlation with the final point differential of the game.  As I get more games I have been updating the data, and it currently has an R^2 value of 0.385.  Honestly I don't know if that is "good," and everything I read tells me that just looking at the R^2 value isn't enough to deem it good or bad anyway.  But the important thing I note is how that number goes up.  In general, aside from the data collected at a windy tournament, the correlation coefficient is getting larger as I enter more games and we play in more tournaments.  I'll keep taking the data and seeing whether %UE is actually a reasonable indicator of team success (without being trivial because it IS team success . . . looking at you "breaks").
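For reference, this is roughly what that correlation looks like in plain Python. It is a sketch, not my actual spreadsheet: the function name is my own and the five games below are invented numbers. For a straight-line fit with one predictor, R^2 is just the square of the Pearson correlation coefficient r, so the sketch computes r and squares it. If lower %UE goes with better results, r itself should come out negative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented data: one entry per game.
ue_pct = [35.0, 42.0, 47.0, 55.0, 63.0]   # %UE in each game
point_diff = [6, 4, 1, -2, -7]            # final point differential in each game

r = pearson_r(ue_pct, point_diff)
r_squared = r * r
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Rerunning this each time new games are entered is one way to watch how the number moves as the dataset grows.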

I also have continued to play around with the number of possessions per goal (PPG).  There isn't as much useful data there.  I have lots of good numbers for my team, including a drubbing by Amherst where we played well (-9, but a good -9) and had a 6.2, and a horrible game against Grady (-2, but we played like shit) where we also had a 6.2.  PPG isn't useful in a single game because it is just the score.  If you win PPG you won that game.  But I am curious whether, in general, teams with lower average PPGs beat opponents with larger average PPGs.  In order to figure that out we would need more data from multiple teams, not just the data from one.
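If multi-team data ever shows up, the check could be as simple as the sketch below. Everything here is hypothetical: the team labels, the averages, and the results are invented, and `lower_ppg_win_rate` is just a name I picked. Ideally each team's average would be computed without the game being tested, which this sketch does not attempt.

```python
def possessions_per_goal(possessions, goals):
    """PPG for one team in one game; infinite if the team never scored."""
    if goals == 0:
        return float("inf")
    return possessions / goals


def lower_ppg_win_rate(avg_ppg, results):
    """Fraction of games won by the team with the lower season-average PPG.

    avg_ppg: dict mapping team name -> season-average PPG
    results: list of (winner, loser) team-name pairs
    Games between teams with identical averages are skipped.
    """
    decided = 0
    lower_won = 0
    for winner, loser in results:
        if avg_ppg[winner] == avg_ppg[loser]:
            continue
        decided += 1
        if avg_ppg[winner] < avg_ppg[loser]:
            lower_won += 1
    if decided == 0:
        raise ValueError("no games between teams with different average PPG")
    return lower_won / decided


# Invented season averages and results for three made-up teams.
avg_ppg = {"A": 3.0, "B": 4.5, "C": 6.0}
results = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "C")]

print(lower_ppg_win_rate(avg_ppg, results))  # 3 of the 4 games -> 0.75
```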

Lastly, I have started using the score sheets from the past two years' worth of Paideia games to build a win probability matrix.  Basically my win probability (WP) is the likelihood of the team with X winning when the score is X-Y.  To expand this beyond one team I had the volunteers for Paideia Cup track the order of scores so they could be entered.  So far the only parameter of win probability being tracked is the current score state, but eventually, with enough data, it might be expanded to more parameters.  The goal is to be able to figure out which points are, on average, important.  Anecdotally, after a few games were entered, the likelihood of winning games when you were up 9-7 was 0% (again, only a few games) and if you were ahead 10-6 it was 100%.  That means at 9-6 this point "really matters," and maybe it is worth bringing in your best players.  Obviously those numbers will change when more games are entered, which is what I am doing for much of May.
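Here is a rough Python sketch of how a matrix like that can be built from the order of scores. It assumes each game is entered as a string of "A" and "B" labels, one per point, which is a format I made up for this example; the function names and the three tiny games (played to 3) are invented too. Each score state is counted from both teams' points of view, so WP at X-Y and WP at Y-X always add up to 1. The final score of each game is left out because the game is already decided there.

```python
from collections import defaultdict


def win_probability(games):
    """Map each (my_score, their_score) state to the observed win fraction."""
    wins = defaultdict(int)
    visits = defaultdict(int)
    for order in games:
        if set(order) - {"A", "B"}:
            raise ValueError("scoring order may only contain 'A' and 'B'")
        if order.count("A") == order.count("B"):
            raise ValueError("game has no winner")
        a_won = order.count("A") > order.count("B")
        a = b = 0
        for scorer in order[:-1]:  # skip the final state; nothing left to decide
            if scorer == "A":
                a += 1
            else:
                b += 1
            # Record the state once from each team's perspective.
            visits[(a, b)] += 1
            wins[(a, b)] += int(a_won)
            visits[(b, a)] += 1
            wins[(b, a)] += int(not a_won)
    return {state: wins[state] / visits[state] for state in visits}


def point_leverage(wp, x, y):
    """How much the next point swings WP at x-y: WP(x+1, y) - WP(x, y+1).

    Returns None when either following state has not been observed yet.
    """
    up, down = (x + 1, y), (x, y + 1)
    if up not in wp or down not in wp:
        return None
    return wp[up] - wp[down]


# Invented games to 3: A wins 3-1, B wins 3-1, A wins 3-2.
games = ["AABA", "ABBB", "ABABA"]
wp = win_probability(games)

for state in sorted(wp):
    print(state, round(wp[state], 2))
print("leverage at 1-0:", point_leverage(wp, 1, 0))
```

With the 9-7 and 10-6 numbers from above, the leverage at 9-6 would be 100% minus 0%, which is the "this point really matters" idea expressed as a single number.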

Impact on subbing patterns

The end goal of WP is to help inform my subbing patterns.  I did something similar last year and it made me realize that leaving our "best" players on the field for three points was a waste.  I've used that to adopt a subbing style that keeps people rested, gives new players more responsibility and hopefully bends development curves upwards, with a goal of stabilizing the four-year sinusoidal graph that is "team quality."  But using that pattern had gotten us four straight losses to in-town rival Grady High School.  We knew we would play them in the State Championship, so we practiced and implemented a different subbing pattern that looked more like a club/college offense/defense system.  We won all of our games by 9+ en route to a state championship.

I don't want to use that system all of the time, because it means some people are never on the field for an offensive point, and I think that hurts their development.  But having that information is interesting when coupled with WP numbers.  Maybe there is a point in a game when the subbing should shift from a more team-oriented pattern to a more success-driven pattern?  Figuring out when that switch should happen feels just as valuable as figuring out that our "best" players have a terrible conversion rate when they have been on the field for 3 points.

2 comments:

Mallory Stoker said...

I love your use of data to figure out Ultimate strategies.

I work in Operations Research and would love to use your data and help out with the analysis. Do you have any interest in sharing?

Martin said...

Um . . . sure? I don't know if I really have enough data to pull any statistical significance. I mostly tool around with small datasets trying to get a feel for what might be worth putting more time into and actually collecting enough data.

But back to the point. If you would like the data just give me a way to get you the .csv files and I'll send it your way.