Thursday, April 09, 2015

Looking at numbers

I have a job with the Atlanta-based AUDL team, the Atlanta Hustle.  I think my title is technically "advance scout," which basically means I'm going to look at film and try to give assessment docs to the coach so he can map out practices and strategy.

Anyway, the GM is a numbers guy, so I think I'll need some numbers/video to back up my assessments.  Good thing the latter is my strong suit.  The former I need some work on, so that is what I have been doing recently.

I have a poor relationship with ultimate statistics.  I loved the work that Ultiworld did a long time ago through tracking every flipping pass in NexGen games.  But that app is dead, and I don't really know what to do with a chart that shows me a team's scoring probability based on field position.  I guess it could expose that a team is particularly weak in the coffin corner, but in general I don't know what else I can get from that information.

So this high school season, since I am a sub-caller, I decided to play around with statistics for the team.  We haven't been having our most successful year, so I started tracking the number of unforced errors and the number of possessions.  It hasn't really righted the ship, but it is interesting what the numbers show.

First, a bit on "unforced errors."  In order to get around the tricky subjectivity problem, I decided that if the defense doesn't touch the disc, it is an unforced error.  There are some strange things that fall into that category that make the numbers hard to really analyze.  For example, a punt in the wind is considered an unforced error, as is a high-stall punt that no one touches.  Also, a jump-ball that the defense doesn't actually touch (even though they clearly influenced the play) counts as an unforced error.  It would make sense to fix some of these issues, but then we get into the subjectivity of "was that a punt or a huck too far?"  Or "how much did the defender actually influence that drop?"  These are things that I want to avoid, so I'm keeping it pure.

Basically, unforced errors are possessions that end with you giving away the disc, which means you aren't making the defense take it from you, which is a bad thing . . . right?  So I took the number of unforced errors, divided it by the number of possessions, and we now have UE%, which tells us the percentage of possessions that end in an unforced error.
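
As a rough sketch, the UE% calculation described above could be written like this in Python. The function name and the sample counts are made up for illustration and are not taken from any real game.

```python
def unforced_error_pct(unforced_errors, possessions):
    """Percentage of possessions that end in an unforced error.

    An unforced error here follows the rule above: the possession ended
    in a turnover and the defense never touched the disc.
    """
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * unforced_errors / possessions


# Hypothetical game: 12 unforced errors across 30 possessions.
print(unforced_error_pct(12, 30))  # 40.0
```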

For Paideia that number was frighteningly high (~50% or more at times), which pointed out how we were really beating ourselves.  If we could improve on that number, then we would at least be asking more of our opponent.  I don't want to get into Paideia's season, but tracking this has been having a good impact.  One other thing I was able to track was what I am calling our "Conversion Rate," which is just the number of possessions divided by the number of scores.  An average of this over multiple games tells us how many times we need the disc (on average) in order to score.
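
A minimal sketch of Conversion Rate and its multi-game average, assuming each game is recorded as a (possessions, scores) pair. The function names and the three sample games are hypothetical.

```python
from statistics import mean


def conversion_rate(possessions, scores):
    """Possessions needed per score in one game (lower is better)."""
    if scores <= 0:
        raise ValueError("scores must be positive")
    return possessions / scores


def average_conversion_rate(games):
    """Average of the per-game conversion rates.

    games: iterable of (possessions, scores) tuples, one per game.
    """
    return mean(conversion_rate(p, s) for p, s in games)


# Hypothetical three-game sample: (possessions, scores)
sample_games = [(30, 15), (28, 10), (33, 11)]
print(round(average_conversion_rate(sample_games), 2))  # 2.6
```

Averaging the per-game rates weights every game equally. Dividing total possessions by total scores would weight longer games more heavily, which is a different number.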

Here is where I feel like I got into something that was useful: tracking possessions.  In the past many teams have been concerned with offensive holds and defensive breaks.  But a defensive break that requires 4 possessions to score isn't the same as a break that only takes one.  I think moving away from line-based statistics and toward possession-based statistics will offer some new insights into analyzing the sport.

After playing with this for a little bit I wondered what was a reasonable UE%?  Did it change per level of play?  So I set off looking at college games from this season.  I have made it through just over 20 games and here is what I have found.

The average UE% of the games that Ultiworld has filmed is around 30.79%.  The funny thing is that the average for a winning team isn't any lower than that of a losing team (30.75% vs 30.83%).  What is even more interesting is winning the UE% battle isn't a good indicator of success.  Plenty of teams have won their games despite having a worse UE% than their opponent.  I guess this would speak to the idea of there being "good" turnovers.  This metric still gives us a glimpse of how many times on average a good/elite college team will just give you the disc back.  Looking at the similar numbers for the college women's game and the club games might provide more support for what we assume (better levels of play make fewer "mistakes").

The real insight came from conversion rate.  First of all, looking at the conversion rate of a single game is boring.  Because possessions for each team are never more than one away from each other, if you win the game you also won the conversion rate.  This is one of the things I hate about certain statistics like "breaks" and "turns."  Guess what, if you get more breaks than your opponent you won the game.  If you commit fewer turns, you won the game.
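
The single-game claim can be checked by brute force. The sketch below walks through small hypothetical box scores where the possession counts differ by at most one and confirms the winner never has the higher possessions-per-score number. The function name and the score and possession limits are arbitrary choices for illustration. The only tie is the edge case where the loser scores on every possession it has.

```python
def winner_never_has_worse_cr(max_score=15, max_possessions=40):
    """Brute-force check over small hypothetical box scores.

    Returns True if, whenever possession counts differ by at most one,
    the team with more scores never has a higher possessions/scores ratio.
    """
    for loser_scores in range(1, max_score):
        for winner_scores in range(loser_scores + 1, max_score + 1):
            for loser_poss in range(loser_scores, max_possessions + 1):
                for diff in (-1, 0, 1):
                    winner_poss = loser_poss + diff
                    if winner_poss < winner_scores:
                        continue  # cannot score more often than you possess
                    # Compare winner_poss/winner_scores with
                    # loser_poss/loser_scores by cross-multiplying,
                    # which keeps everything in exact integers.
                    if winner_poss * loser_scores > loser_poss * winner_scores:
                        return False
    return True


print(winner_never_has_worse_cr())  # True
```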

But this metric did offer some insight over a number of games.  For example, there seems to be a clear line between the best teams and the next step down.  Elite teams (Pitt, Oregon, UNCW) have a conversion rate that is typically sub 2.00.  Other teams tend to operate above that mark, with some of the worst being as high as 3+ (which is where my high school team operates at times).  It is no surprise that the average for winning teams would be lower than that for losing teams.  Conversion Rate basically tells you the number of possessions you need to score (on average) so winning means a lower number. The winning teams have an average of 2.06 while the losing teams have an average of 2.47.  There are some teams that lose with a CR below 2.  Those are typically good, or at least efficient, games.

What I'm curious about now is how good a predictor the average CR for a pair of teams is for the game's outcome.  In general, does the team with the lower CR win future games?  How should the standard deviation of CR play into that calculation?  Washington has a poor CR (2.41) but was able to post one sub-2.0 number.  Could they get hot and beat an elite team by putting up their best efficiency number (1.83)?  Pitt (1.76!) has a fairly stable CR, so the likelihood of getting a "bad" game out of them seems low.
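
One possible way to fold standard deviation into that question, sketched in Python. This is an assumption rather than an established method: it treats each team's per-game CR as an independent, normally distributed number, which a handful of games cannot really confirm. The team labels and per-game values below are invented for illustration and are not real results.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def chance_of_lower_cr(team_a_crs, team_b_crs):
    """Estimated chance that team A posts a lower CR than team B.

    ASSUMPTION: per-game CRs are independent and roughly normal, so the
    difference A - B is normal with the means subtracted and the
    variances added. Each list needs at least two games.
    """
    mu_a, mu_b = mean(team_a_crs), mean(team_b_crs)
    sd_a, sd_b = stdev(team_a_crs), stdev(team_b_crs)
    difference = NormalDist(mu_a - mu_b, sqrt(sd_a ** 2 + sd_b ** 2))
    return difference.cdf(0.0)  # P(A - B < 0)


# Hypothetical per-game CRs: one streaky team, one steady team.
streaky = [2.9, 2.6, 1.9, 2.4, 2.3]
steady = [1.7, 1.8, 1.75, 1.85, 1.7]
print(round(chance_of_lower_cr(streaky, steady), 3))
```

A streaky team with a worse average still gets a nonzero chance here because its spread is wide. A lower CR in a game only stands in for winning it, so the output is best read as a rough upset indicator.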

I feel like there is some room for innovation there.  Given enough data we could look at the effect a "good defensive team" has on an opponent's CR as compared to that opponent's average CR.  Anyway, I have to go.  Hopefully I didn't ramble too much.
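
A minimal sketch of that defensive comparison, assuming each game is recorded as the opponent's CR in that game paired with the same opponent's average CR. The function name and the numbers are hypothetical.

```python
from statistics import mean


def defensive_impact(games):
    """Average amount by which opponents' CR rose against this defense.

    games: iterable of (opponent_cr_this_game, opponent_average_cr).
    A positive result means opponents needed more possessions per score
    than they usually do.
    """
    return mean(game_cr - average_cr for game_cr, average_cr in games)


# Hypothetical defense: (opponent's CR in the game, opponent's average CR)
sample = [(2.5, 2.0), (2.2, 2.1), (3.0, 2.4)]
print(round(defensive_impact(sample), 2))  # 0.4
```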


Matt said...

The CR statistic seems interesting but your sampling method may be biased towards better teams. You'll probably need to look at a format where every game is filmed to correct the results.

Since you're only looking at filmed games you only have two types of games - "interesting" pool play games and the end of bracket play.

The problem with the "interesting" qualifier is it hurts the lower tiered teams. So, for example, Texas will only be filmed (during pool play) if they play against a top team like Pitt. However Pitt will be filmed if they play against a top 20 team or another top 10 team.

Bracket play is even worse. If you win you are more likely to be included in the sample. So at the Stanford Invite, Washington didn't get their final game filmed but Pitt did. This means if your team has a bad game it doubly affects your CR as you don't get the second data point.

Martin said...

Good point, but I can only work with what I've got. There is definitely a bias, not only because of the choice of games filmed but also because of the events where filming occurs. I will get almost no data from conferences, and likely little from regionals. The same goes for smaller tournaments. But this is a beginning and we can see where it goes from here. I will reserve making any large declarative statements until the sample size is large enough.

Flo said...

If you are looking at a team's CR over several games, you should also look at opponents' CR over several games as a statistic. You collected the data, you might as well look at it this way.

A low CR means some combination of two things: A good offense and a defense that creates turnovers in advantageous positions.

A high OCR (opponents' CR) means good defense and an offense that mostly turns it over close to the red zone (and maybe good pulls).

The Carr said...

My favorite stat to record is touches- gives you great individual stats like completion % and usage %. However, my team doesn't have a designated stat taker, so touches are too difficult for me to record while coaching.

Possessions, however, are pretty easy to keep track of while still coaching and calling lines. I love calculating conversion %, and it's a very telling stat for me to use to look at how my O and D lines are performing. I know that even if my O line didn't get broken in a game, if we only converted on 40% of our possessions, we were in trouble if we played a team with a more efficient offense. It helped me keep my players grounded and focused on our performance regardless of the game score.

In a situation where you know the conversion rates of both O and D lines of both teams about to play, those stats would help you predict the type of game you were about to see. Say a good team has an O-line with a relatively high conversion rate and they're playing a lower seeded team that has a D-line with a low conversion rate- that would set off the upset alert: A team with a high risk offense facing off against a D that might not be great at getting turns, but is good at converting turns into breaks.
