"They suggest G42D8 increases the probability of winning by over 12%, while more realistic calculations show the strategy increases the chances of winning by less than 1% (at best!)"

You're conflating two very different things here. The 12% is based on a bunch of assumptions, and only considers the case where the team down 8 after the 6pt TD will score a second touchdown and the other team will not score. From what I can tell the "less than 1%" from Albert's work is an average of total win probability added across a large range of circumstances.

When she looks at a specific case that most closely matches the scenario the G42D8 advocates are referring to, her work shows a 15% relative increase in win probability. It looks like the dynamic programming approach strongly supports G42D8.

As I argue in a response to T Coddington, relative risk is a sketchy concept that needs to be used carefully. I double my odds of winning the lottery if I buy two tickets. But that's not an argument to buy two tickets.

It's hard to take you seriously when you say "G42D8 madness is flat-out wrong" and then essentially concede that it is correct before adding some general caveats.

This week's discussion came after the TB-Det game where the situation ended up matching with Albert's last slide (4 possessions remaining). If I can choose between a lottery ticket with an 11.2% chance and one with a 12.9% chance (a better lottery analogy) then choosing the second is absolutely the right choice.

If you want to try to pump the brakes on the vibe of certainty that surrounds football analytics, I think you really failed here. The analytics boosters tend to be a lot more rigorous than this argument and while there is absolutely a vibe of overconfidence that undersells the amount of uncertainty in most numbers, that is very much a product of fighting battles against those who are strongly anti-math (as Udit Ranasaria described). If you want to influence those actually favorable to analytics then IMO you need to pick a better target than G42D8.

I can't seem to get the embedded presentation on Laura's blog into a readable size, but in the text she mentioned going for 2 increased your odds from 11.2% to 12.9%... that would imply you have increased your chances of winning by (12.9-11.2)/11.2 = 15%, no? Still a far cry from 65%, and does not include all of the other factors you mention, but am I missing something?
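To make the two ways of reading those numbers concrete, here is a minimal sketch. The function names are mine, and the only inputs are the 11.2% and 12.9% figures quoted above.

```python
def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points (inputs are percentages)."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Gain as a fraction of the starting value."""
    return (after - before) / before


# Win probabilities quoted in the comment above, in percent.
wp_kick, wp_go_for_two = 11.2, 12.9

print(f"absolute gain: {absolute_gain(wp_kick, wp_go_for_two):.1f} percentage points")
print(f"relative gain: {relative_gain(wp_kick, wp_go_for_two):.1%}")
```

The same pair of numbers reads as a 1.7 point gain or as a roughly 15% gain, depending on which of the two you report.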

Speaking of the psychological factors, it would be interesting to model the tradeoffs:

- making the 2 pt conversion increases your momentum

- making a 1 pt conversion also increases momentum, but not to the same degree

- missing either is probably a bit demoralizing

Sorry for long note, but what also occurs to me is that all of these "the analytics say" during football use population statistics which may be very different than the specific teams in question. Suppose you have a great short yardage offense & your opponent has a weak short yardage defense. Obviously the probabilities change dramatically from the reverse situation.

All valid points. A few responses here:

1. Even if you just care about relative improvement, the "brain dead argument" that people have been pushing with the flow chart I sent argues that you get a (62.5-50)/50 = 25% improvement in chances. So the 15% is smaller than what they say.

2. Relative risk is a sketchy concept that needs to be used carefully. I double my odds of winning the lottery if I buy two tickets. But that's not an argument to buy two tickets.

3. I totally agree that every team has their own statistics. I think they use them! But their in-game sample size is then 32x smaller, so their uncertainty is roughly 6 times larger. This is why you can't rely on too much statistical thinking for football decisions. Good decision makers and coaches can judge these "subjective risks" without relying on statistical calculations.
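The "32x smaller sample, roughly 6 times the uncertainty" step in point 3 is the usual square-root scaling of a standard error. A rough sketch, assuming attempts are independent with a fixed success rate; the 50% rate and the 3,200 league-wide attempts are made-up placeholders, not real NFL counts:

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an estimated success rate from n independent attempts."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5                      # hypothetical conversion rate
n_league = 3200              # hypothetical league-wide number of attempts
n_team = n_league // 32      # one team's share if attempts are spread evenly

ratio = standard_error(p, n_team) / standard_error(p, n_league)
print(f"team uncertainty is {ratio:.2f}x the league-wide uncertainty")
```

The ratio is the square root of 32 whatever placeholder values are used, which is where "roughly 6 times" comes from.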

💯

On number 3, the sample size is way smaller than that, because I'm guessing it makes a big difference if you're playing the Rams and Aaron Donald is out injured, or playing but turned his ankle in the 1st quarter and is at less than 100%, etc. In these decisions (and most non-trivial decisions in life?), analytics are probably a great tool to quickly put you in the solution space of "not doing something really stupid", but will rarely point to an "easy & obvious" optimal strategy.

Yes, I don't want to write about analytics anymore, but my main addendum would be exactly what you say. My personal argument for this sort of analysis is to show that G42D8 is not a terrible idea. In fact, your odds of winning might be marginally better if you execute correctly. Explaining that through a model, even an imperfect model, is valuable for coaches. But deciding what to do in any given moment requires much more context than a single abstract calculation.

Most teams use a "RED", "YELLOW", "GREEN" bucketing system so coaches can make decisions after the raw numbers give them an idea.

The other part of the conversation that is lost is that "analytics bros" (yea, I am one) speak loudly/prominently on the subject because the league averages are pretty far off from the models (although they've been moving towards each other over the last 5 years). If that gap closes, then we will harp less on individual decisions and be much more accepting of coaches making circumstantial decisions as potentially optimal. But given the league-wide skew, it's hard to feel confident that the average SME's opinion has the numbers right.

I believe the increase is 62.5%-50%=12.5% in the commonly advocated case. As Y Olej points out above, her work supports the same conclusion (and fwiw, no one is arguing that the total magnitude of WP effect is large because it only matters in an unlikely situation).
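For anyone who hasn't seen where 62.5% and 50% come from, this is my reconstruction of the commonly advocated case, not anything taken from the flow chart itself. It assumes the trailing team gets the stop and scores the second touchdown, extra points are always good, a 2-point try succeeds half the time, and overtime is a coin flip.

```python
def wp_kick_both(p_ot: float = 0.5) -> float:
    """Kick the extra point after both touchdowns: tie game, decided in overtime."""
    return p_ot


def wp_go_for_two_first(p2: float = 0.5, p_ot: float = 0.5) -> float:
    """Go for 2 when down 8; if it fails, go for 2 again after the second touchdown."""
    made_first = p2                           # then a kicked extra point wins outright
    missed_then_tied = (1 - p2) * p2 * p_ot   # miss, convert the second try, win overtime
    return made_first + missed_then_tied


kick = wp_kick_both()
go = wp_go_for_two_first()
print(f"kick both:      {kick:.1%}")
print(f"go for 2 first: {go:.1%}")
print(f"absolute gain:  {go - kick:.1%}")
print(f"relative gain:  {(go - kick) / kick:.0%}")
```

Under those assumptions the gain is 12.5 points in absolute terms and 25% in relative terms, and both numbers are conditional on the stop and the second touchdown actually happening.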

As someone who has worked for an NFL team, they do work to use conversion rates for their own team and adjust the math. The problem is that the NFL think-tank moves glacially slowly, and people like to overemphasize that adjustment as if it will make a huge difference. Errors with regard to team quality should go both ways, so the league-wide mean for 4th down and 2pt decisions should be somewhat centered around the mean rate models would suggest, but it's a far cry from that. Of course networks are going to generalize and shallowly analyze each decision, because it's complex and only interesting in relation to what's on the screen.

Thank you for spelling out the assumptions behind these probabilities.

I don't know the scoring rules, but let me spell out the argument as I understand it. There's some probability p that the team making the choice will recover the ball and score another touchdown. If that doesn't happen, with probability 1-p, the team loses whichever choice is made. So, if a strategy increases your probability of winning conditional on scoring by q, the actual increase is only pq, which is << q if p << 1. In particular, if p < 0.08, then q = 0.125 implies pq < 0.01.
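A quick numerical version of that pq argument, as a sketch only. The values of p in the loop are illustrative and not estimates of anything; q = 0.125 is the conditional gain discussed in this thread.

```python
def unconditional_gain(p: float, q: float) -> float:
    """Overall gain in win probability: chance of the stop-and-score (p)
    times the gain conditional on it happening (q)."""
    return p * q


q = 0.125  # conditional gain discussed in the thread
for p in (0.05, 0.08, 0.10, 0.20):  # illustrative values only
    print(f"p = {p:.2f} -> overall gain = {unconditional_gain(p, q):.4f}")
```

At p = 0.08 the overall gain sits right at 0.01, which is the boundary in the inequality above.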

But I don't see that this proves much, in the absence of an argument that choosing the 2-point strategy reduces your chance of another TD. A game is made up of lots of factors few of which, taken alone, will change the odds by more than 0.01. And bad choices might psych the opposition, in which case they help, but it must be at least as likely that they will increase confidence.

As Damon Runyon put it, "The race may not be to the swift, nor the battle to the strong, but that's the way to bet."

I agree with you that G42D8 will definitely not increase your win probability by 12%, as this 12% is only conditional on a lot of other things going your way (i.e. getting a stop and scoring a TD on the next drive).

However, I disagree with the following:

"Under completely unrealistic assumptions, G42D8 improves the chances of winning by less than 1%. If there is any uncertainty in these calculations at all, then you might even be putting yourself at a disadvantage!"

This quote conflates the **magnitude** of the difference in win probabilities between two strategies with our **confidence** about which is the right strategy. We can be very confident in our decision, even if the magnitude is small.

As an example, suppose you are down 21 midway through the fourth quarter. You face a choice between (i) try to score, and (ii) kneel out the game. Under both strategies, your chance of winning is very small -- let's say under 1%. Then the gain in win probability from trying to score is less than 1%, but it is nevertheless very clear that the best strategy is to try to score, rather than take a knee.

I think a similar thing is happening here (though it is a bit more subtle): G42D8 is the better strategy for almost all reasonable assumptions about success probabilities, but in an absolute sense, it doesn't increase your win probability all that much because you are down 8 and need a stop and a score to have any chance.
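One way to check the "almost all reasonable assumptions" claim is to find the break-even 2-point conversion rate. This is a sketch under simplifying assumptions of my own: extra points are always good, overtime is a coin flip, and everything is conditional on getting the stop and the second touchdown.

```python
import math


def g42d8_conditional_wp(p2: float, p_ot: float = 0.5) -> float:
    """Go for 2 when down 8; if it fails, go for 2 again after the second touchdown."""
    return p2 + (1 - p2) * p2 * p_ot


def kick_conditional_wp(p_ot: float = 0.5) -> float:
    """Kick both extra points and take the game to overtime."""
    return p_ot


# With p_ot = 0.5, the two strategies tie when p2 + 0.5*p2*(1 - p2) = 0.5,
# i.e. p2**2 - 3*p2 + 1 = 0, whose root in [0, 1] is (3 - sqrt(5)) / 2.
break_even = (3 - math.sqrt(5)) / 2
print(f"break-even 2-point rate: {break_even:.3f}")

for p2 in (0.35, 0.40, 0.45, 0.50):
    diff = g42d8_conditional_wp(p2) - kick_conditional_wp()
    print(f"p2 = {p2:.2f}: G42D8 minus kick = {diff:+.4f}")
```

Under these assumptions the sign of the difference favors G42D8 for any conversion rate above about 38%, even though the size of the difference, once multiplied by the chance of the stop and score, stays small.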