Hello, sports analytics apologist here. I saw your article referenced in a Defector article so thought I'd chime in.
First, as others have mentioned, I think this kind of observational analysis is super important to pressure-test sports analytics maxims like G42D8. Oftentimes, these analytics ideas do not really incorporate any intangible and/or psychological factors that may be present in sports (momentum, clutch, chemistry, etc.). I do think you are on to something but did have a few comments:
(Semantics point): G42D8 is not really based on modeling and empirical evidence, but rather math with strict assumptions. If you assume your OT win rate is 50% AND that you have enough time for exactly one more offensive possession in regulation, you really only need your 2P conversion success rate to be ~35% in order for it to be worth it on paper. I read your previous article, so yes I know that the chances of winning the game are extremely low no matter what. G42D8 is not a silver bullet, but rather a last-ditch effort to improve win probability as much as possible. The reason I point this out is that the assumptions do some heavy lifting in the "G42" argument.
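For anyone who wants to check the "on paper" arithmetic, here is a minimal sketch of the break-even calculation. It conditions on the trailing team scoring exactly one more touchdown in regulation with no other scoring, and it assumes a 50% OT win rate. The function names and the 94% extra-point rate are my own illustrative assumptions, not figures from the article.

```python
def win_prob_kick(p2, xp=0.94, ot=0.5):
    """Win probability when kicking after the first TD (second TD assumed)."""
    # Made first XP (down 7): a second made XP ties and forces OT;
    # a missed second XP loses by 1.
    # Missed first XP (down 8): a 2-point conversion is needed just to reach OT.
    return xp * xp * ot + (1 - xp) * p2 * ot


def win_prob_go(p2, xp=0.94, ot=0.5):
    """Win probability when going for 2 after the first TD (second TD assumed)."""
    # Converted (down 6): a made XP wins outright, a missed XP ties and forces OT.
    # Failed (down 8): a second 2-point try must succeed to reach OT.
    return p2 * (xp + (1 - xp) * ot) + (1 - p2) * p2 * ot


def break_even(xp=0.94, ot=0.5, tol=1e-9):
    """Bisect for the 2-point conversion rate where both choices are equal."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if win_prob_go(mid, xp, ot) < win_prob_kick(mid, xp, ot):
            lo = mid  # going for 2 is still worse, so the threshold is higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"automatic XPs:  {break_even(xp=1.0):.3f}")
    print(f"94% XP rate:    {break_even(xp=0.94):.3f}")
```

Under these assumptions the threshold comes out near 38% if extra points are treated as automatic, and near 35% with a 94% extra-point rate. The answer moves with the assumed XP and OT rates, which is the sense in which the assumptions do the heavy lifting.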
In how many of the 17 wins in your "Kick" dataset did the comeback team have multiple regulation possessions after the first TD? G42D8 assumes that the comeback team only has time for one more possession in regulation. If the comeback team scores the TD+XP, gets the ball back and scores another TD+XP, and gets the ball back again and kicks a FG, then I would argue that there was too much time on the clock for this truly to be a G42D8 scenario. The only outcome in which "Kick" would be better is if they make both XPs to send the game to OT, and then they win in OT (while in the counterfactual, they miss both 2P conversions so the game does not reach OT).
Perhaps the argument is that analytics people are over-applying G42D8 in cases where the comeback team has more bites at the apple. I have definitely seen Seth use it where there was clearly too much time in the 4th. I agree that there should be more nuance about when in the fourth quarter it actually applies. Or maybe it's that you cannot reliably predict how many more possessions you have left in a game, which makes G42D8 moot.
Another key assumption is that the probability of getting the ball back without surrendering points is independent of the decision to go for 2 or kick. In reality, missing a 2-point try might affect defensive morale or aggressiveness, just as converting it could give a psychological boost. That’s an interesting empirical angle to explore. Similarly, it would be interesting to analyze whether the comeback team is truly 50% in OT. Could be they are more likely to win because of something like momentum. Could be they are less likely to win because they are probably the inferior team.
Sample sizes are small, but they are always small in football so I'm not going to dismiss your analysis for that reason. It does mean though, as you mention, that we kind of have to look at this anecdotally, and understand the contexts of each game, which is why I asked about the context of the 17 wins earlier. At the end of the day, I just hope more teams continue to go for 2 so we can get more data!
I mean, I like it when teams go for 2 because it's way more fun than extra point attempts! The best part about *football* analytics is that it always argues for less conservatism and more chaos. That's definitely not the case across sports.
But that's also probably why the football analytics models are fundamentally broken. It's similar to how, if you just maximize pure expectation in gambling, you'll go bust. A more conservative "Kelly-style" betting strategy is better in the long run.
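To make the gambling analogy concrete, here is a small sketch comparing an all-in bettor with a Kelly bettor on a repeated favorable bet. The 60% win probability and even-money odds are hypothetical numbers chosen only for illustration, and the function names are my own. Whether the analogy carries over to football decisions is a separate question.

```python
import math


def kelly_fraction(p, b=1.0):
    """Kelly stake (fraction of bankroll) for win probability p at net odds b-to-1."""
    return max(0.0, (b * p - (1 - p)) / b)


def log_growth(f, p, b=1.0):
    """Expected log growth of the bankroll per bet when staking fraction f."""
    if f >= 1.0:
        return float("-inf")  # a single loss wipes out the whole bankroll
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def survival_prob_all_in(p, n):
    """Chance an all-in bettor still has any money after n bets."""
    return p ** n


if __name__ == "__main__":
    p = 0.6  # hypothetical edge
    f = kelly_fraction(p)
    print(f"Kelly fraction: {f:.2f}")
    print(f"Kelly log growth per bet: {log_growth(f, p):.4f}")
    print(f"All-in survival after 20 bets: {survival_prob_all_in(p, 20):.6f}")
```

Betting everything maximizes the expected bankroll on each individual bet, but the chance of surviving a long run shrinks toward zero, while the Kelly stake keeps expected log growth positive.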
I would not be surprised if missing the two-point conversion dooms your chances of getting the ball back. This is certainly consistent with the small data we have.
And to your question, all of the 17 wins after going for 1 down 8 are a little different. Here are the games:
https://stathead.com/sharing/DQ3Un
Take the Broncos game, where they (a) stopped the Eagles, (b) scored and went for 2 successfully, (c) stopped the Eagles again (!), and (d) kicked a field goal. In this case, they stopped the opposing team from scoring twice in regulation. The 2025 Broncos are a weird team.
But then there's this Browns-Ravens game from 2023, where the Browns go for 1 down 8, and then get a pick 6 on the next drive and miss the extra point, but manage to get the ball back and get a field goal.
These are both just to say, the base model for the whole G42D8 thing is so niche that it can't control decision making. So many things can happen in football that you can't latch your decisions to one hypothetical that has never actually occurred.
Regardless, thank you for the very thoughtful comment. I think we agree more than we disagree.
Is it possible to look at cases only where the trailing team scored the two regulation TDs required to fully test the theory? I.e., strip out times where they went for two/kicked the XP but then never scored again anyway.
Could you tell me what you think this analysis would show? The decision of kicking the XP or going for 2 happens before you know how the defense will respond on the next drive.
If you don't stop them from scoring and get the ball back, then it does not matter which one you picked.
I guess my thought is that if you fail to score the second TD, there's no way of knowing whether the first XP/2PT decision was "correct" or not.
That's exactly right. G42D8 assumes:
1) You stop the opposition and get the ball back
2) You score the potential GW/GT touchdown at or near the end of the game.
Unless both of those conditions are met, there is no difference in outcome between G42D8 and just kicking the XP.
Ben,
Great post. I would lean more to this being a case of insufficient sample size or possibly selection bias (presumably teams that are down by 14 are more likely to have weaker offenses and thus poorer 2P conversion rates?). I think the broader point of confirming analytics with actual data is absolutely valid, though.
I'd be interested in the "down 5" and possibly "down 4" data - the original 538 post also called those out as places where teams should be more aggressive with the 2P vs. 1P decision.
FWIW: there is only one game in the past ten years where a team was down 5 in the 4th quarter and kicked an extra point. Sometimes it's obvious that you should go for 2!
I'd also be curious about those down 4 or 5 results. I'll need to run those.
Maybe I should attach this to the main post, but working with stathead is super easy and very inexpensive.
If you go here: https://stathead.com/football/play_finder.cgi
The relevant filters are
Game Filters:
Game result: Won/Lost/Any
Play Filters:
Quarter: 4th
Play Type: Extra Point/Field Goal
Scoring Margin: Between -8 and -8
Have there been cases where the team trailing by 14 scored twice, and then failed on *both* 2-point attempts, costing them an opportunity to go to OT? Then I can buy there's a downside to go for 2 down 8.
I don't see why this would justify the strategy. Yes, there are no examples of teams losing by 1 after failing both two-point attempts. But in each of the 18 times that teams missed the first attempt, all of them lost. Obviously, the sample size is too small to generalize from here, but the strategy hinges on it being possible to get to overtime if you miss the first 2PA, and we have zero evidence that this ever happens.
"But in each of the 18 times that teams missed the first attempt, all of them lost. "
As you point out, A) That's a tiny sample size, and B) They lost because they failed to score a second touchdown. Kicking the extra point the first time does not change that. They would have still lost.
And sure, maybe the instances where the team went for two, got it, and won, had a little luck on their side, but comebacks from multi-score deficits often do involve luck! And in the case where the opposing team missed the field goal: if they had made the field goal, you lose no matter what (whether you went for two to get a 1-point lead or tied the game by kicking 2 XPs). It's hard to argue that 2 XPs is the optimal strategy in that case, since you either lose or go to overtime (where you still might lose).
Being down by 14 points in the 4th quarter is a bad position to be in, and scoring two unanswered touchdowns is unlikely most of the time. I can't really get too worked up about the 1 vs 2 point decision, since you're in a "we're desperate and have nothing to lose" situation.
The thing that really annoys me is not G42D8 in the 4th, it's when coaches G42D8 in the 3rd or even the 2nd quarter. It's far too early to think about tying up the score but most coaches can't resist going for 2 as if they are assuming there will be no more scores for the rest of the game.
The coaches who chose G42D8 are not a fair random sample; it is not the case that 10% of the time the team behind gets sorted into the experiment arm and goes for it. There are (presumably) a couple guys who in this situation have decided to do it and generally do, given the opportunity. If those guys have shitty teams, you'd expect them to fail more.
My man, it has worked exactly twice, and I have described the cases in detail. It is clearly not a random sample, and the G42D8 strategy working is an aberration, not a statistical regularity.