Learning from losers
Games are always a microcosm, and that's why I'm hooked on writing about sportsball.
I never meant to start a blog series on American Football. Though the television ratings suggest otherwise, not everyone in the world is a fan of this stupid sport. I tried to break myself of the habit by strapping myself to an uncomfortable chair in front of a tiny panel playing NFL RedZone for 6 hours. The aversion therapy was exhausting, but I wasn’t cured. I blame Zac Taylor. His incompetent Cincinnati Bengals went for two down eight and lost to the Chicago Bears in the most spectacular way.
The Bengals were winning by three midway through the third quarter. They then managed to give up seventeen unanswered points in the next twelve minutes of gameplay. Nice job, team. Forty-year-old elite quarterback Joe Flacco responded to the deficit by marching the Bengals down the field and throwing a back-breaking interception to Tremaine Edmunds, who ran it back 96 yards for a touchdown. That should have been the end of this stupid game. But the faceless replay assistant (a remote office of teleoperators that helps guide the NFL’s AI referees) inexplicably ruled that Edmunds was down by contact. Boo. The demoralized Bears almost threw in the towel.
On the next drive, Flacco throws a touchdown to bring the Bengals to within 8. They go for two and convert. Tired of explaining the G42D8 reasoning, the announcers just declared, “The Bengals did the analytics play!” The Bengals then line up for an onside kick, a play with an absurdly low single-digit success rate. Only two had succeeded so far this season. The Bengals’ kick didn’t travel the required 10 yards, which should have ended the attempt and, with it, the game. Undeterred, the replay assistant demanded entertainment. It declared the ball had touched the foot of one of the Bears and handed the ball back to the Bengals. The Bengals immediately score a touchdown and kick the extra point to take the lead.
With only a minute left to play, the G42D8 gambit was surely vindicated. This was a textbook execution of the strategy. It was time for your humble blogger to eat his hat.
Aha, but no. Let’s not forget, it’s the Bengals. On the next drive, Bears quarterback Caleb Williams throws a nice pass up the middle to his rookie tight end Colston Loveland. Loveland is immediately met by two Cincinnati defenders who forget how to tackle. He breaks free of their inept grip and runs 40 yards down the field for a touchdown. The Bengals get the ball back with 17 seconds to go. Joe Flacco throws an interception. The Bengals lose.
Well, there you go.
Eternal G42D8 booster Seth Walder was demoralized: “NFL Next Gen Stats gave Loveland a 0.2% chance to score a touchdown at the time of the reception.” The fun part of being an analytics guy is that you can never be wrong. When you get data that questions one of your core takes, you say, “Look, low probability events happen sometimes.” On the other hand, when your weird advice from your weird model bears fruit, you sometimes overstep your bounds on a victory lap.
Now, the astute reader and dedicated football fan knows that I’m playing exactly Walder’s game, ignoring the data I don’t like. Just last week, for only the third time in NFL history, the G42D8 move worked.
Amazingly, this game also involved the Bengals.
In full desperation mode to avoid a winless season, the New York Jets were getting throttled by the Bengals. They found themselves down by 15 in the fourth quarter. What followed was 15 minutes of absurdity. The Jets:
Score a touchdown down 15
Go for 2 down 9
Give up another touchdown
Score a touchdown down 14
Go for 2 down 8
Injure Joe Flacco with a hard sack
Score a touchdown by letting their running back throw a pass
Kick the extra point
Have Zac Taylor call a run on 2nd and 10 from the Bengals’ 44 with 42 seconds remaining.
The Jets not only pulled off the G42D8 but screwed up the G42D9 by letting the Bengals score. I understand it worked out, but a trainwreck of plays was required to pull it off. I could add to the list of necessary events for the Jets to win:
One of your most beloved franchise players must pass away on the day of the game
The owner of the team needs to call your quarterback incompetent in a national media interview
Your desired starting quarterback has to be sidelined with a knee injury
Your opposing team must be the Bengals
But G42D8 lead booster Seth Walder was over the moon last week. The Jets passed the G42D8 test. He took to social media to explain the genius of the Jets (yes, the 1-7 Jets, who this week traded away two of their best players).
“Decision analysis on the Jets go for 2, down 8 yesterday
WP Go for 2: 12.3%
WP PAT: 11.2%”
“BUT KEEP IN MIND, the WP difference would be much larger if we knew at that moment Jets would score another TD and hold Bengals -- which is the only world where this matters. That’s why it’s a big deal!”
Yes, it does indeed turn out that probabilities are different if you know the future.
On the other hand, the fact that the win probabilities were so close defeats his own argument. The idea that a roughly one-percentage-point difference in win probability translates to an actual strategy is preposterous. We’ve already discussed that win probability is hacky infotainment nonsense. A one-point difference in a highly sketchy model is not actionable for a single high-stakes decision. Walder emphatically disagreed:
“A common retort to the down 8 for 2 is that the WP difference is small, so why make a big deal about it?
But this is why.
1. it’s a clear decision with no justification to kick the PAT.
2. while most of the time it won’t matter, the times when it does it will help a lot.”
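Taking Walder’s two quoted numbers (12.3% versus 11.2%) at face value, the arithmetic of the gap fits in a few lines of Python. This is a back-of-the-envelope sketch: the variable names are mine, and it assumes, generously, that the model’s probabilities are exactly right.

```python
# Win probabilities as quoted in Walder's post (model outputs, not observed data).
wp_go_for_two = 0.123
wp_pat = 0.112

gap = wp_go_for_two - wp_pat   # absolute gap, in probability units
relative_gain = gap / wp_pat   # gap as a fraction of the PAT win probability

# Assumption: the model is exactly right. By linearity of expectation, each
# down-8 decision then adds `gap` expected wins, so one expected extra win
# takes about 1 / gap such decisions.
decisions_per_extra_win = 1 / gap

print(f"gap: {gap * 100:.1f} percentage points")
print(f"relative gain over PAT: {relative_gain:.0%}")
print(f"decisions per expected extra win: {decisions_per_extra_win:.0f}")
```

Even granting the model, a team would have to face this exact decision about 91 times to expect a single extra win from it.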
Of course, there is no shortage of justifications for kicking a PAT. To list a few: Teams have a limited number of 2-point conversion plays. It is not clear that 2-point conversions have a constant probability of success. Missing a two-point conversion is demoralizing. No team has ever won when missing the conversion down 8. Even Walder himself agrees that the same play at different times can have disparate effects on psychological factors and opposing strategies.
I’m hooked on this stupid G42D8 play because it is a delightful microcosm of datafied thinking. Both sides accuse each other of bias. Both sides present counts and caveats. Both sides overanalyze single events to justify actuarial decisions. No side finds dispositive proof. Is the Jets’ win a statistical fact or a statistical outlier? We have 35 examples of the G42D8 strategy failing. We now have three examples of it leading to victory. We still need at least 7 more wins to reach the usual rule-of-thumb minimum of 10 successes for applying the central limit theorem to a proportion. Until then, we’re stuck with case studies and sports radio wars.
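The tally is small enough to check by hand, but here is a quick Python sketch anyway. It assumes the 3-win, 35-loss count is the complete record and uses the common rule of thumb that a normal approximation to a proportion wants at least 10 successes. The Wald interval is included only to show how shaky the normal approximation is with this little data.

```python
import math

# Assumption: 3 wins and 35 losses are the complete G42D8 record
# (counts taken from the text above, not from an independent source).
wins, losses = 3, 35
n = wins + losses
p_hat = wins / n  # observed win rate after going for two down 8

# Rule of thumb for approximating a binomial proportion with a normal
# distribution: at least 10 successes (and at least 10 failures).
MIN_COUNT = 10
wins_needed = max(0, MIN_COUNT - wins)

# Normal-approximation (Wald) 95% interval. With so few successes the
# lower bound drops below zero, a sign the approximation isn't valid yet.
se = math.sqrt(p_hat * (1 - p_hat) / n)
wald_low, wald_high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"observed win rate: {p_hat:.3f} ({wins}/{n})")
print(f"more wins needed for the rule of thumb: {wins_needed}")
print(f"Wald 95% interval: ({wald_low:.3f}, {wald_high:.3f})")
```

An interval whose lower end is a negative probability is the statistics telling you it isn’t ready to referee this argument.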

