Discussion about this post

Katie R

The Soviet Tank Problem can be really hard to avoid in practice. My go-to example comes from my lab, where a team built a specialized object classifier for a mobile app. They made sure to train on all the objects against a variety of realistic background types: grass, sand, asphalt, etc. Then they had users test the app on the same background types, as well as new ones, to see how it would generalize. Performance was good on the known backgrounds, except that it dropped on grass. The reason? They trained it on grass in Maryland (green), but the users tested it on grass in southern California (brown).

Tom Dietterich

I committed a version of the "Tank" error. We were developing a computer vision system to classify freshwater macro-invertebrates to genus. Images were collected by a robotic device where the specimen was dropped into an alcohol-filled container and photographed against a beautiful blue background. My colleagues collected 100 specimens each from 54 genera. Each specimen was put in its own vial, and the vials were put in boxes, sorted by genus. We hired some undergrads to do the photography. Naturally, they came in on Day 1 and opened box 1 and photographed everything in it. On Day 2, they did boxes 2 and 3, and so on. It turned out that each day, different bubbles formed in the alcohol around the edges of the visual field. These basically bar-coded the class. We got suspicious when the classifier was too accurate. To figure out what was going on, we constructed what would now be regarded as very simple visual explanations. These revealed that the classifier was looking at the bubbles and not the specimens. Face plant! I had made the "tank in the trees" error!

Fortunately, it was easy to mask out the bubbles. But the lesson I learned was one we teach in intro statistics: always randomize everything you can think of. If we had randomized the order in which the specimens were photographed, we would have broken the statistical link between the day's bubbles and the genus.
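
For readers who want to see what that randomization could look like, here is a minimal sketch in Python. The 54 genera and 100 specimens per genus come from the comment above; the function name `randomized_schedule`, the `per_day` batch size of 200, and the fixed seed are hypothetical choices made only for illustration, not a record of how the actual photography was organized.

```python
import random


def randomized_schedule(n_genera=54, per_genus=100, per_day=200, seed=0):
    """Shuffle all specimens, then cut the shuffled list into daily batches.

    per_day and seed are illustrative assumptions, not values from the study.
    """
    # One (genus, specimen) pair per vial.
    specimens = [(g, s) for g in range(n_genera) for s in range(per_genus)]

    # Shuffle across ALL genera so no genus is tied to a particular day.
    rng = random.Random(seed)
    rng.shuffle(specimens)

    # Split the shuffled list into consecutive daily batches.
    return [specimens[i:i + per_day] for i in range(0, len(specimens), per_day)]


schedule = randomized_schedule()

# Number of distinct genera photographed on each day. Working box by box
# would give only a handful per day; shuffling spreads them across days.
genera_per_day = [len({genus for genus, _ in day}) for day in schedule]
print(len(schedule), "days; fewest genera on any day:", min(genera_per_day))
```

With the order shuffled this way, any day-specific artifact (such as bubbles) shows up across many genera, so it no longer acts as a label for the class.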
