Nomological Networks
How might we establish the construct validity of a machine learning model?
Construct validity is inescapable in machine learning because we only need machine learning when we lack understanding. If I know that my “label” is a simply described, perhaps noisy, “function” of my features, then I don’t need machine learning. For example, if I believe that force is equal to a constant times the displacement of the spring, I can find the constant by attaching a few weights to the spring. I might use least-squares methods to estimate the parameters of my model, but I think we must assert that fitting a well-parameterized model to observations isn’t machine learning. Machine learning is what we do when we don’t understand the mapping.
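For concreteness, here is that spring experiment as a few lines of code. The masses and displacements are numbers I made up for illustration; the point is that once you posit F = kx, “learning” collapses to estimating a single parameter.

```python
import numpy as np

# Hypothetical calibration data: hang known masses, read off displacements.
masses_kg = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
displacements_m = np.array([0.049, 0.101, 0.148, 0.205, 0.251])  # noisy readings
forces_n = masses_kg * 9.81  # F = m * g

# Least squares for a line through the origin: k = <x, F> / <x, x>.
k_hat = displacements_m @ forces_n / (displacements_m @ displacements_m)
print(f"estimated spring constant: {k_hat:.1f} N/m")
```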
When I don’t understand the functional mapping between my features and my labels, I need some way to defend my construct that the labels should be predictable from my features. I need a qualitative model that connects the concepts to the observables, a way to defend this model, and a way to extract some semblance of what the labels mean. In most cases I’ve seen in vision, language, and robotics, this defense is mostly vibes. However, we can get a little philosophical and attempt a more rigorous defense of construct validity. Cronbach and Meehl describe a general modeling framework for validation called nomological networks.
A nomological network is a graph that describes why and how certain observations connect to certain concepts. These networks are far more expressive than the causal graphs you may have encountered when reading Judea Pearl. The nodes of the graph are entities in some theory. In Lecture 4 of Meehl’s class, he describes his ontology of theoretical entities:
- Substances - At a particular level, these are your basic building blocks. Meehl argues the elements are substances. But you could say they are structures (see the next bullet).
- Structures - These are just combinations of substances and other structures. He gives the examples of a neuron, a brain, a helium nucleus, and a chair.
- States - States describe properties of entities. “Jones is depressed.” “I am thirsty.”
- Events - Events describe transitions between states. “The neuron spikes.” “The patient faints.” “I am experiencing a hunger pang.”
- Dispositions - Dispositions describe the potential of entities to occupy states. They tend to end in -ible and -able: flammable, soluble, excitable.
The edges in the graph are laws that relate the entities. If I were to be pedantic, the nomological network is a hypergraph, as a single law can link together more than two entities. (I’ll sketch one way to encode such a hypergraph in code after the list.) Meehl describes three of the most common kinds of laws:
- Structural-compositional - These laws describe how structures are formed, their component parts, and how they are arranged.
- Functional-dynamic - These laws are what we might call efficient causes. For example, if you do x, then y will happen.
- Developmental - These laws describe what happened to result in some occurrence. Meehl includes evolution, Big Bang cosmology, continental drift, and all of capital-H History in this bag of laws.
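To make the graph structure concrete, here is a minimal sketch of a nomological network as a labeled hypergraph. The classes and names are my own invention for illustration; Cronbach and Meehl never wrote their networks down as data structures.

```python
from dataclasses import dataclass, field
from enum import Enum

class EntityKind(Enum):
    SUBSTANCE = "substance"
    STRUCTURE = "structure"
    STATE = "state"
    EVENT = "event"
    DISPOSITION = "disposition"

class LawKind(Enum):
    STRUCTURAL_COMPOSITIONAL = "structural-compositional"
    FUNCTIONAL_DYNAMIC = "functional-dynamic"
    DEVELOPMENTAL = "developmental"

@dataclass(frozen=True)
class Entity:
    name: str
    kind: EntityKind

@dataclass(frozen=True)
class Law:
    description: str
    kind: LawKind
    relates: frozenset  # a hyperedge: any number of entities tied by one law

@dataclass
class NomologicalNetwork:
    entities: set = field(default_factory=set)
    laws: list = field(default_factory=list)

    def add_law(self, description: str, kind: LawKind, *entities: Entity) -> None:
        # A law is a hyperedge: it can relate two, three, or more entities.
        self.entities.update(entities)
        self.laws.append(Law(description, kind, frozenset(entities)))
```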
With these nodes and edges, the nomological network connects observations to theoretical constructs. This lets us tie the abstract categories to our predictions and tests.
In the 1955 paper, Cronbach and Meehl describe the mercury thermometer and how it relates to heat. Why does the expansion of a mercury column have construct validity as a measure of hotness? First, we can verify that there is a high correlation between hotness and the length of the column. But we can further justify this with a complex nomological network that involves “unobservable microevents” in kinetic theory that “explains the relation of mercury expansion to heat.” Our perceived hotness correlates with a measured length, and that length changes because of the kinetic motion that we define as heat. It’s undoubtedly complicated when you try to write it all down, but we can get at the network connecting correlations to the theoretical entities.
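Continuing the sketch from above, a cartoon fragment of that network might look like this. The particular entities and laws are my own toy rendering of the kinetic-theory story, not anything from the 1955 paper.

```python
heat = Entity("molecular kinetic energy", EntityKind.STATE)
column = Entity("mercury column", EntityKind.STRUCTURE)
expansion = Entity("the column lengthens", EntityKind.EVENT)
hotness = Entity("perceived hotness", EntityKind.STATE)

net = NomologicalNetwork()
net.add_law("kinetic theory: increased molecular motion expands the mercury",
            LawKind.FUNCTIONAL_DYNAMIC, heat, column, expansion)
net.add_law("felt hotness tracks molecular kinetic energy",
            LawKind.FUNCTIONAL_DYNAMIC, heat, hotness)

# The construct claim: column length and perceived hotness correlate
# because both are tied, through laws, to the same theoretical state.
```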
If you want to validate a construct, you have to run an experiment that tests the nomological network. Validation is a scientific question. Your nomological network makes predictions about measurements. Hence, you validate your construct by severely testing those predictions.
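Here is a cartoon of what such a test might look like in code, with made-up readings. The network says a mercury thermometer and an independent gas thermometer measure the same underlying state, so a demanding prediction is that they agree everywhere within instrument error, not merely that they correlate. A genuinely severe test demands far more than this, but the shape is the same: prediction, measurement, tolerance, verdict.

```python
import numpy as np

# Made-up readings of the same water baths from two instruments that the
# network claims measure the same underlying state.
mercury_celsius = np.array([20.1, 35.4, 50.2, 64.8, 80.3])
gas_celsius = np.array([20.0, 35.0, 50.5, 65.0, 80.0])

max_disagreement = np.max(np.abs(mercury_celsius - gas_celsius))
print(f"max disagreement: {max_disagreement:.2f} C")
assert max_disagreement < 1.0, "a prediction of the network failed"
```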
If the predictions fail to be accurate, you have falsified your nomological network. Ha ha, just kidding. You haven’t. You’ve assuredly falsified something when your predictions are wrong, but what exactly? The nodes and edges in the nomological network are each logical propositions. The network is their logical conjunction. The experiment in which you test the network is theory-laden, just as all scientific experiments are. You need auxiliary theories to describe the mapping of the network to the prediction, theories of how your instruments work, assertions of ceteris paribus, and details of experimental conditions.
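Schematically, with conjuncts of my own choosing: the experiment tests one big conjunction, and a failed prediction licenses only the conclusion that at least one conjunct is false.

```python
# The prediction follows from the network AND a pile of auxiliaries.
conjuncts = [
    "kinetic theory links heat to mercury expansion",  # a law in the network
    "this thermometer was built to spec",              # instrument theory
    "nothing else perturbed the column",               # ceteris paribus
    "the reading was recorded correctly",              # experimental conditions
]

# If the prediction fails, logic gives us not(c1 and c2 and ...): at
# least one conjunct is false. It never tells us which one.
prediction_holds = False
if not prediction_holds:
    print("at least one of these is false:", *conjuncts, sep="\n  - ")
```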
If you read along with my Meehl blogging from last summer, this procedure of unpacking experimental failure is the realm of Lakatosian Defense. When our predictions don’t pan out the way we expected, something is wrong with the experimental deduction, but pinpointing what is wrong is a complex mess. And a single experiment giving us some evidence that our predictor is lacking rarely means we just throw the predictor out.
To find what was wrong, we work our way up the chain of logic that conjoins the nomological network and the experiment. Did we set up the measurements correctly? Did we control for all possible confounders (or distribution shifts or domain shifts)? Is there something wrong with the data collection? Is there a bug in the code? Should I have added more inverse propensity skip layer decay to my gradient clipping? Only after this do we start to question how we need to change the innards of our prediction function. Maybe you adjoin more data and retest. Maybe you invent a new reasoning heuristic at test time. You can patch a predictor to account for every experimental failure.
But what if your predictions were right? What if you have built an LLM, and it aces the ACT and the MCAT? Well, you deem your construct valid and press on. This is the standard logical fallacy of science and engineering. All models are wrong, but those affirming the consequent are useful.
What if in engineering, “works” just means “doesn’t not work?” Or at least, not yet.
In other words, the "proof" of the pudding is in the eating :)
Yes, over a long period of time (economic incentives?), people who cling to the Lakatosian defense will indeed admit diminishing returns, a need for a paradigm shift, or even redefine their objective! What methods can accelerate this and enable more first-principles architecting of ideas?
We have seen them all in the LLM world:
- Redefinitions: Reasoning is "thinking harder" before you answer.
- If only: Claiming that it is just around the corner, but we are hitting a scaling wall, we are running out of tokens, we are short a few trillion dollars.
- Defending "Jagged Intelligence" as if it were a thing, let alone intelligence.
It is difficult to get a man to understand something, when his salary depends on his not understanding it.
Nits: You write that "substances are just combinations of substances and structures", which is probably not what you meant to write, and "states describe properties of entities," but entities are not part of your ontology. Perhaps you meant that states describe properties of substances?
More centrally, you write (to me, correctly) that in vision, language, and robotics, the defense of construct validity is mostly vibes, but isn't all this nomological networks stuff mostly vibes with extra steps? You've got a big complicated nomological network, but we can argue forever about whether you've got the *right* network, before running any experiments. Even if your predictions look good, it seems you've got the same problems with an LLM and a nomological network --- maybe you didn't give it the right test yet.