More Like One Out of A Million...

Ben Recht

Jan 26, 2024

We should be unsatisfied with unverifiable measurement.

Read →

1 Comment

Miguel

Jan 27, 2024Edited

Let me have a try at "dumping Polya’s Urn out on the table".

Let $r_i$ and $b_i$ be Boolean variables, for i=1,2,3, and consider the following Boolean constraints:

$\bigwedge_{i=1}^{3} \neg r_i \lor \neg_b_i$

$\land$

$\bigwedge_{i=1}^{3} b_i \lor r_i$

The assignments to the Boolean variables (functions mapping each of $r_i$ and $b_i$ to either 0 or 1) that satisfy the above model possible configurations of the balls in the urn (e.g. which specific ball has what color).

Now, we can lay out each of the possible 2^6 assignments in a table. We can do this (and handle much bigger systems) with state-of-the-art SAT solvers and model counting algorithms. We can count for instance, how many assignments satisfy a given property, e.g. r_1=1 for "ball 1 is true" in time linear in the representation of this table. Over the past 20 years or so, great advances have been made that this query can be done on a "compressed" table (e.g. d-DNNF or SDD) in time linear in the size of this compressed representation (which is worst-case exponential on the treewidth of the formula primal graph, where vertices are variables and we have an edge whenever two variables appear on the same constraint). That is as good as oracles for counting stuff efficiently can go in the real world (i.e. run by a Turing Machine-like device).

Now, big problem. How do weight those assignments we have painstakingly computed and stored in a compact manner? Since we know nothing, the "standard" approach is to assume that all assignments that satisfy the constraints are equally likely. I can hear folks sucking their teeth already :)

For the scenario where we pick a ball and turns to be red, the information acquired can be encoded as the following formula

$r_1 \lor r_2 \lor r_3$

What kind of algorithm can exhaustively enumerate all the possible consequences of this new piece of information? One simple thing we can do is the following:

1/ Count the assignments for the original formula (linear on the size of our compressed table)

2/ Compute all the assignments for our original set of constraints conjoined with the new piece of information given by the formula above, store them in our compressed "table" representation.

3/ Count the assignments for the formula accounting for the new information.

4/ If the number of assignments for the new formula are less than those for the original formula, we conclude that "something has been learnt". Actually determining exactly *what* has been learnt is hard!

The reasoning for observing a blue ball is analogous for the above. It is obvious we reach a fixed point after we have obtained the two different observations. Which as you write in the post, it is not very interesting.

The above can be reformulated into the theory of linear arithmetic, with similarly frustrating results (we obtain trivial lower bounds on the numbers of blue and red balls).

Having set a rigorous framework for counting (because I have never been convinced by this thing of going from English to random variables and probability distributions right away) I think the strategy for "defeating" Polya urn could be like this:

1. Guess a satisfying assignment for our constraints (this can be actually hard as in NP-hard for other systems than Polya's urn!).

2. Extract N balls recording their color. Compute the probability of that sequence being consistent with the guessed assignment (all assignments assumed to be equiprobable). If probability becomes 0, go to 1.

3. Store result and go to 1. If no new assignment available, continue.

4. Sort assignments according to probability, and pick the best one.

Claim: as N -> \infty (ha!) the higher is the probability the top-most assignment is the true one :-)

There is an obvious problem with the above is in Step 2. I am not sure how to actually do that. We would want something like https://www.cs.ucdavis.edu/~green/papers/pods07.pdf to tag assignments to our constraints with "likelihood votes" and determine (via a suitable query) when what has been observed is impossible under the assumed assignment (the specific color of each ball).

Expand full comment

arg min

More Like One Out of A Million...