Consider this puzzle :
Some scientist-types have a week to throw a party. They have 7 bottles of wine in their cellar. However, the vintage bottles have lost their labels. Of these, 1 bottle was rat poison, but which one ? To test this out, they volunteer their pet rats, but have only 3 such rats. Also it takes a rat about a week to die of the poison. They need to figure out a way to test all 7 bottles using just 3 rats, such that, at the end of the week they know exactly which bottle was rat poison.
Being good scientists they do the following :
1. Feed a sip of bottle #1 (B1) to rat #1 (R1).
2. Feed a sip of bottle #2 (B2) to rat #2 (R2).
3. Feed a sip of bottle #3 (B3) to rat #3 (R3).
4. Feed a sip of bottle #4 (B4) to rats #1 (R1) and #2 (R2).
Thus, the death of both rats R1 and R2 would point only to bottle B4, not to B1 or B2.
5. Feed a sip of bottle #5 (B5) to rats #1 (R1) and #3 (R3).
6. Feed a sip of bottle #6 (B6) to rats #2 (R2) and #3 (R3).
7. We can do one of two things with bottle #7,
a. Let there be love strategy : Don’t feed any rat from this bottle and if all rats survive, this bottle was the poison. This strategy is handy if you, somewhat, love your pet rats. Clearly, you don’t love them completely as you are choosing wine-for-a-party over their lives !
b. Kill ‘em all strategy : Feed a sip of bottle #7 (B7) to all 3 rats (R1, R2 & R3).
Note: We assume that even in small sips the poison is always potent.
Now, suppose B5 was the poison, at the end of the week R1 and R3 would be in rat heaven. An illustration of this whole experiment looks like this :
Solution : If there are n items out of which exactly one item is special then the optimal non-adaptive (all tests specified at the start) strategy is to test them in mixes which are defined by the binary representation of the number assigned to each item in, . The number of digits needed for this representation is arrived at as,
. For example, for 7 items are to be tested we need, at least,
tests. When these are put together we get a binary matrix of size
where the rows are the tests and the columns are the items to be tested. A 1 in a row implies that the item corresponding to that column is to be mixed in that test.
So, what was the point of all this ? We have arrived at a testing paradigm called group testing or pooling designs. It is a powerful idea that crops up in all sorts of places. It also lets us in on a fundamental limit of information collection. If we had, in general, at most d special items out of n (with all items equally likely to be special) and had to devise a pooled testing scheme, as described earlier, then we will need, at least, t tests, given by :
This is known as the information theoretic lower bound for the group testing problem. The derivation is as follows :
When there are at most d special items out of n, then the number of unique representations required of the outcome would be
. This sums up all the possible ways in which we could have the special items present in n, anywhere from
special items. Each case, would need a unique binary representation where the outcomes can be either 0 or 1 (failure or success). Therefore, the binary representation would require t digits :
But, we can approximate,
Using Stirling’s Approximation,
.
Therefore,
.
Now,
For large n, the
term dominates. So,
+ (some constant)
Therefore,
We are therefore left with a strict lower limit on how much compression can be achieved by using pooling designs instead of standard testing (which would require n tests always). The compression essentially comes from the fact that each negative result of a pooled test eliminated a whole chunk of items.
However, a few things to note here :
1. The tests are assumed to be free of error.
2. d has to be a small fraction of n to get useful compression. As d gets closer to n/2, the pooling designs become unnecessary.
3. The algorithms to create the actual testing designs are a whole different matter.
4. The problem is NP-hard (PSPACE actually) , but there are
approximate algorithms.
5. The best ones operate like, in terms of tests needed.
6. Even for d=2, the algorithm operating at the lower limit is an open problem.
References :
[1] Combinatorial Group Testing – book by Du and Hwang.
[2] Hirschberg et. al. – lower bound proof idea.

1 comment
Comments feed for this article
February 28, 2008 at 7:16 am
matsuo basho
What a good intro to pooling designs! Could you tell what the various applications are?