Simpon's Paradox

From "Pearl, Judea. Causality. Cambridge university press, 2009"

Simpson's paradox ... refers to the phenomenon whereby an event \(C\) increases the probability of \(E\) in a given population \(p\) and, at the same time, decreases the probability of \(E\) in every subpopulation of \(p\). In other words, if \(F\) and \(\neg F\) are two complementary properties describing two subpopulations, we might well encounter the inequalities \[P(E|C)>P(E|\neg C)\] \[P(E|C,F)\lt P(E|\neg C,F)\] \[P(E|C,\neg F)\lt P(E|\neg C,\neg F)\] ... For example, if we associate \(C\) (connoting cause) with taking a certain drug, \(E\) (connoting effect) with recovery, and \(F\) with being a female, then ... the drug seems to be harmful to both males and females yet beneficial to the population as a whole.

For example, consider the situation exemplified by the following tables from [1] \[\begin{array}{llll} \mathrm{Combined}&E&\neg E&&\mathrm{Recovery Rate}\\ \hline \mathrm{Drug} (C)& 20&20&40 &50\%\\ \mathrm{No drug} (\neg C)& 16&24&40 &40\%\\ \hline &36&44&80 \end{array}\] \[\begin{array}{llll} \mathrm{Females}&E&\neg E&&\mathrm{Recovery Rate}\\ \hline \mathrm{Drug} (C)& 2&8&10 &20\%\\ \mathrm{No drug} (\neg C)& 9&21&30 &30\%\\ \hline &11&29&40 \end{array}\] \[\begin{array}{llll} \mathrm{Males}&E&\neg E&&\mathrm{Recovery Rate}\\ \hline \mathrm{Drug} (C)& 18&12&30 &60\%\\ \mathrm{No drug} (\neg C)& 7&3&10 &70\%\\ \hline &25&15&40 \end{array}\]

As you can see taking the drug seems to be beneficial overall even if it is not for females and males.

The paradox derives because we must distinguish seeing from doing:

The conditioning operator in probability calculus stands for the evidential conditional "given that we see," whereas the do(.) operator was devised to represent the causal conditional "given that we do." Accordingly, the inequality \[P(E|C)>P(E|\neg C)\] is not a statement about \(C\) being a positive causal factor for \(E\), properly written \[P(E|do(C))>P(E|do(\neg C))\] but rather about \(C\) being positive evidence for \(E\)

The \(do\) operator must be used to infer the effect of actions. To compute the effect of action \(do(C)\) on a variable \(E\), i.e., compute \(P(E|do(C))\) when the model is represented as a causal Bayesian network, we must

  1. remove all arcs incoming on \(C\)
  2. modify the conditional probability table of \(C\) so that all the probability mass is assigned to the true value
  3. query \(P(E)\) on the mutilated network

For example, if the causal Bayesian network of our example is:

Network = dot(digraph(['Drug C'->'Recovery E','Gender F'->'Drug C','Gender F'->'Recovery E'])).

then to compute \(P(E|do(C))\), we must remove the incoming arcs of \(C\) obtaining

MutilatedNetwork = dot(digraph(['Drug C'->'Recovery E','Gender F'->'Recovery E'])).

and set the conditional probability table of \(C\) to \[\begin{array}{lll} C&\mathrm{true}&\mathrm{false}\\ \hline &1&0 \end{array}\] Then we query \(P(E)\) on this network.

If we want to compute \(P(E|do(\neg C))\) we must set the conditional probability table of \(C\) to \[\begin{array}{lll} C&\mathrm{true}&\mathrm{false}\\ \hline &0&1 \end{array}\] and query \(P(E)\) on this network.

The network for our example can be represented with the following program

:- use_module(library(pita)). :- if(current_predicate(use_rendering/1)). :- use_rendering(graphviz). :- endif. :- pita. :- begin_lpad. :- action drug/0. female:0.5. recovery:0.6:- drug,\+ female. recovery:0.7:- \+ drug,\+ female. recovery:0.2:- drug,female. recovery:0.3:- \+ drug,female. drug:30/40:- \+ female. drug:10/40:-female. :-end_lpad.

where

:- action drug/0.
means that drug/0 is a predicate that can be used to specify actions.

If we query for the conditional probabilities of recovery given treatment on the whole population and on the two subpopulations, we get the results in the tables above:

prob(recovery,drug,P).
prob(recovery,\+ drug,P).
prob(recovery,(drug,female),P).
prob(recovery,(\+drug,female),P).
prob(recovery,(drug,\+female),P).
prob(recovery,(\+ drug,\+female),P).

If instead we want to know the probability of recovery given the action treatment (taking a drug), we must ask

prob(recovery,do(drug),P).
prob(recovery,do(\+ drug),P).
prob(recovery,(do(drug),female),P).
prob(recovery,(do(\+drug),female),P).
prob(recovery,(do(drug),\+ female),P).
prob(recovery,(do(\+ drug),\+ female),P).

As you can see, the probability of recovery on the whole population is now in accordance with that in the subpopulations, in particular it is the weighted average of the probability of recovery in the subpopulations.

The probability of recovery in the two subpopulations is the same as that for the case of seeing rather than doing, as the observation of the sex makes the arc from sex to drug irrelevant.

References

[1] Pearl, Judea. Causality. Cambridge university press, 2009.