...
One has to be very careful with raw data
Even accurate data may easily yield wrong conclusions
We will consider several examples today
A pharmaceutical company promotes a new treatment for acne:
Treatment | Mild Acne: Cured | Mild Acne: Not Cured | Severe Acne: Cured | Severe Acne: Not Cured |
---|---|---|---|---|
Old Treatment | 2 | 8 | 40 | 40 |
New Treatment | 30 | 60 | 12 | 8 |
90 patients received the old treatment and 42 were cured: a success rate of 42/90 = 46.7%
110 patients received the new treatment and 42 were cured: a success rate of 42/110 = 38.2%
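Within each group the new treatment has the higher cure rate (mild: 30/90 = 33.3% against 2/10 = 20%; severe: 12/20 = 60% against 40/80 = 50%). The artificial subdivision into "mild" and "severe" cases makes it possible to present the data so that the new treatment looks better, while overall it is not!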
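The comparison can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the table above, while the names `data`, `cure_rate` and `overall_rate` are made up for the illustration.

```python
# Counts copied from the table above: (cured, not cured) per severity group.
data = {
    "Old Treatment": {"mild": (2, 8), "severe": (40, 40)},
    "New Treatment": {"mild": (30, 60), "severe": (12, 8)},
}

def cure_rate(cured, not_cured):
    """Fraction of patients cured within one group."""
    return cured / (cured + not_cured)

def overall_rate(groups):
    """Cure rate after pooling all severity groups of one treatment."""
    cured = sum(c for c, _ in groups.values())
    total = sum(c + n for c, n in groups.values())
    return cured / total

for treatment, groups in data.items():
    by_group = ", ".join(
        f"{name}: {cure_rate(*counts):.1%}" for name, counts in groups.items()
    )
    print(f"{treatment}: {by_group}, overall: {overall_rate(groups):.1%}")
```

The new treatment wins inside each group but loses after pooling, because 90 of its 110 patients fall in the mild group, where cure rates are low for both treatments.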
Basketball example
Player | First Half: Baskets | First Half: Attempts | First Half: Percent | Second Half: Baskets | Second Half: Attempts | Second Half: Percent |
---|---|---|---|---|---|---|
Kevin | 4 | 10 | 40% | 3 | 4 | 75% |
Kobe | 1 | 4 | 25% | 7 | 10 | 70% |
Kevin has the better percentage in each half, yet over the whole game Kobe comes out ahead:
Player | Baskets | Attempts | Percent |
---|---|---|---|
Kevin | 7 | 14 | 50% |
Kobe | 8 | 14 | 57% |
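One way to see why the ranking flips: a player's whole-game percentage is the average of his two half percentages, weighted by the number of attempts in each half. The sketch below checks this on the numbers from the tables; the names `shots`, `percentage` and `weighted_average` are hypothetical helpers, not part of the original example.

```python
# (baskets, attempts) for the first and second half, copied from the tables.
shots = {
    "Kevin": [(4, 10), (3, 4)],
    "Kobe": [(1, 4), (7, 10)],
}

def percentage(baskets, attempts):
    """Shooting percentage on a 0-100 scale."""
    return 100 * baskets / attempts

def weighted_average(halves):
    """Whole-game percentage as an attempts-weighted average of the halves."""
    total_attempts = sum(attempts for _, attempts in halves)
    return sum(
        (attempts / total_attempts) * percentage(baskets, attempts)
        for baskets, attempts in halves
    )

for player, halves in shots.items():
    per_half = [f"{percentage(b, a):.0f}%" for b, a in halves]
    print(f"{player}: halves {per_half}, game {weighted_average(halves):.1f}%")
```

Kevin took 10 of his 14 attempts in his weaker half, so his game percentage is pulled toward 40%, while Kobe took 10 of his 14 in his stronger half.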
In 85% of cases, a mammogram correctly identifies a tumor as benign (not cancer) or malignant (cancer)
How threatening is a positive mammogram?
Out of 10,000 tumors, 100 are malignant, and 85 of them will be correctly identified by mammograms.
Out of these 10,000 tumors, 9,900 are benign, and \( 0.15 \times 9900 = 1485 \) of them will be misidentified by mammograms as cancer.
Thus out of 1,485+85=1,570 positive mammograms only 85, i.e. 85/1570=0.054 or 5.4% of cases, have cancer.
Not as threatening as it looks!
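The same count can be written as one formula (Bayes' rule). The symbols are introduced here only for illustration: \(p\) is the share of malignant tumors and \(a\) is the accuracy of the mammogram, assumed to be the same for benign and malignant tumors, as in the example.

\[
P(\text{cancer} \mid \text{positive}) = \frac{a\,p}{a\,p + (1-a)(1-p)} = \frac{0.85 \times 0.01}{0.85 \times 0.01 + 0.15 \times 0.99} = \frac{0.0085}{0.157} \approx 0.054
\]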
Assumption made: only 1% of tumors (100 out of 10,000) are malignant. What if 20%?
Now out of 10,000 tumors, 2,000 are malignant, and 1,700 of them will be correctly identified by mammograms.
Out of these 10,000 tumors, 8,000 are benign, and \( 0.15 \times 8000 = 1200 \) of them will be misidentified by mammograms as cancer.
Thus out of 1,700+1,200=2,900 positive mammograms already 1,700, i.e. 1700/2900=0.586 or 58.6% of cases, have cancer.
Now it looks bad!
Overly optimistic assumption: 0% of tumors (0 out of 10,000) are malignant.
No need to worry in this case, whatever the mammogram indicates: there are no cancer cases
Overly pessimistic assumption: 100% of tumors (all 10,000 out of 10,000) are malignant
Worst case: all 8,500 positive mammograms are true cancer cases, and the remaining 1,500 cancers are misidentified as benign
Conclusion: how threatening a positive mammogram is depends not only on the accuracy of the identification, but also on the frequency of malignant tumors among all tumors
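The four scenarios above differ only in the assumed share of malignant tumors, so they can be reproduced with one small function. This is a sketch under the same assumptions as the slides (85% accuracy for both benign and malignant tumors, 10,000 tumors); the function name `cancer_share_among_positives` and its parameters are made up for the illustration.

```python
def cancer_share_among_positives(prevalence, accuracy=0.85, tumors=10_000):
    """Share of positive mammograms that are real cancers.

    prevalence: assumed fraction of tumors that are malignant.
    accuracy:   assumed to apply equally to benign and malignant tumors.
    """
    malignant = prevalence * tumors
    benign = tumors - malignant
    true_positives = accuracy * malignant      # cancers flagged as cancer
    false_positives = (1 - accuracy) * benign  # benign tumors flagged as cancer
    positives = true_positives + false_positives
    if positives == 0:
        return 0.0  # no positive mammograms at all
    return true_positives / positives

for prevalence in (0.0, 0.01, 0.20, 1.0):
    share = cancer_share_among_positives(prevalence)
    print(f"{prevalence:.0%} malignant -> {share:.1%} of positives have cancer")
```

With the same 85% accuracy, the answer moves from 0% through the 5.4% and 58.6% computed above to 100%, driven entirely by the prevalence.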
Assume that 4% of 1,000 athletes (i.e. 40 people) use performance-enhancing drugs
A 95% accurate test will give a false positive for 5% of the remaining 960 people
That comes out as \(0.05 \times 960 = 48\) innocent athletes accused
The test will find correctly \(0.95 \times 40=38\) drug users
Thus out of 38+48=86 accused only 38/86=0.44, i.e. 44% indeed used the drugs, and 56% did not.
That is a result of a 95% accurate test.
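The athlete numbers can be verified with exact fractions, which avoids any rounding along the way. This is a minimal sketch: `accused_breakdown` is a hypothetical helper, and it assumes, as the example does, that the 95% accuracy applies to users and non-users alike.

```python
from fractions import Fraction

def accused_breakdown(athletes, user_share, accuracy):
    """Return (users caught, innocent accused, share of accused who are users)."""
    users = user_share * athletes
    clean = athletes - users
    caught_users = accuracy * users          # true positives
    accused_clean = (1 - accuracy) * clean   # false positives
    accused = caught_users + accused_clean
    return caught_users, accused_clean, caught_users / accused

caught, innocent, share = accused_breakdown(
    1000, Fraction(4, 100), Fraction(95, 100)
)
print(caught, innocent, share)  # 38 48 19/43
print(f"{float(share):.0%} of the accused used drugs")
```

The exact share is 19/43, which rounds to the 44% quoted above.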
2011 effect of 2001 tax cut
While the chart on the left suggests that the rich pay a bigger share of taxes, the chart on the right shows that they pay a smaller amount in absolute dollars.
Clearly, that is because the total tax revenue was smaller.