Section 3E

How numbers can deceive

...

Generalities

One has to be very careful with raw data

Even accurate data may easily yield wrong conclusions

We will consider several examples today

Better in each case but worse overall

A pharmaceutical company promotes a new treatment for acne:

	Mild Acne		Severe Acne
	Cured	Not Cured	Cured	Not Cured
Old Treatment	2	8	40	40
New Treatment	30	60	12	8

Success rates (new versus old): 33% versus 20% for mild acne,
and 60% versus 50% for severe acne.

More specifics...

90 patients received the old treatment and 42 were cured: a success rate of 42/90 = 46.7%

110 patients received the new treatment and 42 were cured: a success rate of 42/110 = 38.2%

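The artificial subdivision into "mild" and "severe" cases makes it possible to present the data so that the new treatment looks better in each group, while overall it is not! This works because the two treatments were given to very different mixes of patients: 80 of the 90 old-treatment patients had severe acne, while 90 of the 110 new-treatment patients had mild acne.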
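The comparison can be checked with a few lines of Python. This is only a sketch built on the counts in the table; the names `data`, `cure_rate` and `pooled_rate` are made up for illustration.

```python
# (cured, not cured) counts copied from the acne table.
data = {
    "old": {"mild": (2, 8), "severe": (40, 40)},
    "new": {"mild": (30, 60), "severe": (12, 8)},
}


def cure_rate(cured, not_cured):
    """Fraction of patients cured within one group."""
    return cured / (cured + not_cured)


def pooled_rate(groups):
    """Fraction cured when all groups of one treatment are combined."""
    cured = sum(c for c, _ in groups.values())
    total = sum(c + n for c, n in groups.values())
    return cured / total


for treatment, groups in data.items():
    per_group = {name: f"{cure_rate(*counts):.1%}" for name, counts in groups.items()}
    print(treatment, per_group, f"overall {pooled_rate(groups):.1%}")
```

The new treatment wins inside each group, yet the pooled rate is lower, because the pooled rate also depends on how many mild and severe patients each treatment received.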

Better in each case but worse overall

Basketball example

		First Half			Second Half
Player	Baskets	Attempts	percent	Baskets	Attempts	percent
Kevin	4	10	40%	3	4	75%
Kobe	1	4	25%	7	10	70%

But over the whole game:

Player	Baskets	Attempts	percent
Kevin	7	14	50%
Kobe	8	14	57%
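One way to see why this happens: each whole-game percentage is a weighted average of the two half percentages, weighted by the share of attempts taken in each half. Written out with the numbers from the tables:

\[
\text{Kevin: } \frac{10}{14} \cdot \frac{4}{10} + \frac{4}{14} \cdot \frac{3}{4} = \frac{7}{14} = 50\%
\]

\[
\text{Kobe: } \frac{4}{14} \cdot \frac{1}{4} + \frac{10}{14} \cdot \frac{7}{10} = \frac{8}{14} \approx 57\%
\]

Kevin took most of his attempts in the first half, when both players shot poorly, while Kobe took most of his in the second half, when both shot well.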

Mammograms and cancer

In 85% of cases, a mammogram correctly identifies a tumor as benign (not cancer) or malignant (cancer)

How threatening is a positive mammogram?

Out of 10,000 tumors, 100 are malignant, and 85 of them will be correctly identified by mammograms.

Out of these 10,000 tumors, 9,900 are benign, and \( 0.15 \times 9900 = 1485 \) of them will be misidentified by mammograms as cancer.

Thus out of 1485+85=1570 positive mammograms, only 85, i.e. 85/1570=0.054 or 5.4% of cases, are actually cancer.

Not as threatening as it looks!
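The counting argument above can be sketched in Python. The function name `positive_mammograms` is hypothetical, and the sketch assumes, as the example does, that the same 85% accuracy applies to malignant and benign tumors alike. Integer arithmetic is used so the counts match the hand calculation exactly.

```python
def positive_mammograms(total, malignant, accuracy_pct):
    """Split the positive results into true and false positives.

    Assumption: one accuracy figure covers both malignant and benign tumors.
    """
    benign = total - malignant
    true_pos = malignant * accuracy_pct // 100          # cancers correctly flagged
    false_pos = benign * (100 - accuracy_pct) // 100    # benign tumors flagged as cancer
    return true_pos, false_pos


true_pos, false_pos = positive_mammograms(total=10_000, malignant=100, accuracy_pct=85)
share = true_pos / (true_pos + false_pos)
print(true_pos, false_pos, f"{share:.1%}")
```

Calling the same function with a different number of malignant tumors shows how strongly the answer depends on that assumption.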

Mammograms and cancer ... why so optimistic?

Assumption made: only 1% of tumors (100 out of 10,000) are malignant. What if it is 20%?

Now out of 10,000 tumors, 2,000 are malignant, and 1,700 of them will be correctly identified by mammograms.

Out of these 10,000 tumors, 8,000 are benign, and \( 0.15 \times 8000 = 1200 \) of them will be misidentified by mammograms as cancer.

Thus out of 1,700+1,200=2,900 positive mammograms, as many as 1,700, i.e. 1700/2900=0.586 or 58.6% of cases, are actually cancer.

Now it looks bad!

Mammograms and cancer ... extreme cases

Overly optimistic assumption: 0% of tumors (0 out of 10,000) are malignant.

No need to worry in this case, whatever the mammogram indicates: there are no cancer cases.

Overly pessimistic assumption: 100% of tumors (all 10,000 out of 10,000) are malignant.

Worst possible scenario: 8,500 tumors are correctly identified as malignant, so every positive mammogram means cancer.

Conclusion: how threatening a positive mammogram is depends not only on the accuracy of the identification, but also on the frequency of malignant tumors among all tumors
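The same counting can be written as a single formula. The symbols are introduced here for illustration only: \( p \) is the fraction of tumors that are malignant and \( a = 0.85 \) is the accuracy, assumed to be the same for malignant and benign tumors.

\[
\text{fraction of positive mammograms that are cancer} = \frac{a\,p}{a\,p + (1-a)(1-p)}
\]

Checking against the cases above: \( p = 0.01 \) gives \( 0.0085 / 0.157 \approx 0.054 \), \( p = 0.20 \) gives \( 0.17 / 0.29 \approx 0.586 \), \( p = 0 \) gives 0, and \( p = 1 \) gives 1.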

More on too many false positives: 95% accurate drug test

Assume that 4% of 1,000 athletes (i.e. 40 people) use performance-enhancing drugs

A 95% accurate test will give false positives for 5% of the remaining 960 athletes

That comes to \(0.05 \times 960 = 48\) innocent athletes accused

The test will correctly identify \(0.95 \times 40=38\) drug users

Thus out of 38+48=86 accused, only 38/86=0.44, i.e. 44%, actually used drugs, and 56% did not.

That is a result of a 95% accurate test.
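As a sanity check, here is a minimal Python sketch that fills in all four outcomes for the 1,000 athletes. The variable names are made up for illustration, and the sketch assumes the 95% accuracy applies equally to users and non-users, as in the example.

```python
athletes = 1000
users = athletes * 4 // 100      # 40 athletes use drugs
clean = athletes - users         # 960 athletes do not
accuracy_pct = 95

true_pos = users * accuracy_pct // 100            # users correctly caught
false_pos = clean * (100 - accuracy_pct) // 100   # innocent athletes accused
false_neg = users - true_pos                      # users the test misses
true_neg = clean - false_pos                      # innocent athletes cleared

accused = true_pos + false_pos
correct = true_pos + true_neg

print(f"accused: {accused}, of whom {true_pos / accused:.0%} used drugs")
print(f"correct results overall: {correct / athletes:.0%}")
```

The test is right for 950 of the 1,000 athletes, so it really is 95% accurate. The accused group is still mostly innocent because non-users vastly outnumber users.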

Tax cuts for the rich and their effect

2011 effect of 2001 tax cut

While the chart on the left suggests that the rich pay a bigger share of taxes, the chart on the right shows that they pay a smaller amount in absolute dollars.

Clearly, that is because the total tax revenue was smaller.