*(Sandipan Dey, 18 April 2017)*

The following *inference* problems (along with the descriptions) are taken directly from the exercises of the *edX Course HarvardX: PH525.3x Statistical Inference and Modeling for High-throughput Experiments*.

## Inference in Practice Exercises

These exercises will help clarify that *p-values* are *random variables* and illustrate some of their properties. The sample average is a random variable because it is computed from a random sample. In the same way, a p-value is computed from random variables (for example, the sample mean and the sample standard deviation), so it is a random variable too. The following two properties of *p-values* are proved mathematically in the following figure:

- The *p-values* are *random variables*.
- Under the *null hypothesis*, the *p-values* follow a *uniform distribution*.
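
Here is a short derivation of the second property. It is a sketch that assumes a one-sided test whose statistic $T$ has a continuous, strictly increasing null CDF $F$, and it may differ from the argument in the figure. The p-value is $p = 1 - F(T)$, so for any $u \in (0, 1)$:

$$
\Pr(p \le u) = \Pr\big(F(T) \ge 1 - u\big) = \Pr\big(T \ge F^{-1}(1 - u)\big) = 1 - F\big(F^{-1}(1 - u)\big) = u
$$

This is the CDF of the $\text{Uniform}(0, 1)$ distribution. For a two-sided test the same argument applies, with $|T|$ and its null CDF in place of $T$ and $F$.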

To see this, let’s look at how the *p-values* change when we take different samples. The next table shows the *Bodyweight* dataset, which is used as follows:

- The *control* and *treatment* groups, each of size 12, are randomly drawn from it.
- A *two-sample t-test* is performed on these two groups to compute the *p-value*.
- The two steps above are replicated 10,000 times.

#### Bodyweight

```
27.03
24.80
27.02
28.07
23.55
22.72
```

The next table shows the first few rows of the randomly chosen *control* and *treatment* groups for a single replication.

control | treatment |
---|---|
21.51 | 19.96 |
28.14 | 18.08 |
24.04 | 20.81 |
23.45 | 22.12 |
23.68 | 30.45 |
19.79 | 24.96 |

The following animation shows the first few replication steps.

The following figure shows the distribution of the *p-values* obtained, which is nearly *uniform*, as expected.
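
The replication experiment above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the course's code:

- Only a few *Bodyweight* values are shown here, so the sketch draws from a hypothetical synthetic population (5,000 normal values with mean 24 and standard deviation 3.5) in place of the real data.
- It uses an equal-variance (pooled) two-sample t-test. The two-sided p-value comes from integrating the Student's t density with Simpson's rule.
- The helper names `t_pvalue`, `two_sample_t_pvalue` and `simulate_null_pvalues` are illustrative.

```python
import math
import random
import statistics


def t_pvalue(t, df, n_steps=100):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (n_steps must be even).
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    a = abs(t)
    h = a / n_steps
    dens = [const * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2) for i in range(n_steps + 1)]
    simpson = dens[0] + dens[-1] + sum((4 if i % 2 else 2) * dens[i] for i in range(1, n_steps))
    return max(0.0, 1.0 - 2.0 * simpson * h / 3)  # 1 - 2 * P(0 <= T <= |t|)


def two_sample_t_pvalue(x, y):
    """Equal-variance (pooled) two-sample t-test; returns the two-sided p-value."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t_pvalue(t, df)


def simulate_null_pvalues(population, n=12, reps=10_000, seed=1):
    """Draw control and treatment groups of size n from one population and t-test them."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        control = rng.sample(population, n)
        treatment = rng.sample(population, n)
        pvals.append(two_sample_t_pvalue(control, treatment))
    return pvals


# Hypothetical synthetic stand-in for the Bodyweight population (NOT the course data).
data_rng = random.Random(0)
population = [data_rng.gauss(24, 3.5) for _ in range(5_000)]

pvals = simulate_null_pvalues(population)
# Under the null hypothesis, each tenth of [0, 1] should hold roughly 10% of the p-values.
for k in range(10):
    lo, hi = k / 10, (k + 1) / 10
    share = sum(lo <= p < hi for p in pvals) / len(pvals)
    print(f"p in [{lo:.1f}, {hi:.1f}): {share:.3f}")
```

Both groups come from the same population, so the null hypothesis is true by construction and each printed share should come out near 0.10. In practice a library routine such as `scipy.stats.ttest_ind` would replace the hand-written t-test. It is avoided here only to keep the sketch dependency-free.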

Now, let’s assume that we are testing the effectiveness of 20 diets on mouse weight. For each of the 20 diets, we run an experiment with 10 control mice and 10 treated mice. Assume that the *null hypothesis* (the diet has no effect) is *true* for all 20 diets. Also assume that mouse weights follow a normal distribution with mean 30 grams and standard deviation 2 grams. To learn about the distribution of the number of *p-values* that are less than 0.05, we run a Monte Carlo simulation. We repeat these 20 experiments 1,000 times and each time record how many *p-values* fall below 0.05.
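
One possible Monte Carlo sketch of this exercise uses standard-library Python. The t-test helpers (a pooled, equal-variance test with a Simpson's-rule p-value) and the function name `count_significant` are illustrative assumptions. This is not the course's own solution, which may use a different t-test variant.

```python
import math
import random
import statistics


def t_pvalue(t, df, n_steps=100):
    """Two-sided Student's t p-value via Simpson's rule on the t density (n_steps even)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    a = abs(t)
    h = a / n_steps
    dens = [const * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2) for i in range(n_steps + 1)]
    simpson = dens[0] + dens[-1] + sum((4 if i % 2 else 2) * dens[i] for i in range(1, n_steps))
    return max(0.0, 1.0 - 2.0 * simpson * h / 3)


def pooled_t_pvalue(x, y):
    """Equal-variance two-sample t-test p-value."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t_pvalue(t, df)


def count_significant(rng, n_diets=20, n_mice=10, mu=30.0, sigma=2.0, alpha=0.05):
    """One replicate: run n_diets null experiments and return how many give p < alpha."""
    count = 0
    for _ in range(n_diets):
        control = [rng.gauss(mu, sigma) for _ in range(n_mice)]
        treated = [rng.gauss(mu, sigma) for _ in range(n_mice)]  # null: same distribution
        if pooled_t_pvalue(control, treated) < alpha:
            count += 1
    return count


rng = random.Random(2017)
counts = [count_significant(rng) for _ in range(1_000)]
print("average number of p-values < 0.05 per replicate:", statistics.fmean(counts))
print("replicates with at least one p-value < 0.05:", sum(c > 0 for c in counts) / len(counts))
```

Each null p-value falls below 0.05 with probability 0.05. So the average count should be close to 20 × 0.05 = 1, and the share of replicates with at least one such p-value should be close to 1 − 0.95^20 ≈ 0.64.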

The following figures use *Monte Carlo* simulations to show how some of the *t-tests* reject the *null hypothesis* (at the 5% significance level) simply *by chance*, even though it is *true*.

The following figure shows how the *FWER* (*family-wise error rate*, i.e., the *probability of rejecting at least one true null hypothesis*), estimated by *Monte Carlo simulation*, increases with the number of hypothesis tests.

The following figures show how the *FWER* can be computed theoretically:
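
Suppose the m tests are independent and each is performed at level α on a true null hypothesis. Then the FWER is 1 − (1 − α)^m. The sketch below compares that formula with a Monte Carlo estimate; the function names are illustrative. Instead of running t-tests, the simulation draws the null p-values directly from Uniform(0, 1), their distribution under the null hypothesis.

```python
import random


def fwer_theory(m, alpha=0.05):
    """FWER for m independent tests of true nulls, each at level alpha: 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m


def fwer_monte_carlo(m, alpha=0.05, reps=10_000, seed=1):
    """Estimate P(at least one p < alpha) using Uniform(0, 1) null p-values."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(m)) for _ in range(reps))
    return hits / reps


for m in (1, 5, 10, 20, 50, 100):
    print(f"m={m:3d}  theory={fwer_theory(m):.3f}  monte carlo={fwer_monte_carlo(m):.3f}")
```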

Now, let’s try to understand the concept of an *error controlling procedure*. We can think of it as a set of instructions, such as *“reject all the null hypotheses for which p-values < 0.0001”* or *“reject the null hypotheses for the 10 features with the smallest p-values”*. Then, knowing that the *p-values* are random variables, we use statistical theory to compute how many mistakes, on average, we will make if we follow this procedure. More precisely, we commonly find bounds on these rates, meaning that we show that they are smaller than some predetermined value.
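
The two example procedures can be made concrete with a small simulation. This sketch uses hypothetical settings chosen for illustration: m = 10,000 features whose null hypotheses are all true, so every rejection is a mistake. Null p-values are drawn from Uniform(0, 1), and the function names are illustrative.

```python
import heapq
import random


def reject_below(pvals, threshold=1e-4):
    """Procedure A: reject every null hypothesis whose p-value is below threshold."""
    return [i for i, p in enumerate(pvals) if p < threshold]


def reject_k_smallest(pvals, k=10):
    """Procedure B: reject the null hypotheses of the k features with the smallest p-values."""
    return heapq.nsmallest(k, range(len(pvals)), key=pvals.__getitem__)


def average_false_positives(procedure, m=10_000, reps=200, seed=1):
    """All m nulls are true here, so every rejection counts as a false positive."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        pvals = [rng.random() for _ in range(m)]  # null p-values ~ Uniform(0, 1)
        total += len(procedure(pvals))
    return total / reps


fp_threshold = average_false_positives(reject_below)
fp_top10 = average_false_positives(reject_k_smallest)
print("reject p < 0.0001:  ", fp_threshold)
print("reject 10 smallest: ", fp_top10)
```

With these settings, the threshold rule makes about 10,000 × 0.0001 = 1 mistake per experiment on average. The top-10 rule always makes exactly 10, because it rejects 10 hypotheses whatever the p-values are.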

We can compute the following different error rates:

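- The *FWER* (*family-wise error rate*) is the *probability* of making *at least one false positive*, i.e., of rejecting at least one true null hypothesis.
- The *FDR* (*false discovery rate*) is the *expected proportion* of *false positives* among the *rejected null hypotheses* (taken as 0 when nothing is rejected).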
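
Both error rates can be estimated by Monte Carlo. The sketch below uses hypothetical settings:

- There are 100 hypotheses. For 90 of them the null is true; the other 10 have a true effect of 3 standard errors.
- To stay within the standard library, it uses a two-sided z-test (via `statistics.NormalDist`) rather than a t-test.

In each replicate, V counts the rejected true nulls and R counts all rejections. The FWER is estimated as the share of replicates with V ≥ 1, and the FDR as the average of V / max(R, 1).

```python
import random
from statistics import NormalDist


def simulate_error_rates(threshold, m=100, m0=90, effect=3.0, reps=2000, seed=1):
    """Estimate (FWER, FDR) for the rule 'reject if p < threshold'.

    The first m0 hypotheses are true nulls (z ~ N(0, 1));
    the remaining m - m0 have z ~ N(effect, 1).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    any_false = 0
    fdp_sum = 0.0
    for _ in range(reps):
        v = r = 0
        for j in range(m):
            z = rng.gauss(effect if j >= m0 else 0.0, 1.0)
            p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided z-test p-value
            if p < threshold:
                r += 1
                if j < m0:
                    v += 1  # rejected a true null: a false positive
        any_false += v > 0
        fdp_sum += v / max(r, 1)  # false discovery proportion, 0 when nothing is rejected
    return any_false / reps, fdp_sum / reps


results = {
    "p < 0.05": simulate_error_rates(0.05),
    "Bonferroni (p < 0.05 / 100)": simulate_error_rates(0.05 / 100),
}
for label, (fwer, fdr) in results.items():
    print(f"{label:28s} FWER ~ {fwer:.3f}   FDR ~ {fdr:.3f}")
```

In every replicate, V / max(R, 1) is at most 1 when V ≥ 1 and exactly 0 when V = 0. So the estimated FDR can never exceed the estimated FWER. The unadjusted 0.05 cutoff should give an FWER near 1 − 0.95^90 ≈ 0.99. By the union bound, the Bonferroni cutoff keeps it at or below 90 × 0.0005 = 0.045, up to simulation noise.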

#### Note 1

The *FWER* and *FDR* are not procedures but error rates. We will review procedures here and use Monte Carlo simulations to estimate their error rates.

#### Note 2

We sometimes use the colloquial term “pick genes that” meaning “reject the null hypothesis for genes that.”

## Bonferroni Correction Exercises (Bonferroni versus Sidak)

Let’s consider the following figure:

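Let’s plot α/m and 1 − (1 − α)^(1/m) for various values of *m > 1*. Which procedure is more conservative (picks fewer genes, i.e., rejects fewer null hypotheses): Bonferroni’s or Sidak’s? As can be seen from the next figures, **Bonferroni** is more conservative. Its per-test cutoff α/m is always smaller than Sidak’s cutoff 1 − (1 − α)^(1/m), which follows from Bernoulli’s inequality (1 − α/m)^m > 1 − α for 0 < α < 1 and m > 1.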
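
The comparison can also be checked numerically. This sketch tabulates both per-test cutoffs for α = 0.05 and a few values of m; the function names are illustrative. A smaller cutoff means fewer rejections, i.e., a more conservative procedure.

```python
def bonferroni_cutoff(alpha, m):
    """Bonferroni per-test cutoff: alpha / m."""
    return alpha / m


def sidak_cutoff(alpha, m):
    """Sidak per-test cutoff 1 - (1 - alpha)^(1/m); gives FWER exactly alpha for independent tests."""
    return 1 - (1 - alpha) ** (1 / m)


alpha = 0.05
for m in (2, 5, 10, 100, 1000, 10_000):
    b, s = bonferroni_cutoff(alpha, m), sidak_cutoff(alpha, m)
    print(f"m={m:6d}  Bonferroni={b:.4e}  Sidak={s:.4e}  Bonferroni smaller: {b < s}")
```

Sidak's cutoff is chosen so that 1 − (1 − cutoff)^m equals α exactly when the tests are independent. Bonferroni's α/m only guarantees FWER ≤ α through the union bound, which is why it is slightly stricter.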