(Sandipan Dey, 18 April 2017)
The following inference problems (along with the descriptions) are taken directly from the exercises of the edX Course HarvardX: PH525.3x Statistical Inference and Modeling for High-throughput Experiments.
Inference in Practice Exercises
These exercises will help clarify that p-values are random variables and illustrate some of their properties. Just as the sample average is a random variable because it is based on a random sample, a p-value is computed from random quantities (the sample mean and sample standard deviation, for example) and is therefore also a random variable. The following two properties of p-values are proved mathematically in the following figure:
- The p-values are random variables.
- Under the null hypothesis, the p-values follow a uniform distribution.
To see this, let’s look at how the p-values change when we take different samples. The next table shows the Bodyweight dataset, from which
- A control group and a treatment group, each of size 12, are drawn at random.
- A two-sample t-test is then performed on these groups to compute the p-value.
- The two steps above are replicated 10,000 times.
First few Bodyweight values: 27.03, 24.80, 27.02, 28.07, 23.55, 22.72
The next table shows the first few rows of the randomly chosen control and treatment groups for a single replication.
The following animation shows the first few replication steps.
The following figure shows the distribution of the p-values obtained, which is nearly uniform, as expected.
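The replication described above can be sketched in Python as follows. This is a minimal sketch, not the code behind the figure: the Bodyweight data is not reproduced here, so a synthetic normal `population` stands in for it (its mean, spread, and size are made-up assumptions), the function name `simulate_null_pvalues` is hypothetical, and Welch's version of the t-test is assumed.

```python
import numpy as np
from scipy import stats

def simulate_null_pvalues(population, n=12, n_reps=10_000, seed=0):
    """Repeatedly draw two groups from one population and t-test them."""
    rng = np.random.default_rng(seed)
    pvals = np.empty(n_reps)
    for i in range(n_reps):
        # Both groups come from the same population, so the null is true.
        control = rng.choice(population, size=n, replace=False)
        treatment = rng.choice(population, size=n, replace=False)
        # equal_var=False selects Welch's t-test (an assumption of this sketch).
        pvals[i] = stats.ttest_ind(treatment, control, equal_var=False).pvalue
    return pvals

# Stand-in population (assumption): replace with the real Bodyweight column.
population = np.random.default_rng(1).normal(loc=24.0, scale=3.5, size=225)

pvals = simulate_null_pvalues(population)

# Under the null, about 5% of the p-values should fall below 0.05 ...
frac_below_05 = np.mean(pvals < 0.05)
# ... and each of 10 equal-width bins should hold about 10% of them.
bin_fractions = np.histogram(pvals, bins=10, range=(0, 1))[0] / len(pvals)

print("fraction below 0.05:", frac_below_05)
print("bin fractions:", np.round(bin_fractions, 3))
```

If the p-values are close to uniform, the printed bin fractions should all be near 0.1, which is the flat histogram the figure describes.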
Now, let’s assume that we are testing the effectiveness of 20 diets on mouse weight. For each of the 20 diets, let’s run an experiment with 10 control mice and 10 treated mice. Assume that the null hypothesis (the diet has no effect) is true for all 20 diets and that mouse weights follow a normal distribution with a mean of 30 grams and a standard deviation of 2 grams. Let’s run a Monte Carlo simulation of these studies to learn about the distribution of the number of p-values that are less than 0.05. Specifically, let’s run these 20 experiments 1,000 times and each time save the number of p-values that are less than 0.05.
The following figures use Monte Carlo simulations to show how some of the t-tests reject the null hypothesis (at the 5% level of significance) simply by chance, even though it is true.
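A minimal sketch of this simulation, assuming NumPy and SciPy are available, is given below. The function name `count_false_positives` and its parameter names are hypothetical; the numbers (20 diets, 10 mice per group, mean 30, standard deviation 2, 1,000 repetitions) come from the setup above, and the equal-variance t-test is an assumption.

```python
import numpy as np
from scipy import stats

def count_false_positives(n_diets=20, n_mice=10, n_reps=1000,
                          alpha=0.05, mean=30.0, sd=2.0, seed=0):
    """For each repetition, count how many of the n_diets tests give p < alpha."""
    rng = np.random.default_rng(seed)
    shape = (n_reps, n_diets, n_mice)
    # Controls and treated mice share one distribution: every null is true.
    controls = rng.normal(mean, sd, size=shape)
    treated = rng.normal(mean, sd, size=shape)
    # One t-test per (repetition, diet), taken along the mice axis.
    pvals = stats.ttest_ind(treated, controls, axis=2).pvalue
    return (pvals < alpha).sum(axis=1)

counts = count_false_positives()

# Average number of false positives per family of 20 tests (20 * 0.05 = 1).
mean_count = counts.mean()
# Share of repetitions with at least one false positive.
frac_any = (counts > 0).mean()

print("mean number of p-values < 0.05:", mean_count)
print("fraction of repetitions with at least one:", frac_any)
```

Each entry of `counts` is the number of diets that would be declared effective purely by chance in one repetition of the 20 experiments.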
The following figure shows how the FWER (family-wise error rate, i.e., the probability of rejecting at least one true null hypothesis), estimated by Monte Carlo simulation, increases with the number of hypothesis tests.
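One way to estimate such a curve is sketched below. It relies on the property stated earlier that p-values are uniform under the null, and it additionally assumes that the tests are independent, so each family of m tests is simulated as m independent uniform random numbers rather than by running t-tests. The function name `estimate_fwer` and the chosen values of m are hypothetical.

```python
import random

def estimate_fwer(m, alpha=0.05, n_reps=10_000, seed=0):
    """Estimate P(at least one p-value < alpha) among m true null hypotheses."""
    rng = random.Random(seed)
    families_with_error = 0
    for _ in range(n_reps):
        # One family of m tests: null p-values are drawn as Uniform(0, 1).
        if any(rng.random() < alpha for _ in range(m)):
            families_with_error += 1
    return families_with_error / n_reps

fwer_estimates = {m: estimate_fwer(m) for m in (1, 5, 10, 20, 50)}

for m, fwer in fwer_estimates.items():
    print(f"m = {m:2d}  estimated FWER = {fwer:.3f}")
```

The estimates grow with m, starting near the single-test level of 0.05.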
The following figures show theoretically how the FWER can be computed:
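As a sketch of the standard calculation (the figures may present it differently), assume that the m tests are independent, that each is carried out at level α, and that all m null hypotheses are true. Then:

```latex
\begin{align*}
\mathrm{FWER} &= \Pr(\text{at least one rejection}) \\
              &= 1 - \Pr(\text{no rejections}) \\
              &= 1 - (1 - \alpha)^{m}
\end{align*}
```

For example, with α = 0.05 and m = 20 this gives 1 - 0.95^20 ≈ 0.64.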
Now, let’s try to understand the concept of an error-controlling procedure. We can think of it as defining a set of instructions, such as “reject all the null hypotheses for which the p-values are < 0.0001” or “reject the null hypothesis for the 10 features with the smallest p-values”. Then, knowing that the p-values are random variables, we use statistical theory to compute how many mistakes, on average, we will make if we follow this procedure. More precisely, we commonly find bounds on these rates, meaning that we show that they are smaller than some predetermined value.
We can compute the following different error rates:
- The FWER (Family-wise error rate) tells us the probability of having at least one false positive.
- The FDR (False discovery rate) is the expected proportion of false positives among the rejected null hypotheses.
The FWER and FDR are not procedures but error rates. We will review procedures here and use Monte Carlo simulations to estimate their error rates.
We sometimes use the colloquial term “pick genes that” meaning “reject the null hypothesis for genes that.”
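To make the distinction between a procedure and its error rates concrete, here is a toy Monte Carlo sketch that estimates the FWER and FDR of the procedure “reject every null hypothesis whose p-value is below a cutoff”. Everything about the setup is an assumption made for illustration: 100 tests of which 90 are true nulls, independent uniform p-values for the true nulls, and an arbitrary rule (`u ** 8`) that pushes the p-values of the false nulls toward zero. The function name `estimate_error_rates` is hypothetical.

```python
import random

def estimate_error_rates(cutoff, m=100, m_null=90, n_reps=2000, seed=0):
    """Estimate (FWER, FDR) of the procedure 'reject if p-value < cutoff'."""
    rng = random.Random(seed)
    reps_with_false_positive = 0
    false_discovery_proportion_sum = 0.0
    for _ in range(n_reps):
        # True nulls: uniform p-values.
        null_p = [rng.random() for _ in range(m_null)]
        # False nulls: p-values concentrated near 0 (arbitrary toy choice).
        alt_p = [rng.random() ** 8 for _ in range(m - m_null)]
        v = sum(p < cutoff for p in null_p)  # false positives
        s = sum(p < cutoff for p in alt_p)   # true positives
        r = v + s                            # total rejections
        reps_with_false_positive += v > 0
        # The false discovery proportion is taken as 0 when nothing is rejected.
        false_discovery_proportion_sum += v / r if r > 0 else 0.0
    fwer = reps_with_false_positive / n_reps
    fdr = false_discovery_proportion_sum / n_reps
    return fwer, fdr

fwer_raw, fdr_raw = estimate_error_rates(cutoff=0.05)
fwer_strict, fdr_strict = estimate_error_rates(cutoff=0.05 / 100)

print(f"cutoff 0.05:   FWER = {fwer_raw:.3f}, FDR = {fdr_raw:.3f}")
print(f"cutoff 0.0005: FWER = {fwer_strict:.3f}, FDR = {fdr_strict:.3f}")
```

The same procedure evaluated at two cutoffs gives two pairs of error rates, and in each case the estimated FDR cannot exceed the estimated FWER, because the false discovery proportion is at most 1 and is zero whenever there are no false positives.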
Bonferroni Correction Exercises (Bonferroni versus Sidak)
Let’s consider the following figure:
Let’s plot $\alpha/m$ (Bonferroni’s cutoff) and $1-(1-\alpha)^{1/m}$ (Sidak’s cutoff) for various values of m>1. Which procedure is more conservative (picks fewer genes, i.e. rejects fewer null hypotheses): Bonferroni’s or Sidak’s? As can be seen from the next figures, Bonferroni is more conservative.
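The comparison can also be checked numerically. The sketch below assumes the usual per-test cutoffs, α/m for Bonferroni and 1-(1-α)^(1/m) for Sidak, with α = 0.05; the function names and the chosen values of m are hypothetical.

```python
def bonferroni_cutoff(alpha, m):
    """Per-test p-value cutoff under Bonferroni's procedure."""
    return alpha / m

def sidak_cutoff(alpha, m):
    """Per-test p-value cutoff under Sidak's procedure."""
    return 1 - (1 - alpha) ** (1 / m)

alpha = 0.05
for m in (2, 10, 100, 1000, 10000):
    b = bonferroni_cutoff(alpha, m)
    s = sidak_cutoff(alpha, m)
    # A smaller cutoff rejects fewer hypotheses, i.e. is more conservative.
    print(f"m = {m:5d}  Bonferroni = {b:.3e}  Sidak = {s:.3e}  ratio = {s / b:.4f}")
```

For every m in this range the Bonferroni cutoff is the smaller of the two, which matches the conclusion above, although the two cutoffs differ by only a few percent.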