Using One-way Analysis of Variance with R and Python to find the Association between the quantitative response variable Life expectancy and the categorical explanatory variables derived from Income per person / Alcohol consumption in the Gapminder Dataset

Model Interpretation for ANOVA:

When examining the association between life expectancy in years (quantitative response) and income per person (GDP per capita in constant 2000 US$) for the countries in the Gapminder dataset, income per person was first converted into a categorical explanatory variable with 2 ordered categories by splitting at approximately its median value, 2385: low if income per person is in (0, 2385], high otherwise. A one-way Analysis of Variance (ANOVA) revealed that countries with high (2385, 52302] income per person had significantly higher life expectancy (Mean=75.74, s.d. ±6.08) than countries with low (0, 2385] income per person (Mean=63.57, s.d. ±8.86), F(1, 174)=113.0, p = 1.8 x 10^(-20).
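The median split described above can be sketched in Python with pandas. This is only a minimal illustration: the column name `incomeperperson` and the sample values are my assumptions, not the actual Gapminder rows, while the bin edges 0, 2385 and 52302 are the ones quoted in the text.

```python
import pandas as pd

# Hypothetical sample values; the real data would come from the Gapminder file.
df = pd.DataFrame({
    "incomeperperson": [180.0, 2385.0, 2386.0, 9500.0, 52302.0],
})

# pd.cut uses right-closed bins by default:
# (0, 2385] -> "low", (2385, 52302] -> "high".
df["income_group"] = pd.cut(
    df["incomeperperson"],
    bins=[0, 2385, 52302],
    labels=["low", "high"],
)

print(df)
```

Because the bins are closed on the right, a country with income per person of exactly 2385 falls in the low group.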

Note that the degrees of freedom that I report in parentheses following ‘F’ can be found in the OLS table as the DF model and DF residuals. In this example, 113.0 is the actual F value from the OLS table, and a very small p value like this one is reported simply as p = 1.8 x 10^(-20).
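To show where these numbers sit in the fitted model, here is a small sketch using the statsmodels formula interface. The data are randomly generated stand-ins (not the Gapminder values), and the column names `lifeexpectancy` and `income_group` are assumptions made for the example, so the printed F and p will differ from the ones reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: two income groups with different mean life expectancy.
rng = np.random.default_rng(0)
n_per_group = 30
df = pd.DataFrame({
    "income_group": ["low"] * n_per_group + ["high"] * n_per_group,
    "lifeexpectancy": np.concatenate([
        rng.normal(64, 9, n_per_group),   # hypothetical low-income group
        rng.normal(76, 6, n_per_group),   # hypothetical high-income group
    ]),
})

# One-way ANOVA expressed as an OLS model with a categorical predictor.
model = smf.ols("lifeexpectancy ~ C(income_group)", data=df).fit()
print(model.summary())

# The pieces that go into the "F(df model, df residuals) = ..., p = ..." report.
f_value = model.fvalue
p_value = model.f_pvalue
df_model = int(model.df_model)   # number of groups minus 1
df_resid = int(model.df_resid)   # number of observations minus number of groups
print(f"F({df_model}, {df_resid}) = {f_value:.1f}, p = {p_value:.2g}")
```

With 2 groups the DF model is 2 - 1 = 1, and the DF residuals is the number of observations minus 2, which is consistent with the F(1, 174) reported above.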

The results from Python are shown below.

[Images: p1, p2, p3]

The following are the same results with R.

[Images: p4, p5]

Model Interpretation for post hoc ANOVA results:

When examining the association between life expectancy in years (quantitative response) and another explanatory variable, alcohol consumption (average, in litres), for the countries in the same dataset, alcohol consumption was categorized into 4 ordered categories by splitting around the quartiles, giving a categorical explanatory variable with 4 levels: (0,3], (3,6], (6,10], (10,25]. A one-way ANOVA revealed that life expectancy and alcohol consumption were significantly associated, F(3, 172)=8.927, p=1.57×10^-5.

Post hoc comparisons of alcohol consumption by pairs of categories revealed that the countries with alcohol consumption level (10,25] (group 1) had significantly higher life expectancy than those with level (0,3] (group 0). Similarly, the countries with alcohol consumption level (10,25] had significantly higher life expectancy than those with level (3,6], and the countries with alcohol consumption level (6,10] had significantly higher life expectancy than those with level (3,6]. None of the other pairwise comparisons showed a statistically significant difference.
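Pairwise comparisons of this kind can be sketched with Tukey's HSD from statsmodels. The post does not state which post hoc test produced the results, so Tukey's HSD is my assumption here as one common choice. The data below are synthetic, the group means are made up, and the column names `alcohol_group` and `lifeexpectancy` are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic stand-in data for the 4 alcohol consumption levels.
rng = np.random.default_rng(1)
levels = ["(0,3]", "(3,6]", "(6,10]", "(10,25]"]
hypothetical_means = [66, 65, 71, 75]   # made-up group means, not Gapminder values
n_per_group = 40
df = pd.DataFrame({
    "alcohol_group": np.repeat(levels, n_per_group),
    "lifeexpectancy": np.concatenate(
        [rng.normal(m, 8, n_per_group) for m in hypothetical_means]
    ),
})

# Tukey's HSD compares every pair of groups while controlling the
# family-wise error rate at alpha.
tukey = pairwise_tukeyhsd(
    endog=df["lifeexpectancy"],
    groups=df["alcohol_group"],
    alpha=0.05,
)
print(tukey.summary())
```

With 4 levels there are 6 pairwise comparisons. In the summary table, the `reject` column marks the pairs whose mean life expectancy differs significantly at the chosen alpha.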

The results from Python are shown below.

[Images: p6, p7, p8, p9]

The following are the same results with R.

[Image: p10]
