Two-way ANOVA in R

Introduction

Our goal in this chapter is to learn how to work with two-way ANOVA models in R, using an

example from a plant competition experiment. The work flow is very similar to one-way

ANOVA in R. We’ll start with the problem and the data, and then work through model fitting,

evaluating assumptions, significance testing, and finally, presenting the results.

Competition between Calluna and Festuca

Plants have an optimal soil pH for growth, and this varies between species. Consequently we

would expect that if we grow two plants in competition with each other at different pH values

the effect of competition might vary according to the soil pH. In a recent study the growth of

the grass Festuca ovina (Sheep’s Fescue) in competition with the heather Calluna

vulgaris (Ling) was investigated in soils with different pH. Calluna is well adapted to grow on

very acidic soils such as on the Millstone grit and blanket bogs around

Sheffield. Festuca grows on soils with a much wider range of pH. We might hypothesise

that Calluna will be a better competitor of Festuca in very acid soils than in moderately acid

soils.

To test this hypothesis an experiment was designed in which Festuca seedlings were grown

in pots at all combinations of two levels of two different kinds of treatment:

Factor 1: Soil pH at 3.5 or 5.5

Factor 2: Presence or absence of Calluna.

This is a fully factorial, two-way design. The total number of treatments was thus 2×2=4. For

each of the treatments there were 5 replicate pots, giving a total of 2×2×5=20 pots.

The following data are the yields of Festuca from each treatment (dry weight in g) from the

two pH levels and in the presence or absence of Calluna.

pH 3.5 | pH 5.5 | |

Calluna Present | 2.76, 2.39, 3.54, 3.71, 2.49 | 3.21, 4.10, 3.04, 4.13, 5.21 |

Calluna Absent | 4.10, 2.72, 2.28, 4.43, 3.31 | 5.92, 7.31, 6.10, 5.25, 7.45 |

Walk through

You should begin working through the Festuca example from this point.

The data are in a CSV file called Festuca.csv. Start by reading the data into an R data frame,

giving it the name festuca, and then have a look at the data.

Notice that the data for a two-factor experiment can be laid out in a very similar manner to

those from a one factor experiment, with the response variable in one column, but now there

are two additional columns containing the codes for the treatments, one for pH and one

for Calluna. The first column (Weight) contains the Festuca dry weights, the second column

(pH) contains the codes for the pH treatment (levels: pH3.5, pH5.5), the third column (Calluna)

contains the codes for the presence or absence of Calluna (levels: Present, Absent).

A couple of points are worth noting at this point:

1. As is always the case in this book, the data are ‘tidy.’ Each experimental factor is in

one column and each observation is in a single row. The same applies to the majority

of R’s statistical modelling tools—they expect data to be supplied in this format.

2. We avoided using numbers to encode the levels of the pH treatment. This is

important, as it ensures that the pH variable will be converted into a factor rather than

a number when we read the data into R. We have said it before, but it is worth saying

one more time: fewer mistakes will occur if we use words to encode the levels of a

factor.

Visualising the data

We should take a look at the data before doing anything with it. We only have five replicates

per treatment combination, so any figure we produce is going to provide only limited

information. Five replicates is just about sufficient for a box plot.

boxplot(Weight~factor(Calluna)+pH,data=festuca) #note here that we have added pH as well!

The main purpose of a plot like this is to help us understand what the treatments are doing.

We want to quickly assess things like: How big are the main effects? What direction do they

work in? Is there likely to be an interaction? It looks like the higher pH conditions tends to

increase Festuca growth, and the presence of Calluna tends to reduce it (yes, plants

compete!). The more interesting observation is that the effect of Calluna seems to be greater

at higher pH. It looks like we might have an interaction.

The next thing we should do is check whether any of the assumptions have been violated. To

do this we need to fit the model first.

Fitting the ANOVA model

Carrying out a two-way ANOVA in R is really no different from one-way ANOVA. It still involves

two steps. First we have to fit the model using the lm function, remembering to store the

fitted model object. This is the step where R calculates the relevant means, along with the

additional information needed to generate the results in step two. The second step uses

the anova function to calculate F-statistics, degrees of freedom, and p-values.

Here is the R code needed to carry out the model-fitting step with lm:

festuca_model <- lm(Weight ~ pH + Calluna + pH : Calluna, data = festuca)

This is very similar to the R code used to fit a one-way ANOVA model. The first argument is a

formula (notice the ‘tilde’ symbol: ~) and the second argument is the name of the data frame

that contains all the variables listed in the formula. That’s all we need.

The specific model fitted by lm is a result of 1) the type of variables referenced in the formula,

and 2) the symbols used to define the terms in the formula. To ensure that we have fitted an

ANOVA model, the variables which appear to the right of the ~ must be factors or character

vectors—an ANOVA only involves factors. The variable name to the left of the ~ is the numeric

response variable we are analysing. We know that Calluna and pH are factors, so we can be

certain that lm has fitted some kind of ANOVA model.

What kind of ANOVA have we fitted, i.e., what are the terms on the right hand side of the

formula doing? Here is the formula we used:

Weight ~ pH + Calluna + pH:Calluna

There are three terms, each separated by a + symbol: pH, Calluna and pH:Calluna. This tells R

that we want to fit a model that accounts for the main effects of pH and Calluna, and that we

also wish to include the interaction between these two factors. The specification of the main

effects is fairly self-explanatory—we just include the name of each factor variable in the

formula. The interaction term is less obvious. It is specified by a colon (the : symbol) with the

two interacting variables either side of it.

In summary… 1) the ~ symbol specifies a formula in R, where the name on the left hand side

is the response variable we are analysing, and the names on the right denote the terms in the

model; 2) we place a + between terms to delineate them (we are not adding anything up

when the + is used in a formula); 3) each main effect is specified by the corresponding factor

name; and 4) an interaction between factors is specified by the : symbol.

Notice that we assigned the result a name (festuca_model) which now refers to the model

object produced by lm. Just as with a one-way ANOVA, we can’t extract p-values by printing

this object to the console, because all this gives us is a limited summary of the fitted model’s

coefficients:

festuca_model

We’re not going to worry about what those mean for a two-way ANOVA. We’re going to

use anova to calculate things like degrees of freedom, sum of squares, mean squares, Fstatistics, and finally, the p-values. But… before we do that, let’s check our assumptions now

that we have a fitted model object.

Diagnostics

We’re going to produce two diagnostic plots: a normal probability plot to evaluate the

normality assumption, and a scale-location plot allows us to evaluate the constant variance

assumption. Here’s the normal probability plot:

plot(festuca_model, which = 2, add.smooth = FALSE)

This plot allows us to check whether the deviations from the group means (the residuals) are

likely to have been drawn from a normal distribution. This looks… not so great. The points

deviate from the line in a systematic way so it looks like the normality assumption may not

be satisfied. The left tail is above the line and the right tail is below it. This tells us that the

tails of the residual distribution do not extend out as far as they should—the distribution is

‘squashed’ toward its middle.

The scale-location plot allows us to see whether or not the variability of the residuals is

roughly constant within each group. Here’s the plot:

plot(festuca_model, which = 3, add.smooth = FALSE)

We’re on the lookout for a systematic pattern in the size of the residuals and the fitted

values—does the variability go up or down with the fitted values? There is no such pattern so

it looks like the constant variance assumption is at least satisfied here.

We’ve identified one potential problem. We’ll ignore it for now and press on. The goal here

is to learn the work flow for two-way ANOVA in R. However, keep in mind that if we were

serious about the analysis we should find a way to ‘fix’ it, for example using the methods we

discussed in the data visualisation section e.g. log transformation.

Interpreting the results

Now we’re ready to calculate the degrees of freedom, sums of squares, mean squares, the Fratio, and p-values for the main effects and the interaction terms:

anova(festuca_model)

What does all this mean? We interpret each line of the ANOVA table in exactly the same way

as we do for a one-way ANOVA. The first part tells us what kind of output we are looking at:

This reminds us that we are looking at an ANOVA table where our response variable was

called Weight. The table contains the key information:

This ANOVA table is similar to the ones we have already seen, except that now we have to

consider three lines—one for each term in the model. The first is for the main effect of pH,

the second for the main effect of Calluna, the third for the interaction between pH

and Calluna.

The F-ratio is the test statistic for each term. These provide a measure of how large and

consistent the effects associated with each term are. Each F-ratio has a pair of degrees of

freedom associated with it: one belonging to the term itself, the other due to the error

(residual). Together, the F-ratio and its degrees of freedom determines the p-value.

The p-value gives the probability that the differences between the set of means for each term

in the model, or a more extreme difference, could have arisen through sampling variation

under the null hypothesis of no difference. We take p < 0.05 as evidence that at least one of

the treatments is having an effect. Here, p < 0.05 for three effects, so we conclude both main

effects and the interaction are significant (though at different significance levels).

The ANOVA table tells us nothing about the direction of the effects. We have to delve a little

further into the fitted model or plot the data to be able to do this. The presence of an

interaction between treatments indicates that the impact of one factor depends on the levels

of the other factor. This means that if there is a significant interaction in a two-way ANOVA,

then the main effects should be interpreted with care.

An ‘interaction diagram’ provides a good way to think about these issues…

Understanding the model graphically

How should we go about interpreting the significant effects? To reiterate, the interaction tells

us that the magnitude, or even direction, of the effect of one factor is dependent upon the

levels of the other factor. In other words the treatment effects are contingent on one another.

This contingency can arise in a number of ways, giving rise to different mixtures of main

effects and interactions. This is illustrated most easily by considering some hypothetical

results from a pH/Calluna experiment of this sort, in schematic form (the Calluna bars are

linked by a dotted line):

Diagrams such as these are sometimes called ‘interaction diagrams’ and they are often the

best way of looking at the results from this sort of experiment to try and interpret what is

happening. You will notice that the lines linking the treatments are parallel when there is no

interaction, but become non-parallel when an interaction is present. An interaction may just

mildly change the way the main effects work (4th plot) or it might completely reverse the

effects (5th plot).

You can plot interactions in R using the interaction.plot function. Have a play with the data

and see what you can produce. Remember to use ?interaction.plot to look at what code you

need to enter.

Multiple comparison tests

Obviously, since the main treatments only have two levels there is no need for any multiple

comparison tests on the main effects — if there is a difference it must be between the two

levels. However, the interaction is significant, so it may be desirable to know which particular

treatment combinations differ. Predictably, the work flow is very similar to that applied to a

one-way ANOVA model.

We could use the TukeyHSD function to do this. We start by converting the model object

produced by lm into an aov object…

festuca_aov <- aov(festuca_model)

…and then we perform a Tukey HSD test:

TukeyHSD(festuca_aov, which = ‘pH:Calluna’)

We have suppressed the output for now. The only new tweak that we have to learn is

the which argument. Assigning this the value ‘pH:Calluna’ makes the TukeyHSD function carry

out all pairwise comparisons among the means of each treatment combination, i.e., we are

considering the full set of interactions. Here is the output:

You extract information from this table just as you did before. The table present a series of

pair-wise comparisons between mean values tested by the Tukey procedure. For example,

the first 3 lines show the significance of differences between the mean of the treatment

combination pH 3.5 without Calluna and the 3 other mean values. All we need from this table

is to note the codes for the treatment means which are being compared (listed in the first

column), and the p-value in each case listed in the final column.

There are three significant differences, all of which involve the treatment combinations pH

5.5 with Calluna absent. This implies that there are two ‘not significantly different’ groups:

one defined by pH 5.5 with Calluna absent, and then everything else.

As you might expect, we don’t have to step through the results of the TukeyHSD function to

define the ‘not significantly different’ groups. We used the agricolae package to do this for a

one-way ANOVA. We can use this again here. We need to load and attach the package first:

library(agricolae)

Once the package is ready for use, we can carry out the Tukey HSD test to find the ‘not

significantly different’ groups using the HSD.test function:

HSD.test(festuca_aov, trt = c(“pH”, “Calluna”), console = TRUE)

Setting the trt argument to c(“pH”, “Calluna”) makes the function carry out all pair-wise

comparisons among the mean values defined by each treatment combination. The output

that matters is the table at the very end, which shows the group identities as letters, the

treatment names, and the treatment means. This just reiterates what we already knew—

there are two ‘not significantly different’ groups, defined by pH 5.5 with Calluna absent, and

‘everything else.’

Multiple comparison tests for main effects

As mentioned above, in this experiment there is no point in trying to make further

comparisons between the means from the main treatments (pH 3.5 and 5.5, or with and

without Calluna) since (a) there is a significant interaction, so detailed comparisons of the

main effects are hard to interpret, and (b) even if that was not the case there are only two

levels in each treatment so any difference must be between those two levels!

However, it is quite common to have experiments with more than two levels in one or both

factors. If the ANOVA indicates that there is a significant effect of one, or both, of the

associated effects, and there is no interaction to worry about (don’t forget this caveat), then

you may wish to carry out multiple comparisons for the means associated with the main

effects.

This can be done using a Tukey test just as we did for the interaction in this example. The only

difference is that we have to specify the name of the main effect you are interested in. For

example, if we wanted to use TukeyHSD function to evaluate the significance of the pH main

effects, we would use: TukeyHSD(festuca_aov, which = ‘pH’)

Drawing conclusions and presenting results

In the results section of the report we will need to provide a succinct factual summary of the

analysis:

There were significant effects of soil pH (ANOVA: F=28.18, df=1,16, p<0.001), competition

with Calluna (F=14.4, df=1,16, p=0.002) and the interaction between these treatments

(F=7.61, df=1,16, p=0.014) on the dry weight yield of Festuca. Festuca grew much better in

the absence of Calluna at high pH than in any other treatment combination (Tukey multiple

comparison test p<0.05) (Figure 1).