Two-way ANOVA in R
Our goal in this chapter is to learn how to work with two-way ANOVA models in R, using an
example from a plant competition experiment. The work flow is very similar to one-way
ANOVA in R. We’ll start with the problem and the data, and then work through model fitting,
evaluating assumptions, significance testing, and finally, presenting the results.
Competition between Calluna and Festuca
Plants have an optimal soil pH for growth, and this varies between species. Consequently we
would expect that if we grow two plants in competition with each other at different pH values
the effect of competition might vary according to the soil pH. In a recent study the growth of
the grass Festuca ovina (Sheep’s Fescue) in competition with the heather Calluna
vulgaris (Ling) was investigated in soils with different pH. Calluna is well adapted to grow on
very acidic soils such as on the Millstone grit and blanket bogs around
Sheffield. Festuca grows on soils with a much wider range of pH. We might hypothesise
that Calluna will be a better competitor of Festuca in very acid soils than in moderately acid
To test this hypothesis an experiment was designed in which Festuca seedlings were grown
in pots at all combinations of two levels of two different kinds of treatment:
Factor 1: Soil pH at 3.5 or 5.5
Factor 2: Presence or absence of Calluna.
This is a fully factorial, two-way design. The total number of treatments was thus 2×2=4. For
each of the treatments there were 5 replicate pots, giving a total of 2×2×5=20 pots.
The following data are the yields of Festuca from each treatment (dry weight in g) from the
two pH levels and in the presence or absence of Calluna.
|pH 3.5||pH 5.5|
|Calluna Present||2.76, 2.39, 3.54, 3.71, 2.49||3.21, 4.10, 3.04, 4.13, 5.21|
|Calluna Absent||4.10, 2.72, 2.28, 4.43, 3.31||5.92, 7.31, 6.10, 5.25, 7.45|
You should begin working through the Festuca example from this point.
The data are in a CSV file called Festuca.csv. Start by reading the data into an R data frame,
giving it the name festuca, and then have a look at the data.
Notice that the data for a two-factor experiment can be laid out in a very similar manner to
those from a one factor experiment, with the response variable in one column, but now there
are two additional columns containing the codes for the treatments, one for pH and one
for Calluna. The first column (Weight) contains the Festuca dry weights, the second column
(pH) contains the codes for the pH treatment (levels: pH3.5, pH5.5), the third column (Calluna)
contains the codes for the presence or absence of Calluna (levels: Present, Absent).
A couple of points are worth noting at this point:
1. As is always the case in this book, the data are ‘tidy.’ Each experimental factor is in
one column and each observation is in a single row. The same applies to the majority
of R’s statistical modelling tools—they expect data to be supplied in this format.
2. We avoided using numbers to encode the levels of the pH treatment. This is
important, as it ensures that the pH variable will be converted into a factor rather than
a number when we read the data into R. We have said it before, but it is worth saying
one more time: fewer mistakes will occur if we use words to encode the levels of a
Visualising the data
We should take a look at the data before doing anything with it. We only have five replicates
per treatment combination, so any figure we produce is going to provide only limited
information. Five replicates is just about sufficient for a box plot.
boxplot(Weight~factor(Calluna)+pH,data=festuca) #note here that we have added pH as well!
The main purpose of a plot like this is to help us understand what the treatments are doing.
We want to quickly assess things like: How big are the main effects? What direction do they
work in? Is there likely to be an interaction? It looks like the higher pH conditions tends to
increase Festuca growth, and the presence of Calluna tends to reduce it (yes, plants
compete!). The more interesting observation is that the effect of Calluna seems to be greater
at higher pH. It looks like we might have an interaction.
The next thing we should do is check whether any of the assumptions have been violated. To
do this we need to fit the model first.
Fitting the ANOVA model
Carrying out a two-way ANOVA in R is really no different from one-way ANOVA. It still involves
two steps. First we have to fit the model using the lm function, remembering to store the
fitted model object. This is the step where R calculates the relevant means, along with the
additional information needed to generate the results in step two. The second step uses
the anova function to calculate F-statistics, degrees of freedom, and p-values.
Here is the R code needed to carry out the model-fitting step with lm:
festuca_model <- lm(Weight ~ pH + Calluna + pH : Calluna, data = festuca)
This is very similar to the R code used to fit a one-way ANOVA model. The first argument is a
formula (notice the ‘tilde’ symbol: ~) and the second argument is the name of the data frame
that contains all the variables listed in the formula. That’s all we need.
The specific model fitted by lm is a result of 1) the type of variables referenced in the formula,
and 2) the symbols used to define the terms in the formula. To ensure that we have fitted an
ANOVA model, the variables which appear to the right of the ~ must be factors or character
vectors—an ANOVA only involves factors. The variable name to the left of the ~ is the numeric
response variable we are analysing. We know that Calluna and pH are factors, so we can be
certain that lm has fitted some kind of ANOVA model.
What kind of ANOVA have we fitted, i.e., what are the terms on the right hand side of the
formula doing? Here is the formula we used:
Weight ~ pH + Calluna + pH:Calluna
There are three terms, each separated by a + symbol: pH, Calluna and pH:Calluna. This tells R
that we want to fit a model that accounts for the main effects of pH and Calluna, and that we
also wish to include the interaction between these two factors. The specification of the main
effects is fairly self-explanatory—we just include the name of each factor variable in the
formula. The interaction term is less obvious. It is specified by a colon (the : symbol) with the
two interacting variables either side of it.
In summary… 1) the ~ symbol specifies a formula in R, where the name on the left hand side
is the response variable we are analysing, and the names on the right denote the terms in the
model; 2) we place a + between terms to delineate them (we are not adding anything up
when the + is used in a formula); 3) each main effect is specified by the corresponding factor
name; and 4) an interaction between factors is specified by the : symbol.
Notice that we assigned the result a name (festuca_model) which now refers to the model
object produced by lm. Just as with a one-way ANOVA, we can’t extract p-values by printing
this object to the console, because all this gives us is a limited summary of the fitted model’s
We’re not going to worry about what those mean for a two-way ANOVA. We’re going to
use anova to calculate things like degrees of freedom, sum of squares, mean squares, Fstatistics, and finally, the p-values. But… before we do that, let’s check our assumptions now
that we have a fitted model object.
We’re going to produce two diagnostic plots: a normal probability plot to evaluate the
normality assumption, and a scale-location plot allows us to evaluate the constant variance
assumption. Here’s the normal probability plot:
plot(festuca_model, which = 2, add.smooth = FALSE)
This plot allows us to check whether the deviations from the group means (the residuals) are
likely to have been drawn from a normal distribution. This looks… not so great. The points
deviate from the line in a systematic way so it looks like the normality assumption may not
be satisfied. The left tail is above the line and the right tail is below it. This tells us that the
tails of the residual distribution do not extend out as far as they should—the distribution is
‘squashed’ toward its middle.
The scale-location plot allows us to see whether or not the variability of the residuals is
roughly constant within each group. Here’s the plot:
plot(festuca_model, which = 3, add.smooth = FALSE)
We’re on the lookout for a systematic pattern in the size of the residuals and the fitted
values—does the variability go up or down with the fitted values? There is no such pattern so
it looks like the constant variance assumption is at least satisfied here.
We’ve identified one potential problem. We’ll ignore it for now and press on. The goal here
is to learn the work flow for two-way ANOVA in R. However, keep in mind that if we were
serious about the analysis we should find a way to ‘fix’ it, for example using the methods we
discussed in the data visualisation section e.g. log transformation.
Interpreting the results
Now we’re ready to calculate the degrees of freedom, sums of squares, mean squares, the Fratio, and p-values for the main effects and the interaction terms:
What does all this mean? We interpret each line of the ANOVA table in exactly the same way
as we do for a one-way ANOVA. The first part tells us what kind of output we are looking at:
This reminds us that we are looking at an ANOVA table where our response variable was
called Weight. The table contains the key information:
This ANOVA table is similar to the ones we have already seen, except that now we have to
consider three lines—one for each term in the model. The first is for the main effect of pH,
the second for the main effect of Calluna, the third for the interaction between pH
The F-ratio is the test statistic for each term. These provide a measure of how large and
consistent the effects associated with each term are. Each F-ratio has a pair of degrees of
freedom associated with it: one belonging to the term itself, the other due to the error
(residual). Together, the F-ratio and its degrees of freedom determines the p-value.
The p-value gives the probability that the differences between the set of means for each term
in the model, or a more extreme difference, could have arisen through sampling variation
under the null hypothesis of no difference. We take p < 0.05 as evidence that at least one of
the treatments is having an effect. Here, p < 0.05 for three effects, so we conclude both main
effects and the interaction are significant (though at different significance levels).
The ANOVA table tells us nothing about the direction of the effects. We have to delve a little
further into the fitted model or plot the data to be able to do this. The presence of an
interaction between treatments indicates that the impact of one factor depends on the levels
of the other factor. This means that if there is a significant interaction in a two-way ANOVA,
then the main effects should be interpreted with care.
An ‘interaction diagram’ provides a good way to think about these issues…
Understanding the model graphically
How should we go about interpreting the significant effects? To reiterate, the interaction tells
us that the magnitude, or even direction, of the effect of one factor is dependent upon the
levels of the other factor. In other words the treatment effects are contingent on one another.
This contingency can arise in a number of ways, giving rise to different mixtures of main
effects and interactions. This is illustrated most easily by considering some hypothetical
results from a pH/Calluna experiment of this sort, in schematic form (the Calluna bars are
linked by a dotted line):
Diagrams such as these are sometimes called ‘interaction diagrams’ and they are often the
best way of looking at the results from this sort of experiment to try and interpret what is
happening. You will notice that the lines linking the treatments are parallel when there is no
interaction, but become non-parallel when an interaction is present. An interaction may just
mildly change the way the main effects work (4th plot) or it might completely reverse the
effects (5th plot).
You can plot interactions in R using the interaction.plot function. Have a play with the data
and see what you can produce. Remember to use ?interaction.plot to look at what code you
need to enter.
Multiple comparison tests
Obviously, since the main treatments only have two levels there is no need for any multiple
comparison tests on the main effects — if there is a difference it must be between the two
levels. However, the interaction is significant, so it may be desirable to know which particular
treatment combinations differ. Predictably, the work flow is very similar to that applied to a
one-way ANOVA model.
We could use the TukeyHSD function to do this. We start by converting the model object
produced by lm into an aov object…
festuca_aov <- aov(festuca_model)
…and then we perform a Tukey HSD test:
TukeyHSD(festuca_aov, which = ‘pH:Calluna’)
We have suppressed the output for now. The only new tweak that we have to learn is
the which argument. Assigning this the value ‘pH:Calluna’ makes the TukeyHSD function carry
out all pairwise comparisons among the means of each treatment combination, i.e., we are
considering the full set of interactions. Here is the output:
You extract information from this table just as you did before. The table present a series of
pair-wise comparisons between mean values tested by the Tukey procedure. For example,
the first 3 lines show the significance of differences between the mean of the treatment
combination pH 3.5 without Calluna and the 3 other mean values. All we need from this table
is to note the codes for the treatment means which are being compared (listed in the first
column), and the p-value in each case listed in the final column.
There are three significant differences, all of which involve the treatment combinations pH
5.5 with Calluna absent. This implies that there are two ‘not significantly different’ groups:
one defined by pH 5.5 with Calluna absent, and then everything else.
As you might expect, we don’t have to step through the results of the TukeyHSD function to
define the ‘not significantly different’ groups. We used the agricolae package to do this for a
one-way ANOVA. We can use this again here. We need to load and attach the package first:
Once the package is ready for use, we can carry out the Tukey HSD test to find the ‘not
significantly different’ groups using the HSD.test function:
HSD.test(festuca_aov, trt = c(“pH”, “Calluna”), console = TRUE)
Setting the trt argument to c(“pH”, “Calluna”) makes the function carry out all pair-wise
comparisons among the mean values defined by each treatment combination. The output
that matters is the table at the very end, which shows the group identities as letters, the
treatment names, and the treatment means. This just reiterates what we already knew—
there are two ‘not significantly different’ groups, defined by pH 5.5 with Calluna absent, and
Multiple comparison tests for main effects
As mentioned above, in this experiment there is no point in trying to make further
comparisons between the means from the main treatments (pH 3.5 and 5.5, or with and
without Calluna) since (a) there is a significant interaction, so detailed comparisons of the
main effects are hard to interpret, and (b) even if that was not the case there are only two
levels in each treatment so any difference must be between those two levels!
However, it is quite common to have experiments with more than two levels in one or both
factors. If the ANOVA indicates that there is a significant effect of one, or both, of the
associated effects, and there is no interaction to worry about (don’t forget this caveat), then
you may wish to carry out multiple comparisons for the means associated with the main
This can be done using a Tukey test just as we did for the interaction in this example. The only
difference is that we have to specify the name of the main effect you are interested in. For
example, if we wanted to use TukeyHSD function to evaluate the significance of the pH main
effects, we would use: TukeyHSD(festuca_aov, which = ‘pH’)
Drawing conclusions and presenting results
In the results section of the report we will need to provide a succinct factual summary of the
There were significant effects of soil pH (ANOVA: F=28.18, df=1,16, p<0.001), competition
with Calluna (F=14.4, df=1,16, p=0.002) and the interaction between these treatments
(F=7.61, df=1,16, p=0.014) on the dry weight yield of Festuca. Festuca grew much better in
the absence of Calluna at high pH than in any other treatment combination (Tukey multiple
comparison test p<0.05) (Figure 1).