Interaction Plots

When we have continuous-by-continuous interactions in linear regression, it is impossible to directly interpret the coefficient on the interaction. In fact, it is just generally difficult to interpret these kinds of models. Often, a better approach is to translate one of the continuous variables into a factor and interpret the interaction-term coefficients for each level of that variable. Another approach is to visualize the interaction graphically. Both will give us the same inference.

Note: While interaction plots can help make effects interpretable, one of their major downsides is an inability to effectively convey statistical uncertainty. For this reason (and some of the other disadvantages that will become clear below), I would recommend these plots only for data summary, not for inference, prediction, or publication.

Let's start with some fake data:

set.seed(1)
x1 <- runif(100, 0, 1)  # continuous covariate on [0,1]
x2 <- sample(1:10, 100, TRUE)/10  # covariate taking the discrete values 0.1, 0.2, ..., 1
y <- 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rnorm(100)  # outcome with a built-in interaction

We've built a data-generating process with a strong interaction between x1 and x2. We can model this as a continuous-by-continuous interaction:

m <- lm(y ~ x1 * x2)

Alternatively, we can treat x2 as a factor (because, while approximately continuous, it takes on only 10 discrete values):

m2 <- lm(y ~ x1 * factor(x2))

Let's look at the output of both models and see if we can make sense of them:

summary(m)
## 
## Call:
## lm(formula = y ~ x1 * x2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.916 -0.603 -0.109  0.580  2.383 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.616      0.541    2.99   0.0036 ** 
## x1             0.783      0.878    0.89   0.3748    
## x2             1.937      0.865    2.24   0.0274 *  
## x1:x2          5.965      1.370    4.35  3.3e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.962 on 96 degrees of freedom
## Multiple R-squared:  0.802,  Adjusted R-squared:  0.795 
## F-statistic:  129 on 3 and 96 DF,  p-value: <2e-16
summary(m2)
## 
## Call:
## lm(formula = y ~ x1 * factor(x2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2086 -0.5368 -0.0675  0.5007  2.3648 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)        3.5007     5.4427    0.64     0.52
## x1                -2.3644    10.4793   -0.23     0.82
## factor(x2)0.2     -1.5294     5.4711   -0.28     0.78
## factor(x2)0.3     -1.8828     5.5500   -0.34     0.74
## factor(x2)0.4     -1.2069     5.4991   -0.22     0.83
## factor(x2)0.5      0.2490     5.5039    0.05     0.96
## factor(x2)0.6     -0.8439     5.4614   -0.15     0.88
## factor(x2)0.7     -0.7045     5.4917   -0.13     0.90
## factor(x2)0.8     -0.3576     5.4764   -0.07     0.95
## factor(x2)0.9     -0.0365     5.5302   -0.01     0.99
## factor(x2)1       -0.3488     5.5207   -0.06     0.95
## x1:factor(x2)0.2   4.6778    10.5179    0.44     0.66
## x1:factor(x2)0.3   5.5067    10.5979    0.52     0.60
## x1:factor(x2)0.4   5.8601    10.5629    0.55     0.58
## x1:factor(x2)0.5   4.4851    10.6169    0.42     0.67
## x1:factor(x2)0.6   6.2043    10.5215    0.59     0.56
## x1:factor(x2)0.7   7.8928    10.5514    0.75     0.46
## x1:factor(x2)0.8   8.4370    10.5690    0.80     0.43
## x1:factor(x2)0.9   7.9411    10.5961    0.75     0.46
## x1:factor(x2)1     9.6787    10.5511    0.92     0.36
## 
## Residual standard error: 1.01 on 80 degrees of freedom
## Multiple R-squared:  0.819,  Adjusted R-squared:  0.776 
## F-statistic: 19.1 on 19 and 80 DF,  p-value: <2e-16

For our continuous-by-continuous interaction model, we have the interaction expressed as a single number: ~5.96. This doesn't tell us anything useful on its own because its only direct interpretation is as an additional amount of expected y, added to the intercept plus the coefficients on each covariate, at the single point where x1==1 and x2==1. Thus, while we might be inclined to talk about this as an interaction term, it really isn't…it's just a mostly meaningless number. In the second, continuous-by-factor model, things are more interpretable. Here our factor dummies for x2 tell us the expected value of y (when added to the intercept) when x1==0. Similarly, the factor-“interaction” dummies tell us the expected value of y (when added to the intercept, the corresponding factor dummy, and the coefficient on x1) when x1==1. These seem more interpretable.
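To make the point interpretation concrete, we can reconstruct the continuous model's prediction at x1==1 and x2==1 by hand (a simple check using base R's coef and predict):

# At x1 == 1 and x2 == 1, the expected y is just the sum of all four coefficients
sum(coef(m))
# The same value via predict()
predict(m, newdata = data.frame(x1 = 1, x2 = 1))

Both should return roughly 10.3, which is the far corner of the response surface we plot below.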

Three-Dimensional Interaction Plotting

Another approach to understanding continuous-by-continuous interaction terms is to plot them. We saw above that the continuous-by-factor model, while interpretable, required a lot of numbers (in a large table) to communicate the relationships between x1, x2, and y. R offers a number of plotting functions to visualize these kinds of interaction “response surfaces”.

Let's start by estimating predicted values. Because x1 and x2 are both scaled [0,1], we'll just create a single vector of values on the 0-1 scale and use it for both of our prediction inputs.

nx <- seq(0, 1, length.out = 10)

The use of the outer function here is, again, a convenience because our input values are scaled [0,1]. Essentially, it builds a 10-by-10 matrix of predicted y values, one for each combination of x1 and x2.

z <- outer(nx, nx, FUN = function(a, b) predict(m, data.frame(x1 = a, x2 = b)))
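If the outer call seems opaque, an equivalent (if more verbose) construction uses expand.grid to build the same grid explicitly; this sketch should reproduce z exactly:

# build every combination of x1 and x2, predict, and reshape into a matrix
# with rows indexing x1 and columns indexing x2
grid <- expand.grid(x1 = nx, x2 = nx)
z2 <- matrix(predict(m, newdata = grid), nrow = length(nx))
all.equal(z, z2)  # should be TRUE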

We can look at the z matrix to see what is going on:

z
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10]
##  [1,] 1.616 1.831 2.046 2.262 2.477 2.692 2.907 3.122 3.338  3.553
##  [2,] 1.703 1.992 2.281 2.570 2.858 3.147 3.436 3.725 4.014  4.303
##  [3,] 1.790 2.152 2.515 2.877 3.240 3.602 3.965 4.327 4.690  5.052
##  [4,] 1.877 2.313 2.749 3.185 3.621 4.058 4.494 4.930 5.366  5.802
##  [5,] 1.964 2.474 2.983 3.493 4.003 4.513 5.022 5.532 6.042  6.552
##  [6,] 2.051 2.634 3.218 3.801 4.384 4.968 5.551 6.135 6.718  7.301
##  [7,] 2.138 2.795 3.452 4.109 4.766 5.423 6.080 6.737 7.394  8.051
##  [8,] 2.225 2.955 3.686 4.417 5.147 5.878 6.609 7.339 8.070  8.801
##  [9,] 2.312 3.116 3.920 4.725 5.529 6.333 7.138 7.942 8.746  9.550
## [10,] 2.399 3.277 4.155 5.033 5.910 6.788 7.666 8.544 9.422 10.300

All of the plotting functions that follow require us to supply this z matrix as the “height” of the plot at each combination of x1 and x2. Sounds a little crazy, but it will become clear once we do the plotting.

Perspective plots

A perspective plot draws a “response surface” (i.e., the values of the z matrix) across a two-dimensional grid. The plot is what you might typically think of when you hear “three-dimensional graph”. Let's take a look:

persp(nx, nx, z, theta = 45, phi = 10, shade = 0.75, xlab = "x1", ylab = "x2", 
    zlab = "y")

[Figure: perspective plot of predicted y across x1 and x2]

Note: The theta parameter refers to the horizontal rotation of the plot and the phi parameter refers to the tilt of the plot (see ?persp). The plot shows us many things, especially:

1. The vertical height of the surface is the expected (predicted) value of y at each combination of x1 and x2.
2. The slope of the surface on each edge of the plot is a marginal effect. In other words, the shallow slope on the lefthand face of the plot is the marginal effect of x1 when x2==0. Similarly, the steep slope on the righthand face of the plot is the marginal effect of x2 when x1==1. The other marginal effects (x1|x2==1 and x2|x1==0) are hidden from our view on the back of the plot.
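Those edge slopes can also be recovered directly from the continuous model's coefficients, since the marginal effect of x1 is the x1 coefficient plus the interaction coefficient times x2 (and symmetrically for x2). A quick check:

b <- coef(m)
b["x1"] + b["x1:x2"] * c(0, 1)  # slope of x1 at x2 == 0 and at x2 == 1
b["x2"] + b["x1:x2"] * c(0, 1)  # slope of x2 at x1 == 0 and at x1 == 1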

There are two problems with perspective plots:

1. Because they are two-dimensional representations of three-dimensional objects, their scales are deceiving. Clearly the “height” of the plot is bigger in the front than in the back. It is therefore only a heuristic.
2. Because they are three-dimensional, we cannot see the entire plot at once (as evidenced by the two hidden marginal effects discussed above).

There is nothing we can do about the first point, unless you want to use a 3D printer to print out the response surface. On the second point, however, we can view different rotations of the plot in order to get a better grasp on the various marginal effects.

Let's look at two different sets of rotations. One showing four plots on the diagonal (like above):

par(mai = rep(0.2, 4))
layout(matrix(1:4, nrow = 2, byrow = TRUE))
s <- sapply(c(45, 135, 225, 315), function(i) persp(nx, nx, z, theta = i, phi = 10, 
    shade = 0.75, xlab = "x1", ylab = "x2", zlab = "y"))

[Figure: four perspective plots at theta = 45, 135, 225, and 315]

The plot in the upper-left corner is the same one we saw above. But now, we see three additional rotations (imagine the plots rotating 90 degrees each, right-to-left), so the lower-right plot highlights the two “hidden” marginal effects from above.

Another set of plots shows the same plot at right angles, thus highlighting the marginal effects at approximately true scale but masking much of the curvature of the response surface:

par(mai = rep(0.2, 4))
layout(matrix(1:4, nrow = 2))
# assign the result so the transformation matrices returned by persp are not printed
s <- sapply(c(90, 180, 270, 360), function(i) persp(nx, nx, z, theta = i, phi = 10, 
    shade = 0.75, xlab = "x1", ylab = "x2", zlab = "y"))

[Figure: four perspective plots at theta = 90, 180, 270, and 360]


While this highlights the marginal effects somewhat nicely, the two left-hand plots are quite difficult to actually look at due to the shape of the interaction. Note: The plots can be colored in many interesting ways, but the details are complicated (see ?persp).
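As one illustration (adapted from the facet-coloring idiom in the examples on ?persp), we can shade each facet of the surface by its average height:

# average the four corner heights of each facet, then map each average to a gray level
nrz <- nrow(z)
ncz <- ncol(z)
zfacet <- (z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz])/4
facetcol <- gray(cut(zfacet, 50, labels = FALSE)/50)
persp(nx, nx, z, theta = 45, phi = 10, col = facetcol, xlab = "x1", ylab = "x2", zlab = "y")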

Matrix Plots

Because the perspective plots are somewhat difficult to interpret, we might want to produce a two-dimensional representation that better highlights our interaction without the confusion of flattening a three-dimensional surface to two dimensions. The image function supplies us with a way to use color (or grayscale, in the case below) to show the values of y across the x1-by-x2 matrix. We again supply arguments quite similar to above:

layout(1)
par(mai = rep(1, 4))
image(nx, nx, z, xlab = "x1", ylab = "x2", main = "Expected Y", col = gray(50:1/50))

[Figure: image plot of expected y across x1 and x2]

Here, the darker colors represent higher values of y. Because the mind can't interpret color differences as well as it can interpret differences in slope, the interaction becomes somewhat muddled. For example, the marginal effect of x2|x1==0 is much less steep than the marginal effect of x2|x1==1, but it is difficult to quantify that by comparing the difference between white and gray on the left-hand side of the plot to the difference between gray and black on the right-hand side of the plot (those differences in color representing the marginal effects). We could redraw the plot with some contour lines to try to better see things:

image(nx, nx, z, xlab = "x1", ylab = "x2", main = "Expected Y", col = gray(50:1/50))
# supply the same x and y coordinates so the contour lines align with the image
contour(nx, nx, z, add = TRUE)

[Figure: image plot with contour lines overlaid]

Here we see that when x1==0, a change in x2 from 0 to 1 increases y from about 1.5 to about 3.5. By contrast, when x1==1, the same change in x2 moves y from about 2.5 to about 10, which is substantially larger.
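We can confirm this reading against the corners of the z matrix printed earlier:

# corners of the prediction matrix: rows are x1 in {0, 1}, columns are x2 in {0, 1}
z[c(1, 10), c(1, 10)]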

Contour lines

Since the contours seemed to make all of the difference in terms of interpretability above, we could just draw those instead without the underlying image matrix:

filled.contour(nx, nx, z, xlab = "x1", ylab = "x2", main = "Expected Y", col = gray(20:1/20))

[Figure: filled contour plot of expected y]

Here we see the same relationship highlighted by the contour lines, but they are nicely scaled and the plot supplies a gradient scale (at right) to help quantify the different colors.

Thus we have several different ways to look at continuous-by-continuous interactions. All of these techniques have advantages and disadvantages, but all do a better job at clarifying the nature of the relationships between x1, x2, and y than does the standard regression model or even the continuous-by-factor model.