Interactions are important, but they're hard to understand without visualization. This script works through how to visualize interactions in linear regression models.
set.seed(1)
x1 <- rnorm(200)                            # continuous covariate
x2 <- rbinom(200, 1, 0.5)                   # binary group indicator
y <- x1 + x2 + (2 * x1 * x2) + rnorm(200)   # outcome with a true interaction
Interactions (at least in fake data) tend to produce weird plots:
plot(y ~ x1)
plot(y ~ x2)
This means they also produce weird residual plots:
ols1 <- lm(y ~ x1 + x2)
plot(ols1$residuals ~ x1)
plot(ols1$residuals ~ x2)
For example, in the first plot we find that there are clearly two relationships between y and x1, one positive and one negative.
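One quick way to confirm that these two patterns line up with the groups (just an optional check) is to color the residuals by x2:
plot(ols1$residuals ~ x1, col = ifelse(x2 == 1, "blue", "red"))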
We thus want to model this using an interaction:
ols2 <- lm(y ~ x1 + x2 + x1:x2)
summary(ols2)
##
## Call:
## lm(formula = y ~ x1 + x2 + x1:x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.855 -0.680 -0.002 0.682 3.769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0627 0.1103 -0.57 0.57
## x1 1.1199 0.1258 8.90 3.7e-16 ***
## x2 1.1303 0.1538 7.35 5.3e-12 ***
## x1:x2 1.9017 0.1672 11.37 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.09 on 196 degrees of freedom
## Multiple R-squared: 0.821, Adjusted R-squared: 0.818
## F-statistic: 299 on 3 and 196 DF, p-value: <2e-16
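As a side note on interpretation (a quick sketch using the fitted object above): the coefficient on x1 is the slope when x2 is 0, and the interaction term is how much that slope changes when x2 is 1.
b <- coef(ols2)
b["x1"]               # slope of x1 when x2 == 0 (about 1.12)
b["x1"] + b["x1:x2"]  # slope of x1 when x2 == 1 (about 1.12 + 1.90 = 3.02)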
Note: This is equivalent to either of the following:
summary(lm(y ~ x1 + x2 + x1 * x2))
##
## Call:
## lm(formula = y ~ x1 + x2 + x1 * x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.855 -0.680 -0.002 0.682 3.769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0627 0.1103 -0.57 0.57
## x1 1.1199 0.1258 8.90 3.7e-16 ***
## x2 1.1303 0.1538 7.35 5.3e-12 ***
## x1:x2 1.9017 0.1672 11.37 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.09 on 196 degrees of freedom
## Multiple R-squared: 0.821, Adjusted R-squared: 0.818
## F-statistic: 299 on 3 and 196 DF, p-value: <2e-16
summary(lm(y ~ x1 * x2))
##
## Call:
## lm(formula = y ~ x1 * x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.855 -0.680 -0.002 0.682 3.769
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0627 0.1103 -0.57 0.57
## x1 1.1199 0.1258 8.90 3.7e-16 ***
## x2 1.1303 0.1538 7.35 5.3e-12 ***
## x1:x2 1.9017 0.1672 11.37 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.09 on 196 degrees of freedom
## Multiple R-squared: 0.821, Adjusted R-squared: 0.818
## F-statistic: 299 on 3 and 196 DF, p-value: <2e-16
However, specifying only the interaction…
summary(lm(y ~ x1:x2))
##
## Call:
## lm(formula = y ~ x1:x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.390 -0.873 0.001 0.971 4.339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5305 0.0988 5.37 2.2e-07 ***
## x1:x2 3.0492 0.1415 21.55 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.4 on 198 degrees of freedom
## Multiple R-squared: 0.701, Adjusted R-squared: 0.7
## F-statistic: 464 on 1 and 198 DF, p-value: <2e-16
produces an incomplete (and thus invalid) model: without the constitutive terms, both groups are forced to share an intercept and the slope of x1 is forced to zero whenever x2 is 0. Now let's figure out how to visualize this interaction based on the complete, correct model.
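Before we do, a quick look at the design matrices (just an illustrative check) shows what the interaction-only specification leaves out:
head(model.matrix(lm(y ~ x1:x2)))  # only an intercept and the product term
head(model.matrix(ols2))           # the full model also carries columns for x1 and x2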
Our example data are particularly simple. There are two groups (defined by x2) and one covariate (x1).
We can plot these two groups separately in order to see their distributions of y as a function of x1.
We can index our vectors in order to plot the groups separately in red and blue:
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
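For readability, we might also add a simple legend (an optional touch, assuming the plot above is still the active device):
legend("topleft", legend = c("x2 == 0", "x2 == 1"), col = c("red", "blue"), pch = 1)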
It is already clear that there is an interaction. Let's now see what the estimated effects look like when we plot them.
The easiest way to examine interactions is with predicted-outcome plots.
We simply want to show the predicted value of the outcome for combinations of the input variables.
We know that we can do this with the predict function applied to some new data.
The expand.grid function is helpful for building the necessary new data:
xseq <- seq(-5, 5, length.out = 100)
newdata <- expand.grid(x1 = xseq, x2 = c(0, 1))
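Because expand.grid varies its first argument fastest, the first 100 rows of newdata have x2 = 0 and the last 100 have x2 = 1; we rely on that ordering when indexing the predictions below. A quick check:
head(newdata)
tail(newdata)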
Let's build a set of predicted values for our no-interaction model:
fit1 <- predict(ols1, newdata, se.fit = TRUE, type = "response")
Then do the same for our full model with the interaction:
fit2 <- predict(ols2, newdata, se.fit = TRUE, type = "response")
Now let's plot the original data again, and then overlay the predicted values for the two groups.
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
points(xseq, fit1$fit[1:100], type = "l", col = "red")
points(xseq, fit1$fit[101:200], type = "l", col = "blue")
The result is a plot that differentiates the absolute levels of y in the two groups, but forces them to have equivalent slopes.
We know this is wrong.
Now let's try to plot the data with the correct fitted values, accounting for the interaction:
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
points(xseq, fit2$fit[1:100], type = "l", col = "red")
points(xseq, fit2$fit[101:200], type = "l", col = "blue")
This looks better. The fitted-value lines correspond nicely to the differing slopes in our two groups.
But we still need to add uncertainty. Luckily, we have the necessary information in fit2$se.fit.
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
points(xseq, fit2$fit[1:100], type = "l", col = "red")
points(xseq, fit2$fit[101:200], type = "l", col = "blue")
points(xseq, fit2$fit[1:100] - fit2$se.fit[1:100], type = "l", col = "red",
lty = 2)
points(xseq, fit2$fit[1:100] + fit2$se.fit[1:100], type = "l", col = "red",
lty = 2)
points(xseq, fit2$fit[101:200] - fit2$se.fit[101:200], type = "l", col = "blue",
lty = 2)
points(xseq, fit2$fit[101:200] + fit2$se.fit[101:200], type = "l", col = "blue",
lty = 2)
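As an aside, the same uncertainty could instead be drawn as shaded bands with polygon rather than dashed lines (a sketch reusing fit2 from above):
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
    ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
polygon(c(xseq, rev(xseq)), c(fit2$fit[1:100] - fit2$se.fit[1:100],
    rev(fit2$fit[1:100] + fit2$se.fit[1:100])), col = rgb(1, 0, 0, 0.2), border = NA)
polygon(c(xseq, rev(xseq)), c(fit2$fit[101:200] - fit2$se.fit[101:200],
    rev(fit2$fit[101:200] + fit2$se.fit[101:200])), col = rgb(0, 0, 1, 0.2), border = NA)
points(xseq, fit2$fit[1:100], type = "l", col = "red")
points(xseq, fit2$fit[101:200], type = "l", col = "blue")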
We can also produce the same plot through bootstrapping.
tmpdata <- data.frame(x1 = x1, x2 = x2, y = y)
myboot <- function() {
thisboot <- sample(1:nrow(tmpdata), nrow(tmpdata), TRUE)
coef(lm(y ~ x1 * x2, data = tmpdata[thisboot, ]))
}
bootcoefs <- replicate(2500, myboot())
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
apply(bootcoefs, 2, function(coefvec) {
points(xseq, coefvec[1] + (xseq * coefvec[2]), type = "l", col = rgb(1,
0, 0, 0.01))
points(xseq, coefvec[1] + (xseq * (coefvec[2] + coefvec[4])) + coefvec[3],
type = "l", col = rgb(0, 0, 1, 0.01))
})
## NULL
(The NULL printed here is just the return value of apply; the bootstrap lines have already been drawn on the plot.)
points(xseq, fit2$fit[1:100], type = "l")
points(xseq, fit2$fit[101:200], type = "l")
points(xseq, fit2$fit[1:100] - fit2$se.fit[1:100], type = "l", lty = 2)
points(xseq, fit2$fit[1:100] + fit2$se.fit[1:100], type = "l", lty = 2)
points(xseq, fit2$fit[101:200] - fit2$se.fit[101:200], type = "l", lty = 2)
points(xseq, fit2$fit[101:200] + fit2$se.fit[101:200], type = "l", lty = 2)
If we overlay our previous lines on top of this, we see that the bootstrap and the analytic standard errors produce essentially the same result as above.
Of course, we may want to show confidence intervals rather than SEs. And this is simple.
We can reproduce the graph with 95% confidence intervals, using qnorm to determine how much to multiply our SEs by.
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
points(xseq, fit2$fit[1:100], type = "l", col = "red")
points(xseq, fit2$fit[101:200], type = "l", col = "blue")
points(xseq, fit2$fit[1:100] - qnorm(0.975) * fit2$se.fit[1:100], type = "l",
lty = 2, col = "red")
points(xseq, fit2$fit[1:100] + qnorm(0.975) * fit2$se.fit[1:100], type = "l",
lty = 2, col = "red")
points(xseq, fit2$fit[101:200] - qnorm(0.975) * fit2$se.fit[101:200], type = "l",
lty = 2, col = "blue")
points(xseq, fit2$fit[101:200] + qnorm(0.975) * fit2$se.fit[101:200], type = "l",
lty = 2, col = "blue")
We can also use plots to visualize why we need to include constitutive terms in our interaction models. Recall that our model is defined as:
ols2 <- lm(y ~ x1 + x2 + x1:x2)
We can compare this to a model with only one term and the interaction:
ols3 <- lm(y ~ x1 + x1:x2)
fit3 <- predict(ols3, newdata, se.fit = TRUE, type = "response")
And plot its results:
plot(x1[x2 == 0], y[x2 == 0], col = rgb(1, 0, 0, 0.5), xlim = c(min(x1), max(x1)),
ylim = c(min(y), max(y)))
points(x1[x2 == 1], y[x2 == 1], col = rgb(0, 0, 1, 0.5))
points(xseq, fit3$fit[1:100], type = "l", col = "red")
points(xseq, fit3$fit[101:200], type = "l", col = "blue")
# We can compare these lines to those from the full model:
points(xseq, fit2$fit[1:100], type = "l", col = "red", lwd = 2)
points(xseq, fit2$fit[101:200], type = "l", col = "blue", lwd = 2)
By leaving out a term, we misestimate the effect of x1 in both groups.
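To make this concrete, we can compare the two models directly (reusing the objects above); recall that the data were generated with slopes of 1 (when x2 = 0) and 3 (when x2 = 1):
coef(ols2)         # full model: recovers both slopes (about 1.12 and 1.12 + 1.90)
coef(ols3)         # restricted model: both groups share an intercept, so the slopes are distorted
anova(ols3, ols2)  # formal test of the omitted constitutive term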