Matrices are a two-dimensional data structure that are quite useful, especially for statistics in R.
Just like in mathematical notation, an R matrix is an m-by-n grid of elements.
To create a matrix, we use the matrix
function, which we supply with several parameters including the content of the matrix and its dimensions.
If we just give a matrix a data
parameter, it produces a column vector:
matrix(1:6)
## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
## [5,] 5
## [6,] 6
If we want the matrix to have different dimensions we can specify nrow
and/or ncol
parameters:
matrix(1:6, nrow = 2)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
matrix(1:6, ncol = 3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
matrix(1:6, nrow = 2, ncol = 3)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
By default, the data are filled into the resulting matrix “column-wise”.
If we specify byrow=TRUE
, the elements are instead filled in “row-wise”:
matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
Requesting a matrix smaller than the supplied data parameter will result in only some of the data being used and the rest discarded:
matrix(1:6, nrow = 2, ncol = 1)
## [,1]
## [1,] 1
## [2,] 2
Note: requesting a matrix with larger dimensions than the data produces a warning:
matrix(1:6, nrow = 2, ncol = 4)
## Warning: data length [6] is not a sub-multiple or multiple of the number
## of columns [4]
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 1
## [2,] 2 4 6 2
In this example, we still receive a matrix but the matrix elements outside of our data are filled in automatically. This process is called “recycling” in which R repeats the data until it fills in the requested dimensions of the matrix.
Just as with using length
to count the elements in a vector, we can use several functions to measure a matrix object.
If we apply the function length
to matrix, it still counts all the elements in the matrix, but doesn't tell us about dimensions:
a <- matrix(1:10, nrow = 2)
length(a)
## [1] 10
If we want to get the number of rows in the matrix, we can use nrow
:
nrow(a)
## [1] 2
If we want to get the number of columns in the matrix, we can use ncol
:
ncol(a)
## [1] 5
We can also get the number of rows and the number of columns in a single call to dim
:
dim(a)
## [1] 2 5
We can also combine (or bind) vectors and/or matrices together using cbind
and rbind
.
rbind
is used to “row-bind” by stacking vectors and/or matrices on top of one another vertically.
cbind
is used to “column-bind” by stacking vectors and/or matrices next to one another horizontally.
rbind(1:3, 4:6, 7:9)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
cbind(1:3, 4:6, 7:9)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
We can also easily transpose a matrix using t
:
rbind(1:3, 4:6, 7:9)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
t(rbind(1:3, 4:6, 7:9))
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
Indexing a matrix is very similar to indexing a vector, except now we have to account for two dimensions. The first dimension is rows. The second dimension is columns.
b <- rbind(1:3, 4:6, 7:9)
b[1, ] #' first row
## [1] 1 2 3
b[, 1] #' first column
## [1] 1 4 7
b[1, 1] #' element in first row and first column
## [1] 1
Just with vector indexing, we can extract multiple elements:
b[1:2, ]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
b[1:2, 2:3]
## [,1] [,2]
## [1,] 2 3
## [2,] 5 6
And we can also use -
indexing:
b[-1, 2:3]
## [,1] [,2]
## [1,] 5 6
## [2,] 8 9
We can also use logical indexing in the same way:
b[c(TRUE, TRUE, FALSE), ]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
b[, c(TRUE, FALSE, TRUE)]
## [,1] [,2]
## [1,] 1 3
## [2,] 4 6
## [3,] 7 9
It is sometimes helpful to extract the diagonal of matrix (e.g., the diagonal of a variance-covariance matrix)
Diagonals can be extracted using diag
:
diag(b)
## [1] 1 5 9
It is also possible to use diag
to assign new values to the diagonal of a matrix.
For example, we might want to make all of the diagonal elements 0:
b
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
diag(b) <- 0
b
## [,1] [,2] [,3]
## [1,] 0 2 3
## [2,] 4 0 6
## [3,] 7 8 0
We can also extra the upper or lower triangles of a matrix (e.g., to extract one half of a correlation matrix)
upper.tri
and lower.tri
produce logical matrices of the same dimension as the original matrix, which can then be used to index:
upper.tri(b) #' upper triangle
## [,1] [,2] [,3]
## [1,] FALSE TRUE TRUE
## [2,] FALSE FALSE TRUE
## [3,] FALSE FALSE FALSE
b[upper.tri(b)]
## [1] 2 3 6
lower.tri(b) #' lower triangle
## [,1] [,2] [,3]
## [1,] FALSE FALSE FALSE
## [2,] TRUE FALSE FALSE
## [3,] TRUE TRUE FALSE
b[lower.tri(b)]
## [1] 4 7 8
Recall that vectors can have named elements. Matrices can have named dimensions. Each row and column of a matrix can have a name that is supplied when it is created or added/modified later.
c <- matrix(1:6, nrow = 2)
Row names are added with rownames
:
rownames(c) <- c("Row1", "Row2")
Column names are added with colnames
:
colnames(c) <- c("x", "y", "z")
Dimension names can also be added initially when the matrix is created using the dimnames
parameter in matrix
:
matrix(1:6, nrow = 2, dimnames = list(c("Row1", "Row2"), c("x", "y", "z")))
## x y z
## Row1 1 3 5
## Row2 2 4 6
Dimension names can also be created in this way for only the rows or columns by using a NULL
value for one of the dimensions:
matrix(1:6, nrow = 2, dimnames = list(c("Row1", "Row2"), NULL))
## [,1] [,2] [,3]
## Row1 1 3 5
## Row2 2 4 6