Tables

We often want to tabulate data (e.g., categorical data). R supplies tabulation functionality with the table function:

set.seed(1)
a <- sample(1:5, 25, TRUE)
a
##  [1] 2 2 3 5 2 5 5 4 4 1 2 1 4 2 4 3 4 5 2 4 5 2 4 1 2
table(a)
## a
## 1 2 3 4 5 
## 3 8 2 7 5

The result is a table showing each observed value and the frequency count for that value. The output looks similar regardless of the class of the vector. Note: If the vector contains continuous data, the result may be unexpected, because nearly every value is unique and gets its own cell with a count of 1:

table(rnorm(100))
## 
##   -2.22390027400994     -1.563782051071   -1.43758624082998 
##                   1                   1                   1 
##    -1.4250983947325   -1.28459935387219   -1.28074943178832 
##                   1                   1                   1 
##   -1.26361438497058   -1.23753842192996   -1.16657054708471 
##                   1                   1                   1 
##   -1.13038577760069    -1.0655905803883   -0.97228683550556 
##                   1                   1                   1 
##  -0.940649162618608  -0.912068366948338  -0.891921127284569 
##                   1                   1                   1 
##  -0.880871723252545  -0.873262111744435  -0.832043296117832 
##                   1                   1                   1 
##  -0.814968708869917  -0.797089525071965  -0.795339117255372 
##                   1                   1                   1 
##  -0.776776621764597   -0.69095383969683  -0.649471646796233 
##                   1                   1                   1 
##  -0.649010077708898  -0.615989907707918  -0.542888255010254 
##                   1                   1                   1 
##  -0.500696596002705  -0.452783972553158  -0.433310317456782 
##                   1                   1                   1 
##  -0.429513109491881  -0.424810283377287  -0.418980099421959 
##                   1                   1                   1 
##  -0.412519887482398  -0.411510832795067  -0.376702718583628 
##                   1                   1                   1 
##  -0.299215117897316  -0.289461573688223  -0.282173877322451 
##                   1                   1                   1 
##  -0.279346281854269  -0.275778029088027  -0.235706556439501 
##                   1                   1                   1 
##  -0.227328691424755  -0.224267885278309   -0.21951562675344 
##                   1                   1                   1 
##  -0.172623502645857  -0.119168762418038  -0.117753598165951 
##                   1                   1                   1 
##  -0.115825322156954 -0.0571067743838088 -0.0548774737115786 
##                   1                   1                   1 
## -0.0110454784656636 0.00837095999603331  0.0191563916602738 
##                   1                   1                   1 
##  0.0253828675878054  0.0465803028049967   0.046726172188352 
##                   1                   1                   1 
##  0.0652881816716207   0.119717641289537   0.133336360814841 
##                   1                   1                   1 
##    0.14377148075807   0.229019590694692   0.242263480859686 
##                   1                   1                   1 
##   0.248412648872596   0.250141322854153   0.252223448156132 
##                   1                   1                   1 
##   0.257338377155533   0.266137361672105   0.358728895971352 
##                   1                   1                   1 
##    0.36594112304922   0.377395645981701   0.435683299355719 
##                   1                   1                   1 
##   0.503607972233726   0.560746090888056   0.576718781896486 
##                   1                   1                   1 
##    0.59625901661066   0.618243293566247   0.646674390495345 
##                   1                   1                   1 
##    0.66413569989411   0.726750747385451    0.77214218580453 
##                   1                   1                   1 
##   0.781859184600258   0.804189509744908    0.83204712857239 
##                   1                   1                   1 
##   0.992160365445798   0.996543928544126   0.996986860909106 
##                   1                   1                   1 
##    1.08576936214569    1.10096910219409     1.1519117540872 
##                   1                   1                   1 
##    1.15653699715018    1.23830410085338    1.25408310644997 
##                   1                   1                   1 
##     1.2560188173061    1.29931230256343    1.45598840106634 
##                   1                   1                   1 
##    1.62544730346494    1.67829720781629    1.75790308981071 
##                   1                   1                   1 
##    2.44136462889459 
##                   1
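
One common way to get a usable table from continuous data is to bin the values first. The following is a minimal sketch, assuming break points at -1, 0, and 1 that were chosen only for illustration; the names x, bins, and binned are not part of the example above:

```r
# Hypothetical example: bin continuous values before tabulating them.
set.seed(1)
x <- rnorm(100)

# The break points are an arbitrary choice for illustration.
# -Inf and Inf guarantee that every value falls into some bin.
bins <- cut(x, breaks = c(-Inf, -1, 0, 1, Inf))

# Tabulating the bins gives four cells instead of 100 singletons.
binned <- table(bins)
binned
```

The counts depend on the random draw, but the table always has one cell per bin and the cells sum to the number of observations.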

We also often want to obtain proportions (i.e., the share of observations falling into each category). We can obtain this information by wrapping the table call in prop.table:

prop.table(table(a))
## a
##    1    2    3    4    5 
## 0.12 0.32 0.08 0.28 0.20

The result is a “proportion” table, showing the proportion of observations in each category. If we want percentages, we can simply multiply the resulting table by 100:

prop.table(table(a)) * 100
## a
##  1  2  3  4  5 
## 12 32  8 28 20
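
To make the arithmetic explicit, the sketch below computes the same proportions by hand, dividing each count by the total. The vector a is typed in from the output printed earlier so that the example does not depend on the random number generator; the names counts and props are assumptions made for this illustration:

```r
# The 25 values of `a`, copied from the printed output above.
a <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4,
       2, 4, 3, 4, 5, 2, 4, 5, 2, 4, 1, 2)

counts <- table(a)

# A proportion is a count divided by the total number of observations.
props <- counts / sum(counts)

# This matches what prop.table() returns for a one-dimensional table.
all.equal(as.vector(props), as.vector(prop.table(counts)))

# Percentages, rounded to one decimal place.
round(100 * props, 1)
```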

To get frequencies and proportions (or percentages) together, we can bind the two tables:

cbind(table(a), prop.table(table(a)))
##   [,1] [,2]
## 1    3 0.12
## 2    8 0.32
## 3    2 0.08
## 4    7 0.28
## 5    5 0.20
rbind(table(a), prop.table(table(a)))
##         1    2    3    4   5
## [1,] 3.00 8.00 2.00 7.00 5.0
## [2,] 0.12 0.32 0.08 0.28 0.2
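
The bound tables above have unlabeled columns ([,1] and [,2]). A small sketch of one way to label them, assuming the column names n and proportion, which are chosen here only for illustration:

```r
# The 25 values of `a`, copied from the printed output above.
a <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4,
       2, 4, 3, 4, 5, 2, 4, 5, 2, 4, 1, 2)

counts <- table(a)

# Bind counts and proportions as columns, then name the columns.
summary_tab <- cbind(counts, prop.table(counts))
colnames(summary_tab) <- c("n", "proportion")
summary_tab
```

The result is an ordinary numeric matrix, so a single cell can be looked up by name, for example summary_tab["2", "n"].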

In addition to these basic (univariate) tabulation functions, we can also tabulate in two or more dimensions. To obtain simple crosstabulations, we can still use table:

b <- rep(c(1, 2), length.out = 25)
table(a, b)
##    b
## a   1 2
##   1 0 3
##   2 5 3
##   3 1 1
##   4 5 2
##   5 2 3

The result is a crosstable with the first variable (a) as rows and the second (b) as columns. With more than two variables, the table is harder to read:

c <- rep(c(3, 4, 5), length.out = 25)
table(a, b, c)
## , , c = 3
## 
##    b
## a   1 2
##   1 0 1
##   2 3 1
##   3 0 1
##   4 1 0
##   5 1 1
## 
## , , c = 4
## 
##    b
## a   1 2
##   1 0 0
##   2 2 2
##   3 0 0
##   4 2 2
##   5 0 0
## 
## , , c = 5
## 
##    b
## a   1 2
##   1 0 2
##   2 0 0
##   3 1 0
##   4 2 0
##   5 1 2

R supplies two additional functions for working with these kinds of tables. The ftable function flattens the previous result into a more readable two-dimensional format:

ftable(a, b, c)
##     c 3 4 5
## a b        
## 1 1   0 0 0
##   2   1 0 2
## 2 1   3 2 0
##   2   1 2 0
## 3 1   0 0 1
##   2   1 0 0
## 4 1   1 2 2
##   2   0 2 0
## 5 1   1 0 1
##   2   1 0 2
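
The layout of a flat table can be changed by choosing which variables go in the rows. The sketch below uses the row.vars argument of ftable for this; the third variable is stored as c3 (a name assumed here to avoid masking the base function c) and is labeled c in the table through a named argument:

```r
# The 25 values of `a`, copied from the printed output above.
a <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4,
       2, 4, 3, 4, 5, 2, 4, 5, 2, 4, 1, 2)
b <- rep(c(1, 2), length.out = 25)
c3 <- rep(c(3, 4, 5), length.out = 25)  # hypothetical name for the third variable

# Build the three-way table; `c = c3` sets the dimension name to "c".
tab3 <- table(a, b, c = c3)

# Put only `a` in the rows; `b` and `c` are then crossed in the columns.
flat <- ftable(tab3, row.vars = "a")
flat
```

With a in the rows, the flat table has 5 rows and 2 x 3 = 6 columns, and the cell counts still sum to 25.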

The xtabs function provides an alternative way of requesting tabulations. It uses R's formula data structure (see 'formulas.r'). A formula with only a right-hand side produces the same result as table:

xtabs(~a + b)
##    b
## a   1 2
##   1 0 3
##   2 5 3
##   3 1 1
##   4 5 2
##   5 2 3
xtabs(~a + b + c)
## , , c = 3
## 
##    b
## a   1 2
##   1 0 1
##   2 3 1
##   3 0 1
##   4 1 0
##   5 1 1
## 
## , , c = 4
## 
##    b
## a   1 2
##   1 0 0
##   2 2 2
##   3 0 0
##   4 2 2
##   5 0 0
## 
## , , c = 5
## 
##    b
## a   1 2
##   1 0 2
##   2 0 0
##   3 1 0
##   4 2 0
##   5 1 2
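
A formula can also have a left-hand side. In xtabs, the left-hand side is a vector of counts to be summed within each cell, which is useful when the data are already aggregated. A minimal sketch, assuming a small made-up data frame df whose column names and values are purely illustrative:

```r
# Hypothetical pre-aggregated data: one row per cell, with a count `n`.
df <- data.frame(
  group   = c("x", "x", "y", "y"),
  outcome = c("no", "yes", "no", "yes"),
  n       = c(4, 1, 2, 3)
)

# Left-hand side: counts to sum; right-hand side: classifying variables.
# The `data` argument tells xtabs where to find the variables.
xt <- xtabs(n ~ group + outcome, data = df)
xt
```

Each cell of xt holds the sum of n for that combination of group and outcome, so the whole table sums to 10 here.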

Table margins

With a crosstable, we can also add table margins using addmargins:

x <- table(a, b)
addmargins(x)
##      b
## a      1  2 Sum
##   1    0  3   3
##   2    5  3   8
##   3    1  1   2
##   4    5  2   7
##   5    2  3   5
##   Sum 13 12  25
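
The totals that addmargins appends can also be extracted on their own with margin.table. The sketch below rebuilds x from the values printed earlier so that it runs without relying on the random number generator; the names row_totals and col_totals are assumptions made for this illustration:

```r
# The 25 values of `a`, copied from the printed output above.
a <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4,
       2, 4, 3, 4, 5, 2, 4, 5, 2, 4, 1, 2)
b <- rep(c(1, 2), length.out = 25)
x <- table(a, b)

# Sum over columns to get one total per row (per value of a).
row_totals <- margin.table(x, 1)
row_totals  # 3 8 2 7 5

# Sum over rows to get one total per column (per value of b).
col_totals <- margin.table(x, 2)
col_totals  # 13 12
```

The row totals equal the one-dimensional table(a) from the start of this section.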

Proportions in crosstables

As with a one-dimensional table, we can calculate proportions from a k-dimensional table:

prop.table(table(a, b))
##    b
## a      1    2
##   1 0.00 0.12
##   2 0.20 0.12
##   3 0.04 0.04
##   4 0.20 0.08
##   5 0.08 0.12

The default result is a table with proportions of the entire table. We can calculate row proportions (each row sums to 1) with the margin parameter set to 1:

prop.table(table(a, b), 1)
##    b
## a        1      2
##   1 0.0000 1.0000
##   2 0.6250 0.3750
##   3 0.5000 0.5000
##   4 0.7143 0.2857
##   5 0.4000 0.6000

We can calculate column proportions (each column sums to 1) with the margin parameter set to 2:

prop.table(table(a, b), 2)
##    b
## a         1       2
##   1 0.00000 0.25000
##   2 0.38462 0.25000
##   3 0.07692 0.08333
##   4 0.38462 0.16667
##   5 0.15385 0.25000
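
A quick way to confirm which margin was used is to check the sums. The sketch below rebuilds the crosstable from the values printed earlier and verifies that row proportions sum to 1 across each row and column proportions sum to 1 down each column; the names row_props, col_props, and col_pct are assumptions made for this illustration:

```r
# The 25 values of `a`, copied from the printed output above.
a <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1, 4,
       2, 4, 3, 4, 5, 2, 4, 5, 2, 4, 1, 2)
b <- rep(c(1, 2), length.out = 25)
x <- table(a, b)

# margin = 1: divide each cell by its row total.
row_props <- prop.table(x, 1)
rowSums(row_props)

# margin = 2: divide each cell by its column total.
col_props <- prop.table(x, 2)
colSums(col_props)

# Column percentages, rounded to one decimal place.
col_pct <- round(100 * col_props, 1)
col_pct
```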