We often want to tabulate data (e.g., categorical data).
R supplies tabulation functionality with the table function:
set.seed(1)
a <- sample(1:5, 25, TRUE)
a
## [1] 2 2 3 5 2 5 5 4 4 1 2 1 4 2 4 3 4 5 2 4 5 2 4 1 2
table(a)
## a
## 1 2 3 4 5
## 3 8 2 7 5
The result is a table showing the name of each value and a frequency count for each. The output looks similar regardless of the class of the vector. Note: If the vector contains continuous data, the result may be unexpected, because nearly every value is unique and therefore gets its own count of 1:
table(rnorm(100))
##
## -2.22390027400994 -1.563782051071 -1.43758624082998
## 1 1 1
## -1.4250983947325 -1.28459935387219 -1.28074943178832
## 1 1 1
## -1.26361438497058 -1.23753842192996 -1.16657054708471
## 1 1 1
## -1.13038577760069 -1.0655905803883 -0.97228683550556
## 1 1 1
## -0.940649162618608 -0.912068366948338 -0.891921127284569
## 1 1 1
## -0.880871723252545 -0.873262111744435 -0.832043296117832
## 1 1 1
## -0.814968708869917 -0.797089525071965 -0.795339117255372
## 1 1 1
## -0.776776621764597 -0.69095383969683 -0.649471646796233
## 1 1 1
## -0.649010077708898 -0.615989907707918 -0.542888255010254
## 1 1 1
## -0.500696596002705 -0.452783972553158 -0.433310317456782
## 1 1 1
## -0.429513109491881 -0.424810283377287 -0.418980099421959
## 1 1 1
## -0.412519887482398 -0.411510832795067 -0.376702718583628
## 1 1 1
## -0.299215117897316 -0.289461573688223 -0.282173877322451
## 1 1 1
## -0.279346281854269 -0.275778029088027 -0.235706556439501
## 1 1 1
## -0.227328691424755 -0.224267885278309 -0.21951562675344
## 1 1 1
## -0.172623502645857 -0.119168762418038 -0.117753598165951
## 1 1 1
## -0.115825322156954 -0.0571067743838088 -0.0548774737115786
## 1 1 1
## -0.0110454784656636 0.00837095999603331 0.0191563916602738
## 1 1 1
## 0.0253828675878054 0.0465803028049967 0.046726172188352
## 1 1 1
## 0.0652881816716207 0.119717641289537 0.133336360814841
## 1 1 1
## 0.14377148075807 0.229019590694692 0.242263480859686
## 1 1 1
## 0.248412648872596 0.250141322854153 0.252223448156132
## 1 1 1
## 0.257338377155533 0.266137361672105 0.358728895971352
## 1 1 1
## 0.36594112304922 0.377395645981701 0.435683299355719
## 1 1 1
## 0.503607972233726 0.560746090888056 0.576718781896486
## 1 1 1
## 0.59625901661066 0.618243293566247 0.646674390495345
## 1 1 1
## 0.66413569989411 0.726750747385451 0.77214218580453
## 1 1 1
## 0.781859184600258 0.804189509744908 0.83204712857239
## 1 1 1
## 0.992160365445798 0.996543928544126 0.996986860909106
## 1 1 1
## 1.08576936214569 1.10096910219409 1.1519117540872
## 1 1 1
## 1.15653699715018 1.23830410085338 1.25408310644997
## 1 1 1
## 1.2560188173061 1.29931230256343 1.45598840106634
## 1 1 1
## 1.62544730346494 1.67829720781629 1.75790308981071
## 1 1 1
## 2.44136462889459
## 1
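One way to get a usable summary of continuous data is to group the values into intervals before tabulating. The following is a minimal sketch using cut; the break points and the names z and bins are illustrative assumptions, not part of the original example:

```r
set.seed(1)
z <- rnorm(100)
# Group the continuous values into four intervals (break points chosen arbitrarily)
bins <- cut(z, breaks = c(-Inf, -1, 0, 1, Inf))
# Tabulate the intervals rather than the raw values
table(bins)
```

Each observation falls into exactly one interval, so the counts add up to the number of observations.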
We also often want to obtain proportions (i.e., the share of observations falling into each category).
We can obtain this information by wrapping our table call in the prop.table function:
prop.table(table(a))
## a
## 1 2 3 4 5
## 0.12 0.32 0.08 0.28 0.20
The result is a “proportion” table, showing the proportion of observations in each category. If we want percentages, we can simply multiply the resulting table by 100:
prop.table(table(a)) * 100
## a
## 1 2 3 4 5
## 12 32 8 28 20
To get frequencies and proportions (or percentages) together, we can bind the two tables:
cbind(table(a), prop.table(table(a)))
## [,1] [,2]
## 1 3 0.12
## 2 8 0.32
## 3 2 0.08
## 4 7 0.28
## 5 5 0.20
rbind(table(a), prop.table(table(a)))
## 1 2 3 4 5
## [1,] 3.00 8.00 2.00 7.00 5.0
## [2,] 0.12 0.32 0.08 0.28 0.2
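The columns (or rows) of the bound result are unlabeled, which can make it harder to read. As a small sketch, naming the arguments passed to cbind labels the columns; the names tab, out, Freq, and Prop are arbitrary choices made for illustration:

```r
set.seed(1)
# Re-create a so the block runs on its own; the sampled values
# (and so the counts) may differ from the output above under a different R version
a <- sample(1:5, 25, TRUE)
tab <- table(a)
# Named arguments become column names in the bound matrix
out <- cbind(Freq = tab, Prop = prop.table(tab))
out
```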
In addition to these basic (univariate) tabulation functions, we can also tabulate in two or more dimensions.
To obtain simple crosstabulations, we can still use table:
b <- rep(c(1, 2), length = 25)
table(a, b)
## b
## a 1 2
## 1 0 3
## 2 5 3
## 3 1 1
## 4 5 2
## 5 2 3
The result is a crosstable with the first requested variable (a) as rows and the second (b) as columns.
With more than two variables, the table is harder to read:
c <- rep(c(3, 4, 5), length = 25)
table(a, b, c)
## , , c = 3
##
## b
## a 1 2
## 1 0 1
## 2 3 1
## 3 0 1
## 4 1 0
## 5 1 1
##
## , , c = 4
##
## b
## a 1 2
## 1 0 0
## 2 2 2
## 3 0 0
## 4 2 2
## 5 0 0
##
## , , c = 5
##
## b
## a 1 2
## 1 0 2
## 2 0 0
## 3 1 0
## 4 2 0
## 5 1 2
R supplies two additional functions that make reading these kinds of tables easier.
The ftable function attempts to collapse the previous result into a more readable format:
ftable(a, b, c)
## c 3 4 5
## a b
## 1 1 0 0 0
## 2 1 0 2
## 2 1 3 2 0
## 2 1 2 0
## 3 1 0 0 1
## 2 1 0 0
## 4 1 1 2 2
## 2 0 2 0
## 5 1 1 0 1
## 2 1 0 2
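ftable also accepts row.vars and col.vars arguments that control which variables are placed in the rows and which in the columns. A brief sketch, assuming we would rather have only a in the rows (the object name ft is illustrative, and the vectors are re-created so the block runs on its own):

```r
set.seed(1)
a <- sample(1:5, 25, TRUE)
b <- rep(c(1, 2), length.out = 25)
c <- rep(c(3, 4, 5), length.out = 25)
# Put a in the rows; b and c are then combined across the columns
ft <- ftable(table(a, b, c), row.vars = "a")
ft
```

With two values of b and three of c, the flattened table has six columns, one for each combination.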
The xtabs function provides an alternative way of requesting tabulations.
This uses R's formula data structure (see 'formulas.r').
A formula with only a right-hand side produces the same result as table:
xtabs(~a + b)
## b
## a 1 2
## 1 0 3
## 2 5 3
## 3 1 1
## 4 5 2
## 5 2 3
xtabs(~a + b + c)
## , , c = 3
##
## b
## a 1 2
## 1 0 1
## 2 3 1
## 3 0 1
## 4 1 0
## 5 1 1
##
## , , c = 4
##
## b
## a 1 2
## 1 0 0
## 2 2 2
## 3 0 0
## 4 2 2
## 5 0 0
##
## , , c = 5
##
## b
## a 1 2
## 1 0 2
## 2 0 0
## 3 1 0
## 4 2 0
## 5 1 2
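The formula interface is most convenient when the variables live in a data frame, because xtabs takes a data argument. A small sketch, assuming a hypothetical data frame named df that holds the same kind of vectors as above:

```r
set.seed(1)
# df is a hypothetical data frame built only for this illustration
df <- data.frame(
  a = sample(1:5, 25, TRUE),
  b = rep(c(1, 2), length.out = 25)
)
# Variables named in the formula are looked up in df
xt <- xtabs(~ a + b, data = df)
xt
```

The counts match what table(df$a, df$b) would return, with the dimension labels taken from the formula.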
With a crosstable, we can also add table margins using addmargins:
x <- table(a, b)
addmargins(x)
## b
## a 1 2 Sum
## 1 0 3 3
## 2 5 3 8
## 3 1 1 2
## 4 5 2 7
## 5 2 3 5
## Sum 13 12 25
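By default addmargins adds totals for every dimension. Its margin argument restricts this to particular dimensions; the sketch below (the object name xm is illustrative, and a and b are re-created so the block runs on its own) adds only a row of column totals:

```r
set.seed(1)
a <- sample(1:5, 25, TRUE)
b <- rep(c(1, 2), length.out = 25)
x <- table(a, b)
# margin = 1 sums over the first dimension, appending a "Sum" row
xm <- addmargins(x, margin = 1)
xm
```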
As with a one-dimensional table, we can calculate proportions from a k-dimensional table:
prop.table(table(a, b))
## b
## a 1 2
## 1 0.00 0.12
## 2 0.20 0.12
## 3 0.04 0.04
## 4 0.20 0.08
## 5 0.08 0.12
The default result is a table with proportions of the entire table.
We can calculate row proportions by setting the margin parameter to 1:
prop.table(table(a, b), 1)
## b
## a 1 2
## 1 0.0000 1.0000
## 2 0.6250 0.3750
## 3 0.5000 0.5000
## 4 0.7143 0.2857
## 5 0.4000 0.6000
We can calculate column proportions by setting the margin parameter to 2:
prop.table(table(a, b), 2)
## b
## a 1 2
## 1 0.00000 0.25000
## 2 0.38462 0.25000
## 3 0.07692 0.08333
## 4 0.38462 0.16667
## 5 0.15385 0.25000
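A quick way to confirm which margin was used is to check what sums to 1. This sketch re-creates a and b so that it runs on its own; p_all, p_row, and p_col are illustrative names:

```r
set.seed(1)
a <- sample(1:5, 25, TRUE)
b <- rep(c(1, 2), length.out = 25)
x <- table(a, b)

p_all <- prop.table(x)      # proportions of the entire table
p_row <- prop.table(x, 1)   # proportions within each row
p_col <- prop.table(x, 2)   # proportions within each column

sum(p_all)       # the whole table sums to 1
rowSums(p_row)   # each row sums to 1
colSums(p_col)   # each column sums to 1
```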