R is a statistical programming language and environment, so computing statistics is one of its core uses. For any numeric vector, we can calculate a number of summary statistics, including:
set.seed(1)
a <- rnorm(100)
minimum
min(a)
## [1] -2.215
maximum
max(a)
## [1] 2.402
We can get the minimum and maximum together with range:
range(a)
## [1] -2.215 2.402
We can also obtain the minimum by sorting the vector (using sort):
sort(a)[1]
## [1] -2.215
And we can obtain the maximum by sorting in the opposite order:
sort(a, decreasing = TRUE)[1]
## [1] 2.402
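As a quick check, the approaches above can be compared directly. The sketch below recreates a with the same seed and uses identical to confirm that range and the two sort calls agree with min and max. The names lo and hi are introduced here for illustration only.

```r
set.seed(1)
a <- rnorm(100)

lo <- min(a)
hi <- max(a)

# range() returns c(minimum, maximum) in a single call
identical(range(a), c(lo, hi))

# the first element of an ascending sort is the minimum;
# the first element of a descending sort is the maximum
identical(sort(a)[1], lo)
identical(sort(a, decreasing = TRUE)[1], hi)
```

Sorting does more work than a single pass with min or max, so the sort versions are mainly useful as an illustration of what the functions compute.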
To calculate the central tendency, we have several options.
mean
mean(a)
## [1] 0.1089
This is of course equivalent to:
sum(a)/length(a)
## [1] 0.1089
median
median(a)
## [1] 0.1139
In a vector with an even number of elements, this is equivalent to:
(sort(a)[length(a)/2] + sort(a)[length(a)/2 + 1])/2
## [1] 0.1139
In a vector with an odd number of elements, this is equivalent to:
a2 <- a[-1]  # drop first observation of `a`, leaving 99 elements
sort(a2)[(length(a2) + 1)/2]  # index of the single middle value
## [1] 0.1533
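The even and odd cases can be combined into one function. The following is a minimal sketch, assuming a numeric vector with no missing values; manual_median is a hypothetical helper written for this example, not a function that ships with R.

```r
set.seed(1)
a <- rnorm(100)

# Hypothetical helper (not part of R): median computed by sorting
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 0) {
    # even length: average the two middle values
    (s[n / 2] + s[n / 2 + 1]) / 2
  } else {
    # odd length: take the single middle value
    s[(n + 1) / 2]
  }
}

all.equal(manual_median(a), median(a))          # even case, 100 elements
all.equal(manual_median(a[-1]), median(a[-1]))  # odd case, 99 elements
```

In practice median is the better choice, since it also handles missing values through its na.rm argument.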
We can also obtain measures of dispersion:
Variance
var(a)
## [1] 0.8068
This is equivalent to:
sum((a - mean(a))^2)/(length(a) - 1)
## [1] 0.8068
Standard deviation
sd(a)
## [1] 0.8982
This is equivalent to:
sqrt(var(a))
## [1] 0.8982
Or:
sqrt(sum((a - mean(a))^2)/(length(a) - 1))
## [1] 0.8982
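The equivalences above can be verified step by step. This sketch breaks the formula into named pieces (n, dev, manual_var, manual_sd, and pop_var are illustrative names, not from the original text) and compares the results with var and sd using all.equal, which allows for floating-point rounding.

```r
set.seed(1)
a <- rnorm(100)

n <- length(a)
dev <- a - mean(a)                   # deviations from the mean

manual_var <- sum(dev^2) / (n - 1)   # sample variance, denominator n - 1
manual_sd <- sqrt(manual_var)        # standard deviation is its square root

all.equal(manual_var, var(a))
all.equal(manual_sd, sd(a))

# Dividing by n instead of n - 1 gives the population variance,
# which is always slightly smaller than the value var() returns.
pop_var <- sum(dev^2) / n
pop_var < var(a)
```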
There are also some convenience functions that provide multiple statistics.
The fivenum function provides the five-number summary (minimum, Q1, median, Q3, and maximum):
fivenum(a)
## [1] -2.2147 -0.5103 0.1139 0.6934 2.4016
It is also possible to obtain arbitrary percentiles/quantiles from a vector:
quantile(a, 0.1)  # 10% quantile
## 10%
## -1.053
You can also specify a vector of quantiles:
quantile(a, c(0.025, 0.975))
## 2.5% 97.5%
## -1.671 1.797
quantile(a, seq(0, 1, by = 0.1))
## 0% 10% 20% 30% 40% 50% 60% 70% 80%
## -2.2147 -1.0527 -0.6139 -0.3753 -0.0767 0.1139 0.3771 0.5812 0.7713
## 90% 100%
## 1.1811 2.4016
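To see where these numbers come from, the sketch below reimplements the linear interpolation that quantile uses by default (documented as type 7). It assumes a numeric vector without missing values and probabilities between 0 and 1; manual_quantile is a hypothetical helper written for this example.

```r
set.seed(1)
a <- rnorm(100)

# Hypothetical helper (not part of R): linear-interpolation quantile,
# following the definition of quantile()'s default type 7
manual_quantile <- function(x, p) {
  s <- sort(x)
  n <- length(s)
  h <- (n - 1) * p + 1    # fractional position in the sorted vector
  lo <- floor(h)          # index of the value at or below that position
  hi <- pmin(lo + 1, n)   # next index, capped so p = 1 stays in range
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

probs <- c(0, 0.1, 0.5, 0.975, 1)
all.equal(manual_quantile(a, probs), quantile(a, probs, names = FALSE))

# the 50% quantile is the median
all.equal(manual_quantile(a, 0.5), median(a))
```

Passing names = FALSE drops the percentage labels so that the two results can be compared as plain numeric vectors.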
The summary function, applied to a numeric vector, provides a similar set of values (minimum, quartiles, median, and maximum) plus the mean:
summary(a)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.210 -0.494 0.114 0.109 0.692 2.400
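The quartiles printed by summary differ slightly from those of fivenum (for example, -0.494 versus -0.5103 for Q1). The reason, as far as the documented behaviour goes, is that fivenum computes Tukey's hinges whereas summary relies on quantile with its default interpolation. The sketch below makes the comparison explicit; hinges, quartiles, and s are illustrative names, and the loose tolerance is an assumption to allow for versions of R that round the summary values.

```r
set.seed(1)
a <- rnorm(100)

hinges <- fivenum(a)[c(2, 4)]                            # Tukey's hinges
quartiles <- quantile(a, c(0.25, 0.75), names = FALSE)   # default quantiles

hinges
quartiles

# summary() reports the quantile()-based quartiles, not the hinges
s <- summary(a)
all.equal(c(s[["1st Qu."]], s[["3rd Qu."]]), quartiles, tolerance = 1e-3)
```

For large samples the two definitions give nearly the same answer, so the difference matters mostly when results are compared digit for digit.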
Note: The summary function returns different results if the vector is a logical, character, or factor.
For a logical vector, summary returns some tabulations:
summary(as.logical(rbinom(100, 1, 0.5)))
## Mode FALSE TRUE NA's
## logical 62 38 0
For a character vector, summary returns just some basic information about the vector:
summary(sample(c("a", "b", "c"), 100, TRUE))
## Length Class Mode
## 100 character character
For a factor, summary returns a table of all values in the vector:
summary(factor(a))
## -2.2146998871775 -1.98935169586337 -1.80495862889104
## 1 1 1
## -1.52356680042976 -1.47075238389927 -1.37705955682861
## 1 1 1
## -1.27659220845804 -1.2536334002391 -1.22461261489836
## 1 1 1
## -1.12936309608079 -1.04413462631653 -0.934097631644252
## 1 1 1
## -0.835628612410047 -0.820468384118015 -0.743273208882405
## 1 1 1
## -0.709946430921815 -0.70749515696212 -0.68875569454952
## 1 1 1
## -0.626453810742332 -0.621240580541804 -0.612026393250771
## 1 1 1
## -0.589520946188072 -0.573265414236886 -0.568668732818502
## 1 1 1
## -0.54252003099165 -0.47815005510862 -0.473400636439312
## 1 1 1
## -0.443291873218433 -0.41499456329968 -0.394289953710349
## 1 1 1
## -0.367221476466509 -0.305388387156356 -0.304183923634301
## 1 1 1
## -0.253361680136508 -0.164523596253587 -0.155795506705329
## 1 1 1
## -0.135178615123832 -0.135054603880824 -0.112346212150228
## 1 1 1
## -0.102787727342996 -0.0593133967111857 -0.0561287395290008
## 1 1 1
## -0.0538050405829051 -0.0449336090152309 -0.0392400027331692
## 1 1 1
## -0.0161902630989461 0.00110535163162413 0.0280021587806661
## 1 1 1
## 0.0743413241516641 0.0745649833651906 0.153253338211898
## 1 1 1
## 0.183643324222082 0.188792299514343 0.267098790772231
## 1 1 1
## 0.291446235517463 0.329507771815361 0.332950371213518
## 1 1 1
## 0.341119691424425 0.36458196213683 0.370018809916288
## 1 1 1
## 0.387671611559369 0.389843236411431 0.398105880367068
## 1 1 1
## 0.417941560199702 0.475509528899663 0.487429052428485
## 1 1 1
## 0.556663198673657 0.558486425565304 0.569719627442413
## 1 1 1
## 0.575781351653492 0.593901321217509 0.593946187628422
## 1 1 1
## 0.610726353489055 0.61982574789471 0.689739362450777
## 1 1 1
## 0.696963375404737 0.700213649514998 0.738324705129217
## 1 1 1
## 0.763175748457544 0.768532924515416 0.782136300731067
## 1 1 1
## 0.821221195098089 0.881107726454215 0.918977371608218
## 1 1 1
## 0.943836210685299 1.06309983727636 1.10002537198388
## 1 1 1
## 1.12493091814311 1.16040261569495 1.1780869965732
## 1 1 1
## 1.20786780598317 1.35867955152904 1.43302370170104
## 1 1 1
## 1.46555486156289 1.51178116845085 1.58683345454085
## 1 1 1
## 1.59528080213779 1.98039989850586 2.17261167036215
## 1 1 1
## 2.40161776050478
## 1
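With 100 distinct values, the factor summary above is hard to read. A smaller sketch, using a made-up character vector x, shows the behaviour more clearly: converting to a factor makes summary count each level, much as table does. The names x, f, and counts are illustrative.

```r
# Made-up example data (not from the text above)
x <- c("a", "b", "a", "c", "a", "b")

summary(x)            # character vector: length, class, and mode only

f <- factor(x)
counts <- summary(f)  # factor: one count per level
counts

table(x)              # table() gives the same counts directly
```

Factors are most informative when a vector has a small number of repeated values; for continuous data like a, every observation becomes its own level.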
A summary of a data frame returns summary information separately for each column.
The result may look different for each column, depending on the class of the column:
summary(data.frame(a = 1:10, b = 11:20))
## a b
## Min. : 1.00 Min. :11.0
## 1st Qu.: 3.25 1st Qu.:13.2
## Median : 5.50 Median :15.5
## Mean : 5.50 Mean :15.5
## 3rd Qu.: 7.75 3rd Qu.:17.8
## Max. :10.00 Max. :20.0
summary(data.frame(a = 1:10, b = factor(11:20)))
## a b
## Min. : 1.00 11 :1
## 1st Qu.: 3.25 12 :1
## Median : 5.50 13 :1
## Mean : 5.50 14 :1
## 3rd Qu.: 7.75 15 :1
## Max. :10.00 16 :1
## (Other):4
A summary of a list returns only the length, class, and mode of each element, which is rarely useful:
summary(list(a = 1:10, b = 1:10))
## Length Class Mode
## a 10 -none- numeric
## b 10 -none- numeric
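To get numeric summaries of the list elements, one option is to apply summary to each element in turn. This is a minimal sketch using lapply and sapply from base R; l and res are illustrative names.

```r
l <- list(a = 1:10, b = 1:10)

# summarise each element separately; the result is a list of summaries
lapply(l, summary)

# sapply() simplifies the same result into a matrix,
# with one column per list element and one row per statistic
res <- sapply(l, summary)
res
```

The sapply form works here because both elements are numeric and so produce summaries of the same length; for lists with mixed element types, lapply is the safer choice.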
A summary of a matrix returns a summary of each column separately (like a data frame):
summary(matrix(1:20, nrow = 4))
## V1 V2 V3 V4
## Min. :1.00 Min. :5.00 Min. : 9.00 Min. :13.0
## 1st Qu.:1.75 1st Qu.:5.75 1st Qu.: 9.75 1st Qu.:13.8
## Median :2.50 Median :6.50 Median :10.50 Median :14.5
## Mean :2.50 Mean :6.50 Mean :10.50 Mean :14.5
## 3rd Qu.:3.25 3rd Qu.:7.25 3rd Qu.:11.25 3rd Qu.:15.2
## Max. :4.00 Max. :8.00 Max. :12.00 Max. :16.0
## V5
## Min. :17.0
## 1st Qu.:17.8
## Median :18.5
## Mean :18.5
## 3rd Qu.:19.2
## Max. :20.0
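The Mean and Median rows in the output above can be reproduced column by column. This sketch uses only base R; m, col_means, and col_medians are illustrative names.

```r
m <- matrix(1:20, nrow = 4)

# mean of each column
col_means <- colMeans(m)
col_means

# apply() with MARGIN = 2 runs a function over each column
col_medians <- apply(m, 2, median)
col_medians
```

Changing the margin to 1 in the apply call would compute the statistic for each row instead.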