Variables

Working with objects in R becomes tedious if we don't give those objects names that we can refer to in later analysis. In R, we can “assign” an object a name and then reference the object by that name. For example, rather than just printing the result of the expression 2+2, we can store the result and look at it later:

a <- 2 + 2

To see the value of the result, we simply call our variable's name:

a

## [1] 4

Thus the <- operator (a less-than sign followed by a minus sign) means: assign the right-hand side to the name on the left-hand side. We can get the same result using = (an equal sign):

a = 2 + 2
a

## [1] 4

We can also, much more uncommonly, produce the same result by reversing the order of the statement and using a different symbol:

2 + 2 -> a
a

## [1] 4

This form is rare in practice, though. The <- operator is the preferred assignment operator.

When we assign an expression to a variable name, the result of the evaluated expression is saved, not the expression itself. Thus, when we call a again later, we don't see 2+2 but instead see 4. We can overwrite the value stored in a variable by simply assigning something new to that variable:

a <- 2 + 2
a <- 3
a

## [1] 3
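To make the point about stored values concrete, here is a minimal sketch. The names price and total are hypothetical and chosen only for illustration. It shows that a value computed from a variable is not updated when that variable is later overwritten:

```r
# Hypothetical names, used only for illustration
price <- 2 + 2        # the expression is evaluated now, so 4 is stored
total <- price * 10   # uses the current value of price (4), so 40 is stored
price <- 3            # overwriting price does not re-evaluate total
price
total
```

Here price ends up as 3 while total is still 40, because total holds a result rather than a formula.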

We can also copy a variable into a different name:

b <- a
b

## [1] 3
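A copy made this way behaves as an independent object: changing it afterwards leaves the original alone. The following sketch assumes two hypothetical names, scores and scores_copy, that are not part of the example above:

```r
# Hypothetical names, used only for illustration
scores <- c(1, 2, 3)
scores_copy <- scores    # copy the vector under a second name
scores_copy[1] <- 100    # change only the copy
scores                   # the original still holds 1 2 3
scores_copy              # the copy now holds 100 2 3
```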

If we decide we no longer need a variable, we can remove it from the R environment using rm:

rm(a)
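One way to confirm that rm did its job is the exists function, which reports whether a name is currently defined. The sketch below uses a hypothetical name, scratch_value, so that it does not interfere with the variables above:

```r
# scratch_value is a hypothetical name used only for this check
scratch_value <- 10
exists("scratch_value")   # TRUE: the name is defined
rm(scratch_value)
exists("scratch_value")   # FALSE: the name is gone
```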

Sometimes we forget what we've done and want to see what variables we have floating around in our R environment. We can see them with ls:

ls()

##   [1] "a1"            "a2"            "allout"        "amat"         
##   [5] "b"             "b1"            "betas"         "between"      
##   [9] "bin"           "bmat"          "bootcoefs"     "c"            
##  [13] "c1"            "c2"            "c3"            "change"       
##  [17] "ci67"          "ci95"          "ci99"          "cmat"         
##  [21] "coef.mi"       "coefs.amelia"  "condmeans_x"   "condmeans_x2" 
##  [25] "condmeans_y"   "cumprobs"      "d"             "d1"           
##  [29] "d2"            "d3"            "d4"            "d5"           
##  [33] "df1"           "df2"           "dist"          "e"            
##  [37] "e1"            "e2"            "e3"            "e4"           
##  [41] "e5"            "englebert"     "f"             "fit1"         
##  [45] "fit2"          "fit3"          "FUN"           "g"            
##  [49] "g1"            "g2"            "grandm"        "grandse"      
##  [53] "grandvar"      "h"             "height"        "i"            
##  [57] "imp"           "imp.amelia"    "imp.mi"        "imp.mice"     
##  [61] "lm"            "lm.amelia.out" "lm.mi.out"     "lm.mice.out"  
##  [65] "lm1"           "lm2"           "lmfit"         "lmp"          
##  [69] "localfit"      "localp"        "logodds"       "logodds_lower"
##  [73] "logodds_se"    "logodds_upper" "m"             "m1"           
##  [77] "m2"            "m2a"           "m2b"           "m3a"          
##  [81] "m3b"           "me"            "me_se"         "means"        
##  [85] "mmdemo"        "model1"        "model2"        "myboot"       
##  [89] "mydf"          "mydf2"         "myformula"     "myttest"      
##  [93] "myttest2"      "myttest3"      "n"             "n1"           
##  [97] "n2"            "n3"            "new1"          "newdata"      
## [101] "newdata1"      "newdata2"      "newdf"         "newvar"       
## [105] "nx"            "ologit"        "ols"           "ols1"         
## [109] "ols2"          "ols3"          "ols3b"         "ols4"         
## [113] "ols5"          "ols5a"         "ols5b"         "ols6"         
## [117] "ols6a"         "ols6b"         "oprobit"       "oprobprobs"   
## [121] "out"           "p"             "p1"            "p2"           
## [125] "p2a"           "p2b"           "p3a"           "p3b"          
## [129] "p3b.fitted"    "part1"         "part2"         "plogclass"    
## [133] "plogprobs"     "pool.mice"     "ppcurve"       "pred1"        
## [137] "s"             "s.amelia"      "s.mi"          "s.mice"       
## [141] "s.orig"        "s.real"        "s1"            "s2"           
## [145] "s3"            "search"        "ses"           "ses.amelia"   
## [149] "sigma"         "slope"         "slopes"        "sm1"          
## [153] "sm2"           "smydf"         "sx"            "sy"           
## [157] "tmp1"          "tmp2"          "tmp3"          "tmp4"         
## [161] "tmpdata"       "tmpdf"         "tmpsplit"      "tmpx"         
## [165] "tmpz"          "tr"            "val"           "valcol"       
## [169] "w"             "weight"        "within"        "x"            
## [173] "X"             "x1"            "x1cut"         "x1t"          
## [177] "x2"            "X2"            "x2t"           "x3"           
## [181] "x4"            "x5"            "x6"            "xseq"         
## [185] "y"             "y1"            "y1s"           "y2"           
## [189] "y2s"           "y3"            "y3s"           "y4"           
## [193] "y5"            "y6"            "yt"            "z"            
## [197] "z1"            "z2"            "z5"            "z6"

This returns a character vector containing the names of all objects currently in our R environment. It is also possible to remove ALL variables in our current R session, using the following:

# rm(list=ls())

Note: This is usually also available from the RGui dropdown menus, and it should only be done if you really want to remove everything. You will sometimes also see an expression like:

b <- NULL

This expression does not remove the object, but instead makes its value NULL. NULL is different from missing (NA) because R (generally) ignores a NULL value whenever it sees it. You can see this in the difference between the following two vectors:

c(1, 2, NULL)

## [1] 1 2

c(1, 2, NA)

## [1]  1  2 NA
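The difference can also be checked with length, and it matters for later computation: the NA is carried along as an element, while the NULL is simply gone. A minimal sketch, with with_null and with_na as illustrative names only:

```r
# Hypothetical names, used only for illustration
with_null <- c(1, 2, NULL)
with_na <- c(1, 2, NA)

length(with_null)            # 2: the NULL was dropped
length(with_na)              # 3: the NA counts as an element
sum(with_null)               # 3
sum(with_na)                 # NA: the missing value propagates
sum(with_na, na.rm = TRUE)   # 3: the NA is removed explicitly
```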

The first has two elements and the second has three. It is also possible to use the assign function to assign a value to a name:

assign("x", 3)
x

## [1] 3

This is not common in interactive use of R but can be helpful at more advanced levels.
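One situation where assign can help is when the name itself is built by code. The sketch below is illustrative only: the names sq_1, sq_2, and sq_3 and the loop are assumptions, not part of the example above. The get function is the counterpart that looks up a value from a name given as a string:

```r
# Build the names sq_1, sq_2, sq_3 programmatically (hypothetical names)
for (i in 1:3) {
  assign(paste0("sq_", i), i^2)
}

sq_2          # 4
get("sq_3")   # 9, looked up by a name given as a string
```

In everyday work, a vector or list is often an easier way to hold a set of related values like these.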

Variable naming rules

R has some relatively simple rules governing how objects can be named:

(1) R object names are case sensitive, so a is not the same as A. This applies to objects and functions.
(2) R object names (generally) must start with a letter or a period, and a leading period cannot be followed by a digit.
(3) R object names can contain letters, numbers, periods (.), and underscores (_).
(4) The names of R objects can be just about any length, but anything over about 10 characters gets annoying to type.

CAUTION: We can violate some of these restrictions by naming things with backticks, but this can be confusing:

`f` <- 2
f

## [1] 2

f <- 3
`f`

## [1] 3

That makes sense, and backticks also allow us to name variables that start with a number. To call objects with these noncompliant names, we need to use the backticks:

`1f` <- 3
# Then try typing `1f` (with the backticks)

If we just typed 1f without the backticks, we would get an error. This also means we can give an object a name that is just a number:

`4` <- 5
4

## [1] 4

# Then try typing `4` (with the backticks)

This is kind of weird, and it is best avoided.
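As a rough way to check candidate names against the rules above, base R's make.names function converts strings into syntactically valid names, so a string that comes back unchanged is already valid. The sketch below is only one possible check; the candidate strings and the names is_syntactic, total, and Total are assumptions made for illustration:

```r
# Candidate names to test (illustrative only)
candidates <- c("a", "A", ".hidden", "my_var2", "1f", "4", "my var")

# A name is valid without backticks if make.names leaves it unchanged
is_syntactic <- make.names(candidates) == candidates
names(is_syntactic) <- candidates
is_syntactic

# Case sensitivity: these are two different objects
total <- 1
Total <- 2
total == Total   # FALSE
```

The first four candidates pass, while 1f, 4, and "my var" would need backticks.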