Thomas J. Leeper > Teaching > R Programming Course > Scripts

Tutorial Scripts

Below you will find links to a number of fully executable R scripts (written in roxygen comments) that walk through various aspects of R programming.


The R language


Data as dataframes

  • Loading/reading data into R: Script and Tutorial
    • Built-in data (data)
    • R data files (load)
    • Tabular data (read.csv, read.table, etc.)
    • Manual data entry and file scans (scan and readLines)
    • Reading foreign data (read.spss, read.dta, etc.)
  • Viewing dataframe structure: Script and Tutorial

  • Saving R data: Script and Tutorial
    • save and load
    • dput and dget
    • dump and source
    • write.csv and write.table
  • Dataframe rearrangement: Script and Tutorial
    • order
    • subset
    • split
    • sample
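
As a rough illustration of the reading, saving, and rearrangement functions listed above, the sketch below round-trips a small data frame through three of the storage formats and then sorts and subsets it. The data frame `d`, its columns, and the temporary file names are made up for the example and do not come from the tutorial scripts.

```r
# Hypothetical example data (not taken from the tutorials)
d <- data.frame(id = 1:3, score = c(2.5, 4.0, 3.5))

# save/load: binary R format; load() restores objects under their saved names
rda_file <- tempfile(fileext = ".RData")
save(d, file = rda_file)
restored <- new.env()
load(rda_file, envir = restored)   # the object is now available as restored$d

# dput/dget: plain-text representation of a single object
txt_file <- tempfile(fileext = ".R")
dput(d, file = txt_file)
d_text <- dget(txt_file)

# write.csv/read.csv: portable tabular text
csv_file <- tempfile(fileext = ".csv")
write.csv(d, file = csv_file, row.names = FALSE)
d_csv <- read.csv(csv_file)

# Rearrangement: sort rows with order(), filter rows with subset()
d_sorted <- d[order(d$score, decreasing = TRUE), ]
d_high <- subset(d, score > 3)
```

Loading into a separate environment, as done here with `restored`, is one way to avoid silently overwriting objects of the same name in the workspace.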

Data processing


Data summaries


Plotting as data summary

  • Data summary plots: Script and Tutorial

    • Rugs (Marginal distributions for scatterplots): Script and Tutorial
    • Local regression (LOWESS/LOESS) for scatterplots: Script and Tutorial
    • jitter for scatterplots of categorical data: Script and Tutorial
    • Add-ons: lines, segments (for error bars), polygon, points, abline, text, legend
    • Plotting functions with curve: Script and Tutorial
  • Graphical parameters: Script and Tutorial TODO

  • Plotting Colors: Script and Tutorial

  • Saving plots:
    • In RGui, you can use point-and-click menus to save plots, but it is also possible to save plots using code. The appropriate function depends on the file format you desire for the resulting plot. The main ones are: pdf, jpeg, png, tiff, bmp, and svg. PDF and PNG are good choices, though TIFF is often required for academic publishing.
    • If building a plot in stages (e.g., overlaying different model fits), it is also possible to save the plot at each stage. This can be useful for building plots to be used in slides (e.g., to control which contents of a plot are displayed at each point in the talk). Relevant functions here are: dev.print, dev.copy, dev2bitmap, and savePlot.
  • Note: There are several other graphics packages (including ggplot2, lattice, and grid). My personal preference is to rely on the flexibility of base graphics, but these alternative approaches are preferred by some.
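
The two approaches described under "Saving plots" might look roughly like the sketch below. It uses the built-in mtcars data and temporary file names purely for illustration. For the staged version it assumes a non-interactive session, so it opens a null PDF device and switches on the display list explicitly; in an interactive session the screen device records the display list by default, and dev.copy can be called on it directly.

```r
# Direct approach: open a file device, draw, then close it to write the file.
# png(), tiff(), and the other device functions follow the same pattern.
direct_file <- tempfile(fileext = ".pdf")
pdf(direct_file, width = 6, height = 4)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
dev.off()

# Staged approach: draw once, copy the device contents to a file after each stage.
pdf(NULL)                             # stand-in for a screen device (assumption)
dev.control(displaylist = "enable")   # record drawing operations so they can be copied
base_dev <- dev.cur()
stage1 <- tempfile(fileext = ".pdf")
stage2 <- tempfile(fileext = ".pdf")

# Stage 1: scatterplot only
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
dev.copy(pdf, file = stage1, width = 6, height = 4)
dev.off()                             # close the copy, which writes stage1
dev.set(base_dev)                     # return to the device being built up

# Stage 2: same plot with a fitted regression line overlaid
abline(lm(mpg ~ wt, data = mtcars))
dev.copy(pdf, file = stage2, width = 6, height = 4)
dev.off()
dev.set(base_dev)
dev.off()                             # close the working device
```

Each call to dev.copy opens a new device that becomes the active one, which is why the sketch closes it and switches back before adding the next layer.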

Statistics

  • Basic parametric statistical tests: Script and Tutorial TODO
    • chisq.test
    • t.test
    • cor.test
    • prop.test
    • binom.test
  • One-way ANOVA (aov, oneway.test, and kruskal.test): Script and Tutorial

  • Nonparametric statistical tests (e.g., t.test versus wilcox_test)

  • Variance tests: Script and Tutorial TODO
    • var.test
    • fligner.test
    • bartlett.test
    • ansari.test
  • Permutation tests: Script and Tutorial

  • by and *apply

  • Statistical distributions
    • Probability density, cumulative distribution, and quantile functions: Script and Tutorial
    • Random number generation
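
For the "Statistical distributions" items above, base R follows a naming convention of d, p, q, and r prefixes for the density, cumulative distribution, quantile, and random-generation functions of each distribution. A minimal sketch for the normal distribution follows; the variable names and the seed are arbitrary choices for the example.

```r
# Density of the standard normal at zero
dens_at_zero <- dnorm(0)

# Cumulative probability below 1.3, and the quantile function undoing it
p_below <- pnorm(1.3)
q_back <- qnorm(p_below)

# Reproducible random draws: fix the seed before generating
set.seed(123)
draws_a <- rnorm(5, mean = 10, sd = 2)
set.seed(123)
draws_b <- rnorm(5, mean = 10, sd = 2)   # same seed, same draws
```

Other distributions use the same prefixes (for example dbinom, pbinom, qbinom, and rbinom for the binomial).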

Linear Regression (OLS)


Regression plotting

  • Plots for regression diagnostics: Script and Tutorial
    • Default plots from plot(lm)
    • Residual plots and qqplot
    • Scatterplots
  • Regression coefficient summary plots: Script and Tutorial

  • Plots for OLS linear effects: Script and Tutorial

  • Plots for interaction effects

    • Plots for binary interactions: Script and Tutorial
    • Plots for continuous-by-continuous interactions: Script and Tutorial
    • Predicted outcomes

Generalized Linear Models

The tutorials below supply a basic introduction to many GLM techniques. A guide to all of the available packages and functions for GLMs can be found in the Econometrics Task View.

  • Binary outcome models (and link functions): Script and Tutorial TODO -> bivariate and multivariate

    • Simple plots (Bivariate predicted probabilities): Script and Tutorial
    • Multivariate predicted probabilities, interactions, and marginal effects: Script and Tutorial
  • Ordered outcome models: Script and Tutorial

    • Estimation, predicted probabilities, and plots
  • Count outcome models: Script and Tutorial TODO

  • Multinomial outcome models: Script and Tutorial
    • Estimation, predicted probabilities, and plots
    • Multinomial logit is also available from the mlogit package
    • Multinomial probit is available in the MNP package
  • Survival models from survival TODO

  • Note: Gary King’s Zelig set of packages provides a slightly more unified interface for GLMs, but it is basically just a convenient wrapper for the functions described in the above tutorials.

Experiments


Reproducible research

  • Using source

  • Using sink

  • Comments: Script and Tutorial

  • Public data archiving with dvn: Script and Tutorial

  • Integration with Microsoft Word
  • knitr stitch

  • Integration with LaTeX reports
    • knit
    • xtable (also Hmisc::latex, apsrtable, and stargazer)
  • Presentations with beamer

  • Web publishing with R Markdown
    • knitr
    • R2HTML
    • Slidify

Repeated tasks

  • apply and *apply family
  • loops (for, while)
  • Split-Apply-Combine (by, split)
  • Sampling/Bootstrapping/permutations (sample and replicate)
  • Aggregation functions (ave, aggregate, etc.)
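
To make the "Repeated tasks" list above more concrete, the following sketch computes the same group means three ways (an explicit loop, split with sapply, and aggregate) and then uses sample with replicate for a simple bootstrap. It relies on the built-in mtcars data; the object names, the seed, and the number of replicates are arbitrary choices for illustration.

```r
# 1. Explicit for loop over groups
by_loop <- numeric(0)
for (g in sort(unique(mtcars$cyl))) {
  by_loop[as.character(g)] <- mean(mtcars$mpg[mtcars$cyl == g])
}

# 2. Split-apply-combine: split the vector by group, apply mean to each piece
by_split <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

# 3. Aggregation with a formula interface; returns a data frame
by_agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Bootstrap: resample with replacement and recompute the mean many times
set.seed(1)
boot_means <- replicate(200, mean(sample(mtcars$mpg, replace = TRUE)))
boot_se <- sd(boot_means)   # bootstrap estimate of the standard error
```

The loop and the split/sapply version both return a named numeric vector, while aggregate returns a data frame, which can matter for later merging or plotting.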

User-Defined functions

  • Variable scope and environments
  • Return values (return and invisible)
  • Custom classes
  • Default arguments
  • print and summary S3 methods
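
Several of the topics listed above (default arguments, return values, custom classes, and S3 print methods) can be seen together in one small sketch. The function `mean_ci` and the class name "ci" are hypothetical, invented for this example, and the interval uses a simple normal approximation.

```r
# Constructor-style function with a default argument (level = 0.95)
mean_ci <- function(x, level = 0.95) {
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))
  z <- qnorm(1 - (1 - level) / 2)
  out <- list(mean = m, lower = m - z * se, upper = m + z * se, level = level)
  class(out) <- "ci"   # attach a custom S3 class
  return(out)          # explicit, visible return value
}

# S3 print method: called automatically when a "ci" object is printed
print.ci <- function(x, ...) {
  cat(sprintf("Mean %.2f (%.0f%% CI: %.2f, %.2f)\n",
              x$mean, 100 * x$level, x$lower, x$upper))
  invisible(x)         # return the object without printing it a second time
}

res <- mean_ci(c(1, 2, 3, 4, 5))
print(res)
```

Returning the object invisibly from the print method is the usual convention, since it lets the result be reassigned or piped onward without producing extra console output.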

Over-time data

  • Time-series (ts class)
  • Panel data (plm)
  • Mixed effects
  • Multi-level models

Text processing

  • String manipulation: Script and Tutorial
  • Regular expressions: Script and Tutorial TODO
  • Reading and writing to console, files, and connections

Other advanced topics

  • File manipulation: list.files/dir, file.create, etc.
  • System calls: shell, shell.exec, and system
  • Bayes: MCMCpack, RJags, RBugs, RStan
  • Big data: data.table, parallel computing
  • Mapping
  • Web services: twitteR, MTurkR