Software

I am a passionate R hacker and have authored more than 30 packages for R, ranging from cloud computing clients, to original statistical tools, to just-for-fun packages like meme. Most of my work now focuses on tooling for cloudyr open source projects, a handful of projects for rOpenSci, and tools to replicate advanced Stata functionality in R. You can find source code for all of my open source projects on GitHub.

A few highlights are as follows:

The cloudyr project is an effort to connect R to cloud computing applications. My main contribution is a fully featured client for Amazon Web Services tools called awspack. See the cloudyr website or GitHub page for more details.

dataverse provides access to the Dataverse Network APIs. It is the current generation package, covering the complete functionality of current Dataverse installations.

ghit (Lightweight GitHub Package Installer) is a vectorized drop-in replacement for devtools::install_github() that uses native git and R methods to clone and install a package from GitHub. It is a lighter-weight alternative to devtools with a very similar API, slightly different defaults, and completely rebuilt internals.

MTurkR is a client library providing access to the Amazon Mechanical Turk crowdsourcing platform through R. It also has a graphical user interface and a wiki with advice on using the package and MTurk more generally.

prediction and margins together form an R port of Stata’s margins command. prediction provides tidy, type-safe predictions from model objects and Stata-style predictive margins. margins calculates marginal effects (and average marginal effects) and their variances from regression models. It is especially helpful for models with power terms, non-linear transformations, and interaction terms, and for generalized linear models, where the marginal effect of a variable cannot be read off a single coefficient.
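To illustrate why power terms make marginal effects awkward to read from a coefficient table, here is a minimal base R sketch. It is not the margins package's own code: the helper `marginal_effects()` and its `eps` argument are hypothetical names introduced for illustration, and the model on the built-in mtcars data is an arbitrary choice. The sketch approximates each observation's marginal effect by a central difference in the model's predictions and then averages them.

```r
# Linear model with a squared term: the effect of wt on mpg depends on wt itself.
fit <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Hypothetical helper (not part of margins): central-difference derivative
# of the model's predictions with respect to one numeric variable.
marginal_effects <- function(model, data, variable, eps = 1e-4) {
  up <- data
  down <- data
  up[[variable]] <- data[[variable]] + eps
  down[[variable]] <- data[[variable]] - eps
  (predict(model, newdata = up) - predict(model, newdata = down)) / (2 * eps)
}

# One marginal effect per observation, then their average.
me_wt <- marginal_effects(fit, mtcars, "wt")
ame_wt <- mean(me_wt)

# For this quadratic model the derivative is b1 + 2 * b2 * wt, so the
# numerical average can be checked against the analytic one.
b <- coef(fit)
analytic_ame <- unname(b["wt"] + 2 * b["I(wt^2)"] * mean(mtcars$wt))
```

Here `ame_wt` and `analytic_ame` agree up to numerical error. The sketch covers point estimates only; it does not compute the variances that margins reports.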

rio makes data file import and export as easy as possible by relying on file extensions to make a (reasonable) assumption about how to read a file into a data.frame or, conversely, save a data.frame to disk. It greatly simplifies data import and export and offers a function for easily converting between file formats (possibly from the command line).
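The extension-based approach can be sketched in a few lines of base R. This is an illustration of the idea only, not rio's implementation: `read_by_extension()` and `write_by_extension()` are hypothetical helpers, and they handle just three formats, whereas rio's own import() and export() cover many more.

```r
# Hypothetical helper: pick a reader based on the file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    tsv = utils::read.delim(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("Unrecognized file extension: ", ext)
  )
}

# Hypothetical helper: pick a writer the same way.
write_by_extension <- function(x, path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::write.csv(x, path, row.names = FALSE),
    tsv = utils::write.table(x, path, sep = "\t", row.names = FALSE, quote = FALSE),
    rds = saveRDS(x, path),
    stop("Unrecognized file extension: ", ext)
  )
  invisible(path)
}

# Converting between formats is then a read followed by a write.
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write_by_extension(head(mtcars), csv_file)
write_by_extension(read_by_extension(csv_file), rds_file)
converted <- read_by_extension(rds_file)
```

Because the caller names only a file, changing the extension is enough to change the format, which is what makes a one-step conversion function natural.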

tabulizer provides R bindings to the Tabula Java library, which extracts tables from PDF documents using a small set of really powerful and accurate algorithms. tabulizer is a thin client around Tabula and offers a handy interactive mode for identifying tables in PDFs directly within an R graphics window.

UNF is an R package for generating variable- and dataset-level universal numeric fingerprint (UNF) signatures, which uniquely and persistently identify (a version of) a dataset. The UNF algorithm was created by Micah Altman; the current package, which I maintain, has been updated to version 5 of the algorithm. The package also provides UNF-based functions to identify discrepancies between data frames and works well with the dvn package for comparing Dataverse-stored datasets against local copies.


Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License.