Software

I am a passionate R hacker and have authored more than 30 packages for R, ranging from cloud computing clients to original statistical tools to just-for-fun packages like meme. Most of my software work currently focuses on tooling for the cloudyr open source projects and on tools that bring various forms of Stata-style simplicity to R. You can find the source code for all of my open source projects on GitHub.

My major commitment is to the cloudyr project, an effort to connect R to cloud computing applications. My main contribution is awspack, a fully featured client for Amazon Web Services tools. See the cloudyr website or GitHub page for more details.

I have also been working for several years on building tools for Stata-style predictions and marginal effects for regression models. prediction and margins together form an R port of Stata’s margins command. prediction provides tidy, type-safe predictions from model objects and Stata-style predictive margins. margins can calculate (average) marginal effects and their variances from regression models. The latter is especially helpful for models with power terms, non-linear transformations, and interaction terms, and for generalized linear models.
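To make the idea of an average marginal effect concrete, here is a minimal, language-neutral sketch written in Python. It is not the margins package's R interface or implementation: the coefficients, the data, and the function names (predict, marginal_effects, average_marginal_effect) are all hypothetical, and variance estimation is left out. It approximates the derivative of the prediction with respect to one variable for each observation and then averages, which is why an interaction term makes the effect depend on the other variable.

```python
from statistics import mean

# Hypothetical fitted coefficients for y = b0 + b1*x + b2*z + b3*x*z.
# These are illustrative values, not estimates from any real data.
COEF = {"intercept": 1.0, "x": 2.0, "z": -0.5, "x:z": 0.75}


def predict(row):
    """Return the model prediction for one observation (a dict with keys x and z)."""
    return (
        COEF["intercept"]
        + COEF["x"] * row["x"]
        + COEF["z"] * row["z"]
        + COEF["x:z"] * row["x"] * row["z"]
    )


def marginal_effects(data, variable, step=1e-4):
    """Approximate d(prediction)/d(variable) for each row with a central difference."""
    effects = []
    for row in data:
        up = dict(row, **{variable: row[variable] + step})
        down = dict(row, **{variable: row[variable] - step})
        effects.append((predict(up) - predict(down)) / (2 * step))
    return effects


def average_marginal_effect(data, variable, step=1e-4):
    """Average the observation-level marginal effects over the data."""
    return mean(marginal_effects(data, variable, step))


# Made-up observations; the effect of x differs by row because of the x:z term.
data = [{"x": 0.0, "z": 1.0}, {"x": 1.0, "z": 2.0}, {"x": 2.0, "z": 4.0}]
ame_x = average_marginal_effect(data, "x")
```

With these made-up coefficients the analytic answer is the coefficient on x plus the interaction coefficient times the mean of z, that is 2 + 0.75 * 7/3 = 3.75, and the numerical result should agree with it up to floating-point error.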

A few highlights of my other projects include:

dataverse provides access to the Dataverse Network APIs. It is the current-generation package and covers the complete functionality of current Dataverse installations.

MTurkR is a client library providing access to the Amazon Mechanical Turk crowdsourcing platform through R. It also has a graphical user interface and a wiki with advice on using the package and MTurk more generally.

rio makes data file import and export as easy as possible by relying on file extensions to make a (reasonable) assumption about how to read a file into a data.frame or, conversely, save a data.frame to disk. It greatly simplifies data import and export and offers a function for easily converting between file formats (possibly from the command line).

pdfcount is a simple one-line R package that provides reasonably reliable word counts for PDF documents. It is useful for counting words in LaTeX-generated PDF documents. The package includes a Shiny app, which is also available through a simple web interface: https://leeper.shinyapps.io/pdfcount/.

tabulizer provides R bindings to the Tabula Java library, which extracts tables from PDF documents using a small set of powerful and accurate algorithms. tabulizer is a thin client around Tabula and offers a handy interactive mode for identifying tables in PDFs directly within an R graphics window.

UNF is an R package for generating variable- and dataset-level universal numeric fingerprint (UNF) signatures, which provide a way to uniquely and persistently identify (a version of) a dataset. The UNF algorithm was created by Micah Altman; the current package, which I maintain, has been updated to version 6 of the algorithm. The package also provides UNF-based functions to identify discrepancies between data frames and works well with the dataverse package, listed above, for comparing Dataverse-stored datasets against local copies.
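As a rough illustration of the extension-based dispatch that rio is described as using earlier in this list, the sketch below chooses a reader or writer from the file extension alone. It is written in Python rather than R and is not rio's API: the names import_file, export_file, and convert, the list-of-dicts data representation, and the three supported formats are all assumptions made for the example.

```python
import csv
import json
from pathlib import Path


def _read_csv(path, delimiter=","):
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def _write_csv(rows, path, delimiter=","):
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)


def _read_json(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def _write_json(rows, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)


# Hypothetical dispatch tables: file extension -> function that handles it.
READERS = {
    ".csv": _read_csv,
    ".tsv": lambda path: _read_csv(path, delimiter="\t"),
    ".json": _read_json,
}
WRITERS = {
    ".csv": _write_csv,
    ".tsv": lambda rows, path: _write_csv(rows, path, delimiter="\t"),
    ".json": _write_json,
}


def _lookup(table, path):
    """Pick a handler from the file extension, failing loudly if it is unknown."""
    ext = Path(path).suffix.lower()
    if ext not in table:
        raise ValueError(f"Unrecognized file extension: {ext!r}")
    return table[ext]


def import_file(path):
    """Read a file into a list of dicts, guessing the format from the extension."""
    return _lookup(READERS, path)(path)


def export_file(rows, path):
    """Write a list of dicts to disk in the format implied by the extension."""
    _lookup(WRITERS, path)(rows, path)
    return path


def convert(source, target):
    """Convert between formats by importing one file and exporting another."""
    return export_file(import_file(source), target)
```

Under these assumptions, a call such as convert("people.csv", "people.json") would read the CSV and write the same rows as JSON. Values read from CSV arrive as strings here, so a fuller version would need some type inference.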

Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License. Views expressed are solely my own, not those of any current, past, or future employer.