I am a passionate R hacker and have authored more than 30 packages for R ranging from cloud computing clients, to original statistical tools, to just-for-fun packages like meme. Most of my work now focuses on tooling for cloudyr open source projects, a handful of projects for rOpenSci, and tools to replicate advanced Stata functionality in R. You can find source code for all of my open source projects on GitHub.
A few highlights are as follows:
The cloudyr project is an effort to connect R to cloud computing applications. My main contribution is a fully featured client for Amazon Web Services tools called awspack. See the cloudyr website or GitHub page for more details.
ghit: Lightweight GitHub Package Installer is a vectorized drop-in replacement for devtools::install_github() that uses native git and R methods to clone and install a package from GitHub. It provides a lighter-weight alternative to devtools with a very similar API, slightly different defaults, and completely rebuilt internals.
MTurkR is a client library providing access to the Amazon Mechanical Turk crowdsourcing platform through R. It also has a graphical user interface and a wiki with advice on using the package and MTurk more generally.
prediction and margins together form an R port of Stata’s margins command. prediction provides tidy, type-safe predictions from model objects and Stata-style predictive margins. margins can calculate (average) marginal effects and their variances from regression models. This is especially helpful for models with power terms, non-linear transformations, and interaction terms, and for generalized linear models.
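To make the idea of an average marginal effect concrete, here is a minimal base R sketch that does not use margins itself. It fits a linear model with a power term on the built-in mtcars data and approximates the average marginal effect of wt by a central finite difference on the predictions, then compares it with the closed-form derivative. The helper name ame_numeric and the step size eps are illustrative assumptions, not part of either package.

```r
# Illustrative only: numerical average marginal effect (AME) in base R.
# `ame_numeric` is a hypothetical helper, not a function from margins or prediction.
ame_numeric <- function(model, data, variable, eps = 1e-5) {
  lo <- data
  hi <- data
  # Shift the variable slightly down and up for every observation.
  lo[[variable]] <- data[[variable]] - eps
  hi[[variable]] <- data[[variable]] + eps
  # predict() re-evaluates terms such as I(wt^2) on the shifted data.
  p_lo <- predict(model, newdata = lo)
  p_hi <- predict(model, newdata = hi)
  # Central difference per observation, averaged over the sample.
  mean((p_hi - p_lo) / (2 * eps))
}

# A model with a power term, where the effect of wt depends on wt itself.
fit <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
ame_wt <- ame_numeric(fit, mtcars, "wt")

# Closed-form check: d(mpg)/d(wt) = b1 + 2 * b2 * wt, averaged over the data.
b <- coef(fit)
ame_wt_analytic <- mean(b[["wt"]] + 2 * b[["I(wt^2)"]] * mtcars$wt)
```

The two quantities should agree closely. The point of the sketch is that the coefficient on wt alone is not the marginal effect once a squared term is in the model, which is the situation the paragraph above describes.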
rio makes data file import and export as easy as possible by relying on file extensions to make a (reasonable) assumption about how to read a file into a data.frame or, conversely, save a data.frame to disk. It greatly simplifies data import and export and offers a function for easily converting between file formats (possibly from the command line).
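The extension-based approach can be sketched in a few lines of base R. This is only an illustration of the idea, not rio's actual implementation: the helper import_by_ext is hypothetical and handles just three formats.

```r
# Illustrative only: choose a reader from the file extension, base R only.
# `import_by_ext` is a hypothetical helper, not rio's actual implementation.
import_by_ext <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    tsv = utils::read.delim(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("Unsupported file extension: ", ext)
  )
}

# Write a small data.frame to CSV and read it back by extension.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_path, row.names = FALSE)
cars_csv <- import_by_ext(csv_path)

# Convert to another format and read that back the same way.
rds_path <- tempfile(fileext = ".rds")
saveRDS(cars_csv, rds_path)
cars_rds <- import_by_ext(rds_path)
```

In rio itself, the corresponding entry points are import(), export(), and convert(), which cover far more formats than this sketch.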
tabulizer provides R bindings to the Tabula Java library, which extracts tables from PDF documents using a small set of really powerful and accurate algorithms. tabulizer is a thin client around Tabula and offers a handy interactive mode for identifying tables in PDFs directly within an R graphics window.
UNF is an R package for generating variable- and dataset-level universal numeric fingerprint signatures, which provide a way to uniquely and persistently identify (a version of) a dataset. The UNF algorithm was created by Micah Altman, and the current package, which I maintain, updates the implementation to version 5 of the algorithm. The UNF package also provides UNF-based functions to identify discrepancies between data frames and works well with the dvn package for comparing Dataverse-stored datasets against local copies.
Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License.