How Can I Get Started Using R?
18 Jul 2015
I received an email this week that included the following:
Your posts, along with those of numerous others, have convinced me that I should give R a shot. I am hoping that you would be so kind as to suggest some essential packages for political science research.
I appreciate the email and I’m thrilled to see another person interested in jumping into R. Here’s what I’d suggest: don’t think in terms of specific packages. Instead, think in terms of problems you have that you want to solve.
As some autobiographical background for this advice, I came to R through my PhD coursework at Northwestern University. I had learned SPSS during undergrad and then learned Stata at Northwestern. I had a brief - actually, let’s say rough - introduction to R during the last course of the political science methods sequence, and then I forgot about it for about a year. But then, I had a problem I needed to solve. It was 2011 and I wanted to use Amazon Mechanical Turk for a research project where I contacted survey respondents at multiple points in time. MTurk allowed this, but there wasn’t an easy way to do it through the web interface.
For me, then, R was something that made sense because it was powerful, it solved a problem I had that didn’t have an easy alternative, and it helped structure my broader scientific workflow.
My advice is therefore that the best way to dive into R is to identify a problem you already have (or think you’ll soon have) and figure out how R can help you solve it. There are a handful of problems that scientists, including political scientists, regularly have that R can help them solve quite easily:
- Producing high-quality scientific graphics. Whether you use the base R graphics or ggplot2, R has unparalleled graphics capabilities. These are becoming even better as more and more tools are developed for interactive, web-native graphics, such as plot.ly, ggvis, and rCharts.
- Creating reproducible research. Reproducible workflows simplify the task of carrying scientific results into final publications (through packages like knitr and stargazer). They also increase the credibility and transparency of your work, whether through tools for archiving and sharing research in public archives (as the dvn package provides for Dataverse and the rfigshare package provides for figshare) or through tools that make analyses more reproducible, like checkpoint, miniCRAN, and packrat.
- Obtaining data. The rOpenSci, rOpenGov, and rOpenHealth projects provide a huge number of packages for retrieving various kinds of web-based data. Our Open Data Task View describes a ton of these data sources and the packages available for working with them. For social scientists, I’d also highlight Anthony Damico’s “analyze survey data for free” website, which provides R-based tutorials for working with a huge number of publicly available survey data sources.
- Specific types of data problems. Maybe you have text data you want to analyze; try quanteda. Maybe your data structure is a complete mess; try dplyr. Maybe you have data so massive it’s hard to work with; try data.table. Maybe you have a data analysis task that’s unimaginably time consuming or computationally intensive; try some of the ideas on the High Performance Computing Task View. There are many other problems social scientists commonly face, so I can’t list them all here.
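To make the first of these problems concrete, here is a minimal sketch of what a first graphics task might look like using only base R. The `mtcars` dataset ships with R and is just a stand-in for your own data; the variable name `mpg_by_cyl` and the axis labels are my own choices for illustration, not anything prescribed.

```r
# Built-in example data: fuel economy for 32 car models (ships with base R).
data(mtcars)

# Mean miles per gallon for each number of cylinders.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Base R graphics: a labeled bar chart of the group means.
barplot(
  height = mpg_by_cyl$mpg,
  names.arg = mpg_by_cyl$cyl,
  xlab = "Number of cylinders",
  ylab = "Mean miles per gallon",
  main = "Fuel economy by engine size (mtcars)"
)
```

Swapping in your own data frame and variables is usually the quickest way to turn an example like this into a real problem to solve.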
There are thousands of R packages on CRAN and hundreds more on GitHub. Recommending which ones you should try is pretty challenging unless you have a specific problem you want to solve. For scientists, some of the above scenarios are pretty standard problems, and R has great facilities for solving them. But maybe you have a problem that hasn’t been solved in R (like my need for a programmatic interface to MTurk). That’s a “problem” that’s often not really a problem: any time there’s a challenge that hasn’t been addressed yet, there’s a great opportunity to take a deep dive into learning R and using it to solve that problem.
Except where noted, this website is licensed under a Creative Commons Attribution 4.0 International License.