Chris Friedman

Point-Density Plots - My New Favorite Dot Plot

Fri, 20 Sep 2019 22:40:32 GMT

A few weeks ago the wonderful RWeekly mailing list introduced me to a new type of plot - the point density plot. Wonderfully, the ability to make this plot has been added to the R community in the form of a new package, ggpointdensity.

From the top-line description, it’s a cross between a scatter plot and a 2D density plot. The motivation for creating the package and using this new plot is that the points in a scatter plot can overlap one another, while the alternative, a density plot, loses the resolution given by plotting individual points.

Before I get ahead of myself, the plots look like this:

If it isn’t obvious from the package name (or the image above), this is made to work with ggplot2 and simply adds a new geom, geom_pointdensity().

Each point is colored by how many points are around it.
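To make the idea of "how many points are around it" concrete, here is a minimal base R sketch of neighbor counting. It is not the package's implementation: the radius, the choice to exclude the point itself, and the toy data are all assumptions made for illustration.

```r
# Toy data: a tight cluster plus a few scattered points (made-up values).
set.seed(1)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),  # 20 clustered points
  matrix(runif(10, min = 2, max = 5), ncol = 2)     # 5 scattered points
)

# Count, for each point, how many other points lie within `radius` of it.
count_neighbors <- function(xy, radius) {
  d <- as.matrix(dist(xy))   # pairwise Euclidean distances
  rowSums(d <= radius) - 1   # subtract 1 so a point doesn't count itself
}

n_neighbors <- count_neighbors(pts, radius = 0.5)

# Color points by neighbor count: a larger count picks a later palette entry.
palette_cols <- heat.colors(max(n_neighbors) + 1)
plot(pts, col = palette_cols[n_neighbors + 1], pch = 19)
```

Computing every pairwise distance is fine for a thousand or so points, but it grows quadratically, so a sketch like this would need a smarter neighbor search for large data sets.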

To further illustrate it, let’s look at a data set I’ve been playing with at work. I’ve been examining school districts around Philadelphia and getting information about each one from the U.S. Census American Community Survey. After querying both the Google Maps API and the US Census API, I found 1,074 school districts within three hours of Philly.

Before using this new geom, I would have probably made this plot:

While there are a bunch of districts surrounding it, there’s a central feature in the bottom left of the plot where many points are overplotted. To produce that plot I used geom_point(). What if instead I were to use geom_density_2d()?

As the package author mentions in their description, you lose the ability to see outliers. You can certainly see that here, as we lost a lot of resolution. We could then try to plot both the points and the density at the same time…

I showed my coworker that one and they said they liked it because it looked kind of like a papaya. I don’t really see it, but 🤷‍♂️…

Anyway, let’s now visualize this with our brand new tool - geom_pointdensity()!

I like this because you get the high resolution of the dot plot but you can also see where the areas are with the highest density of points.

Also, I think these plots just look really really cool.

Happy Little Accidents

Mon, 05 Aug 2019 00:00:00 GMT

knitr::opts_chunk$set(message = FALSE, warning = FALSE)

This morning, I just found out about #tidytuesday and I figured it would be a fun thing to play with.

For my first foray into tidytuesday, we have data on Bob Ross’s paintings during his show. The data were compiled by fivethirtyeight and reported here.

The data are available here. On the info page for the data, they show how to load the data and give an example of some basic tidying. I’ll do that below:

library(dplyr)
library(tidyr)
library(stringr)

bob_ross <-
  readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-06/bob-ross.csv")

# to clean up the episode information
bob_ross <-
  bob_ross %>%
  janitor::clean_names() %>%
  separate(episode, into = c("season", "episode"), sep = "E") %>%
  mutate(season = str_extract(season, "[:digit:]+")) %>%
  mutate_at(vars(season, episode), as.integer)

head(bob_ross)

There are a couple of paintings that are named the same thing.

bob_ross <-
  bob_ross %>%
  group_by(title) %>%
  # number the paintings within each title so repeats can be told apart
  mutate(title_count = row_number()) %>%
  ungroup() %>%
  mutate(title = if_else(title_count > 1,
                         paste(title, title_count),
                         title)) %>%
  select(-title_count)

There are some columns that relate to the frame the painting got put into and some columns that relate to elements inside each painting. Like the fivethirtyeight crew, I’m more interested in the elements inside the paintings than in the frames, so I’ll go ahead and drop those columns.

painting_data <-
  bob_ross %>%
  select(-contains("frame"), -steve_ross, -guest, -diane_andre)

In my professional work, I perform social network analysis, so let’s go ahead and look at networks of elements in Bob Ross’s paintings!

Networks of Bob Ross Paintings

To get us looking at social networks, we first need to take the data from this wide format and turn it into an edge list. The edge list will connect each painting to every element that is inside it. From there, we can get a picture of what the network of paintings looks like!
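As a rough illustration of what "wide format to edge list" means, here is a base R sketch on a tiny made-up matrix. The painting and element names are hypothetical, not rows from the Bob Ross data.

```r
# Hypothetical wide data: rows are paintings, columns are elements (1 = present).
wide <- matrix(
  c(1, 0, 1,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Painting A", "Painting B"), c("cabin", "lake", "tree"))
)

# Find the (row, column) position of every 1, then look up the names.
hits <- which(wide == 1, arr.ind = TRUE)
edge_list <- data.frame(
  painting = rownames(wide)[hits[, "row"]],
  element  = colnames(wide)[hits[, "col"]]
)

# One row per painting-element connection.
edge_list
```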

Organizing the Edge List

library(igraph)
library(ggnetwork)

titles <- painting_data[["title"]]
incidence_mat <-
  painting_data %>%
  select(-season, -episode, -title)

incidence_mat <- as.matrix(incidence_mat)
rownames(incidence_mat) <- titles

incidence_graph <-
  graph_from_incidence_matrix(incidence_mat)

ggplot(incidence_graph, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges() +
  # type is FALSE for paintings (matrix rows) and TRUE for elements (columns)
  geom_nodes(aes(color = type)) +
  theme_blank()

That’s real busy! It looks like there are a few episodes that have only a few elements and some episodes that have many elements in them. There’s also that one episode that shares three elements with another episode and none with any others.

Let’s see if we can clean this up a bit! First, I’ll connect episodes by how many elements they share.

episode_x_episode <- incidence_mat %*% t(incidence_mat)

ep_x_ep_graph <-
  graph_from_adjacency_matrix(episode_x_episode,
                              # only look at the upper part of the matrix since it is symmetrical
                              mode = "upper",
                              # the connections are weighted
                              weighted = TRUE,
                              # don't count self-loops
                              diag = FALSE)

ggnetwork(ep_x_ep_graph, weight = "weight") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges() +
  geom_nodes() +
  theme_blank()

Not much to look at. Or better yet, Ross’s paintings tend to share something in common with other paintings.
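The matrix product used above to connect episodes can be checked by hand on a toy example. This is a sketch with made-up paintings and elements: multiplying an incidence matrix by its transpose gives, for each pair of rows, the number of columns in which both have a 1.

```r
# Hypothetical incidence matrix: A has cabin + lake, B has cabin + tree, C has tree.
inc <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("cabin", "lake", "tree"))
)

# Off-diagonal cells count shared elements; the diagonal counts each
# painting's own elements, which is why diag = FALSE drops it when graphing.
shared <- inc %*% t(inc)
shared
```

Here A and B share one element (the cabin), B and C share one (the tree), and A and C share none, so A and C would not be connected.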

Instead of looking at all the features at once, why don’t we look at groups of features? I’ve gone ahead and grouped each feature into different categories. I’ll load that up and then split up the painting df into different categories.

feature_categories <-
  readr::read_csv("../../data/Bob Ross/ross_painting_features.csv")

painting_categories <-
  painting_data %>%
  gather(feature, value, -season, -episode, -title) %>%
  left_join(feature_categories, by = "feature") %>%
  filter(value > 0) %>%
  count(season, episode, title, category) %>%
  spread(category, n) %>%
  mutate_at(vars(-season, -episode, -title), ~if_else(is.na(.), 0, 1))

titles <- painting_categories[["title"]]


incidence_mat <-
  painting_categories %>%
  select(-season, -episode, -title) %>%
  as.matrix()
rownames(incidence_mat) <- titles

episode_x_episode <- incidence_mat %*% t(incidence_mat)


ep_x_ep_graph <-
  graph_from_adjacency_matrix(
    episode_x_episode,
    # only look at the upper part of the matrix since it is symmetrical
    mode = "upper",
    # the connections are weighted
    weighted = TRUE,
    # don't count self-loops
    diag = FALSE)

# add in season as an attribute of each episode
vertex_attr(ep_x_ep_graph, "season") <- painting_categories[["season"]]

Now to plot! I’m going to build these plots with a cutpoint though, because otherwise they become very unwieldy.

ep_x_ep_graph %>%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight <= 2)) %>%
  delete_vertices(., which(igraph::degree(.) == 0)) %>%  
  ggnetwork(weight = "weight") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "gray") +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE) +
  labs(title = "Paintings that share 3, 4, or 5 feature categories")

Above, I’ve colored nodes by season and only shown connections between episodes if those episodes share more than two classes of feature (e.g. two episodes have a sky feature, tree and plant feature, and a man-made feature).
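The cutpoint logic in the plotting code above (drop weak edges, then drop nodes left with no edges) can be sketched on a plain weight matrix. The matrix below is made up for illustration, and this base R version is only an analogue of the igraph calls, not a replacement for them.

```r
# Hypothetical symmetric weight matrix: cell [i, j] = shared categories.
w <- matrix(
  c(0, 3, 1,
    3, 0, 2,
    1, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

cutpoint <- 2

# Step 1: remove edges at or below the cutpoint (like delete_edges()).
kept <- w
kept[kept <= cutpoint] <- 0

# Step 2: remove nodes with no remaining edges (like delete_vertices()).
connected <- rowSums(kept > 0) > 0
kept <- kept[connected, connected, drop = FALSE]
kept
```

Only the A-B edge has a weight above 2, so C is left isolated and drops out of the result.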

What happens if we filter to only show edges if two episodes share 4 feature categories? 5?

ep_x_ep_graph %>%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight <= 3)) %>%
  delete_vertices(., which(igraph::degree(.) == 0)) %>%  
  ggnetwork(weight = "weight") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "gray") +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE)  +
  labs(title = "Paintings that share 4 or 5 feature categories")

Looking at the paintings that share more than 3 feature categories really brings out that there is a central group of paintings that all share a lot in common, and then a few smaller groups of paintings that each have different things in common.

ep_x_ep_graph %>%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight <= 4)) %>%
  delete_vertices(., which(igraph::degree(.) == 0)) %>%  
  ggnetwork(weight = "weight") %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "gray") +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE) +
  labs(title = "Paintings that share 5 feature categories")

Now, looking just at paintings that share 5 feature categories, it can really be seen that there is a central set of themes that is very common across seasons. What are they?

episodes_of_interest <-
  ep_x_ep_graph %>%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight <= 4)) %>%
  delete_vertices(., which(igraph::degree(.) == 0)) %>%
  vertex_attr("name")

painting_categories %>%
  filter(title %in% episodes_of_interest) %>%
  mutate(feature_sum = aquatic + clouds + `general nature` +
           `man made` + nature + sky + `trees and plants`) %>%
  filter(feature_sum > 4) %>%
  summarize_at(vars(-season, -episode, -title, -feature_sum), sum)

All (or near all) of these paintings have an aquatic element, a general nature element, and trees and plants. So the real defining factor between the three groups in the plot above probably has to do with the other four categories. Next time I play with these data, I’ll look at that and do a deeper dive on components of the network of Bob Ross paintings.

How often does the Senate Vote In Palindromes?

Wed, 31 Jan 2018 00:00:00 GMT

Every Friday, FiveThirtyEight comes out with two logic, math, or probability based puzzles: one quick to solve and one that takes a long time to solve. Although I don’t always get a chance to partake, thinking about them is always fun.

This week’s quick puzzle related to a problem I’m trying to solve in the office, so I thought I would give this one a try. Taken from their site, the puzzle goes like this:

On Monday, the Senate voted 81-18 to end the government shutdown. This naturally grabbed the Riddler’s attention: It’s a palindrome! The vote tally reads the same forward and backward. This specific tally was made possible by the absence of John McCain. But do senators need to be absent to create palindrome tallies? If so, what numbers of absences will do the trick?

Extra credit: How many palindromic Senate votes have occurred in the past three decades?

In addition to potentially helping solve a problem in the office, I figure this is a good excuse to experiment with object oriented programming and to write functions that work with S3 methods. As you will see, the function I wrote is more for fun than anything else, but I think it does the trick in showing how S3 methods work. In addition, the first solution to the puzzle I propose is verbose. At the end I show how I solved the problem in an even quicker way.

Let’s begin!

The first thing I did when approaching this puzzle was to re-frame my question into something I can write code for.

What are the possible combinations of votes that can result in a palindrome?

When thinking about palindromes, the first thing I need to do is think about what the reversible part of the palindrome is. With senate votes, the first part of the palindrome can be considered the number of “yea” votes while the second part of it can be considered the number of “nay” votes.

Considering the number of seats in the senate, the number of yea votes can be any number between 0 and 100.

Considering the problem of palindromes will narrow down 0:100 in two ways.

First, a unanimous vote in the senate is impossible! Kidding aside, I also know that a unanimous vote can’t result in a palindrome, so the case where there are 100 yea votes can be discarded. In addition, we won’t look at cases where no senators vote!

Second, looking at votes between 1 and 99 is doubling the amount of work. A tally with 50 or more yea votes is just the mirror image of a tally with fewer than 50 yea votes, so it’s only necessary to look at yea votes where the number of votes is between 1 and 49.

For illustrative purposes, let’s put that vector in a data frame

library(dplyr)

# tibble() replaces the deprecated data_frame()
votes <- tibble(yea = 1:49)

Now that we have a vector of all of the numbers between 1 and 49, we’ll build a vector of their palindromes and put them in a new column for all of the “nay” votes.

library(stringr)
library(purrr)

votes <- votes %>%
  mutate(nay = formatC(yea, width = 2, format = "d", flag = "0") %>%
           str_split("") %>%
           map(rev) %>%
           map_chr(paste, collapse = "") %>%
           as.numeric())

In the above code chunk, I introduced a function it seems a lot of people don’t get to play with very often: formatC, which formats numbers using C-style (printf-like) format specifications. Here, it’s used to add leading zeros.

Above, there’s also a call to str_split from the stringr package. It’s used here to split the two digits so that they can then be reversed.

str_split outputs a list of character vectors where each element is the two characters. To operate on this list, map from the purrr package is used, to operate on each element in the list. In this instance map applies the rev function to reverse each item in each element in the list so that "8" "1" becomes "1" "8".

Similarly, map_chr pastes the items in each element together (changing "1" "8" to "18") and uses the _chr modifier to return a character vector that is then converted into a numeric one.
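The pad, split, reverse, and paste steps described above can also be sketched in base R. This is a rough equivalent for comparison, not the code used in the post; the helper name reverse_two_digits is made up.

```r
# Reverse the digits of numbers below 100, keeping the leading zero.
reverse_two_digits <- function(n) {
  padded <- formatC(n, width = 2, format = "d", flag = "0")  # 8 -> "08"
  chars  <- strsplit(padded, "")                             # "81" -> "8" "1"
  # vapply() plays the role of map_chr(): one string back per element.
  flipped <- vapply(
    chars,
    function(ch) paste(rev(ch), collapse = ""),
    character(1)
  )
  as.numeric(flipped)
}

reverse_two_digits(c(8, 18, 81))  # 80 81 18
```

The leading zero matters: without it, 8 would reverse to 8 instead of 80.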

After computing the nay votes, we need to narrow down our list of yea’s and nay’s to votes that are possible in the 100 seat senate.

votes <- votes %>%
  mutate(vote_count = yea + nay) %>%
  # Filter out impossible vote counts
  filter(vote_count <= 100)

If you’ve been playing along at home, you will notice that 10 of the vote counts were dropped.

Now we have all of the possible vote tallies that are palindromic!

Do senators need to be absent to create palindrome tallies?

This question is answered by seeing if any of the items in votes$vote_count equal 100. If they do, then the answer to the question is no.

any(votes$vote_count == 100)

YES! Senators DO need to be absent for a palindromic vote count!

What are the number of absences that will do the trick?

unique(100 - votes$vote_count)

Above, I show the absences needed for a vote to be palindromic. That said, a quorum requires 51 senators, so if 50 or more are absent, business isn’t being conducted.

In actuality, the number of absences can be 45, 34, 23, 12, or 1.
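The whole argument can be double-checked with a few lines of arithmetic. Writing a yea count with tens digit a and ones digit b, the tally is (10a + b) + (10b + a) = 11(a + b), so every palindromic total is a multiple of 11 and can never be exactly 100. The sketch below redoes the calculation in base R; the variable names are mine, and the quorum of 51 is the Senate's majority requirement.

```r
yea <- 1:49
# Swap the digits arithmetically: ones digit becomes tens and vice versa.
nay <- (yea %% 10) * 10 + yea %/% 10
total <- yea + nay

# Keep tallies that fit in a 100-seat chamber.
possible <- total[total <= 100]
absences <- sort(unique(100 - possible))

# Keep only absence counts that still leave a quorum present.
quorum <- 51
feasible <- absences[100 - absences >= quorum]
feasible
```

Printing `feasible` lists the absence counts that produce a palindrome while still leaving a quorum.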

Importing Social Network Data into R

Tue, 19 Dec 2017 00:00:00 GMT

library(knitr)
library(printr)

In my professional life, I manage and analyze data on a team that studies the social networks surrounding children with autism. The purpose of this post is not to discuss that work in depth, but rather to show how to quickly and easily import one type of data I work with into R. For social network analysis, I use the package igraph.

The type of data I’m going to talk about importing today is egocentric network data. For those that don’t know, egocentric network data involves asking a single person about the makeup of an entire network.

Collecting Egocentric Network Data

To demonstrate, imagine that I’m interested in social networks at the gym and that you go to the gym with three friends every week. I approach you and, after asking you to join my study, ask you who you go to the gym with.

You say “James, Jen, and Rene.”

Then I ask some questions about the group in relation to James:

How many times a week do you see James?

How many times a week does James see Jen?

How many times a week does James see Rene?

We then do the same for Jen and Rene, where I ask you how often you see Jen and how often she sees Rene, and then I ask how often you see Rene.¹

My survey here came in two parts:

A name generator where I ask you who you interact with
A name modifier where I ask you how you interact with each person

After the data are collected, I log it all in my spreadsheet where each row corresponds to a data collection instance.

So, I have data that look like this:

library(dplyr)

# tibble() replaces the deprecated data_frame()
gym_networks <- tibble(
  participant = c("You", "Bart", "Lisa"),
  p1 = c("James", "Milhouse", "Sherry"),
  p2 = c("Jen", "Nelson", "Terry"),
  p3 = c("Rene", "Martin", "Ralph"),
  participant_x_p1 = sample(1:10, 3),
  p1_x_p2 = sample(1:10, 3),
  p1_x_p3 = sample(1:10, 3),  
  participant_x_p2 = sample(1:10, 3),
  p2_x_p3 = sample(1:10, 3),
  participant_x_p3 = sample(1:10, 3)
)
print(gym_networks)

Getting the data ready for analysis

The data here are represented in an adjacency list with weights attached. Although igraph has a function for importing adjacency lists, it isn’t configured to handle weights, so we will take our adjacency list and convert it into an edge list, which igraph can handle with weights.

To accomplish this, we’ll use the package tidyr.

library(tidyr)

el <- gym_networks %>%
  # Step 1: Make each row a single edge
  gather(key, value = "weight", -(participant:p3)) %>%
  # Step 2: Configure two new columns, an ego, and an alter
  mutate(ego = case_when(grepl("participant", key) ~ participant,
                         grepl("p1_", key) ~ p1,
                         grepl("p2_", key) ~ p2,
                         grepl("p3_", key) ~ p3),
         alter = case_when(grepl("_p1", key) ~ p1,
                           grepl("_p2", key) ~ p2,
                           grepl("_p3", key) ~ p3)) %>%
  # Step 3: Clean up the data frame
  select(ego, alter, weight)

print(head(el))
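Step 2 above relies on the column names encoding who is connected to whom. As a minimal sketch of that idea, the same lookup can be done in base R by splitting each key at "_x_". The single survey row and the three keys below are hypothetical stand-ins for the full data.

```r
# One hypothetical survey row, mirroring the column layout above.
answers <- c(participant = "You", p1 = "James", p2 = "Jen", p3 = "Rene")
keys <- c("participant_x_p1", "p1_x_p2", "p2_x_p3")

# Split each key at "_x_": column 1 is the ego slot, column 2 the alter slot.
slots <- do.call(rbind, strsplit(keys, "_x_", fixed = TRUE))

# Look up the person sitting in each slot.
edges <- data.frame(
  ego   = unname(answers[slots[, 1]]),
  alter = unname(answers[slots[, 2]])
)
edges
```

Splitting on the separator avoids writing one pattern per slot, at the cost of assuming every key follows the `<slot>_x_<slot>` naming scheme exactly.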

In three steps, we go from adjacency list to edge list. In one more step, we have an igraph object to analyse and plot!

library(igraph)

graph <- graph_from_data_frame(el)

Happy analyzing, friends!

¹ I don’t ask how often Jen sees James or how often Rene sees James or Jen, because we assume that it takes two to tango in this respect.

Getting RateBeer Data…Programmatically

Thu, 07 Dec 2017 00:00:00 GMT

library(httr)
library(jsonlite)
library(purrr)
library(rvest)
knitr::opts_chunk$set(eval = FALSE)
API_key <- Sys.getenv("rateBeer_API_key")

In my last post I showed how, using the package httr, you can access the RateBeer API to get information about beers made by a brewery. When I left off, I showed a problem: the API only shows 10 beers at a time.

Today, I’m going to show how we can get more beers at once. After that, I’m going to show how we can use the API, rvest, and purrr to get beers from all the brewers around me.

Updating the call to the API

Setting the `first:` argument

In the last post, I didn’t mention one of the arguments that can be used when making a beersByBrewer query. Besides the argument for the brewerID, we can also use the first argument to specify how many beers we want to see from a brewer. As you will see below, one of the changes I make to the call to the API includes setting the value for the argument first to 999.

Ninety-nine beers on the wall? Let’s make it Nine hundred and ninety-nine.
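As a small sketch of how the first argument slots into the query text, here is a hypothetical helper that builds a pared-down query string with sprintf(). The helper name, its default of 10, and the shortened field list are assumptions for illustration; only brewerId and first come from the query described in this post.

```r
# Build a minimal beersByBrewer query string (hypothetical helper).
build_beers_query <- function(brewer_id, first = 10) {
  sprintf(
    "query { beersByBrewer(brewerId: %d, first: %d) { totalCount items { name } } }",
    as.integer(brewer_id),  # accepts 166 or "166"
    as.integer(first)
  )
}

query_text <- build_beers_query(166, first = 999)
cat(query_text, "\n")
```

Coercing both values with as.integer() keeps stray text out of the query, which matters once brewer IDs start arriving as character strings scraped from URLs.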

Turning the call to the API into a function.

As you will see later on, it will be handy to have the call to the API as a function. Below, I declare that function:

get_beers_from_brewer <- function(brewer_id, api_key) {
  URL <- "https://api.ratebeer.com/v1/api/graphql"
  POST(URL,
       body = list(
         query = paste0(
"query{
  beersByBrewer(brewerId: ", brewer_id, ", first: 999) {
    totalCount
      items{
        name
        abv
        averageRating
        ratingCount
        isRetired
        style{
          name
        }
        brewer {
          id
          name
          streetAddress
          city
          state {
            name
          }
          zip
        }
      }
     }
}"),
       variables = "{}",
       operationName = NULL),
       encode = "json", # tells httr to encode the body of the request as json
       add_headers("content-type" = "application/json",
                   "Accept" = "application/json",
                   "x-api-key" = api_key))
}

Finding Breweries

Okay, so now, I have an easy way to get information about the beers that breweries make. All I need to do now is point the API to the brewer ID of each brewery I want information on.

Remember, the brewer id can be found by looking at the url for that brewery and is in the form:

https://www.ratebeer.com/brewers/<BREWERY_NAME_HERE>/<BREWER_ID_HERE>

The thing is, I like data and want A LOT of it. There’s no way I’m going to hand sort through urls to try to find breweries. Can’t this be automated?

YES!

RateBeer maintains lists of breweries by state. For example, the breweries in Pennsylvania can be found here. Using the package rvest, we can pull information about all of the breweries in the state, and then use purrr to iterate over that list, using the function, get_beers_from_brewer().

rvest is a package to make harvesting information from the web easy. Below, you see how, in three steps, I have a list of urls that point to all of the breweries in the state.

library(rvest)

brewery_list_url <- "https://www.ratebeer.com/breweries/pennsylvania/38/213/"

brewery_ids <-  read_html(brewery_list_url) %>%
  html_nodes("#brewerTable a:nth-child(1)") %>%
  html_attr('href')

To explain what happened in the previous code chunk:

read_html() loads the url for the list of breweries in PA. This is the same thing that happens if you click this link.
html_nodes() searches for every place in the html file we navigated to that has a link to a brewery. The text in the argument for the function is the css selector for where breweries can be found on the page. I found this selector using selectorgadget. This function gives me a list of every place that selector shows up.
html_attr() searches that list for an attribute of the specified type. In this case I specified href, the target of each hyperlink.

yards_url <- grep("yards", brewery_ids, value = TRUE)

As I said, this outputs a list of urls. As an example, the url for Yards Brewing looks like this:

/brewers/yards-brewing-company/166/

Now, I can take this list of urls, and use the function purrr::map_chr() to get a list of brewer IDs. As a note, I use map_chr because it flattens the list of IDs into a single character vector. I highly suggest that you check out the rest of the map_* functions.

Below, I take each item in the list of urls, split each url at any "/" and then use map_chr() to select the fourth element, the brewer ID.

brewery_ids <- brewery_ids %>%
  strsplit("/") %>%
  map_chr(4)

Now I have a list of brewery IDs that I can feed into get_beers_from_brewer()
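For comparison, the same extraction can be sketched in base R, which also makes it easy to see why the ID is the fourth piece: the leading "/" produces an empty first element. The second URL below is made up for illustration.

```r
urls <- c(
  "/brewers/yards-brewing-company/166/",
  "/brewers/example-brewery/12345/"   # hypothetical second brewery
)

# Splitting at "/" gives: "", "brewers", "<name>", "<id>"
pieces <- strsplit(urls, "/", fixed = TRUE)

# vapply() guarantees a character vector, much like map_chr().
ids <- vapply(pieces, function(p) p[4], character(1))
ids
```

Selecting by position works as long as every URL has the same shape; a pattern match on the trailing digits would be a sturdier choice if the site's URL layout varied.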

Getting ALL THE BEERS!

To iterate through the list, we’ll use map() and some functions from jsonlite, a package that can parse JSON. Then we’ll use map() to work through the levels of the response, from its highest level (“data”) down to the actual data frame (held in the named object, “items”).

brewery_beer_df <- brewery_ids %>%
  map(function(brewer_id){
    Sys.sleep(1) # the API restricts to 1 request per second.
    get_beers_from_brewer(brewer_id, API_key)
  }) %>%
  # Here we use httr's content() to pull the body of each response out as a
  # JSON string
  map(content, type = "text") %>%
  # and then use jsonlite to turn that into an R object which has dfs in it
  # from each brewery.
  map(fromJSON, flatten = TRUE) %>%
  # working down through the response levels.
  map("data") %>% map("beersByBrewer") %>% map_dfr("items")

The object brewery_beer_df is the data frame with all the beers from breweries we requested.
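The last line of that pipeline, which walks down through "data", "beersByBrewer", and "items", can be sketched in base R on a mock of the parsed responses. The beer names and ABVs below are made up; only the nesting mirrors the response structure described above.

```r
# Mock parsed responses, one per brewery (hypothetical values).
responses <- list(
  list(data = list(beersByBrewer = list(
    items = data.frame(name = c("Beer A", "Beer B"), abv = c(5.0, 6.5))
  ))),
  list(data = list(beersByBrewer = list(
    items = data.frame(name = "Beer C", abv = 4.2)
  )))
)

# Pull the items data frame out of each response...
items <- lapply(responses, function(r) r[["data"]][["beersByBrewer"]][["items"]])

# ...and stack them into one data frame, as map_dfr() does.
all_beers <- do.call(rbind, items)
all_beers
```

One difference worth keeping in mind: rbind() requires every data frame to have the same columns, so a real response with missing fields would need more care than this sketch takes.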

Accessing the RateBeer API with R Using httr

Tue, 05 Dec 2017 00:00:00 GMT

library(knitr)
library(kableExtra)

A few months ago, I was talking with a friend of mine about the idea for this blog and how I wanted to use data science to explore beer. He suggested that I use the blog as well as beer to learn something new about where I live. So I ask, what can beer teach me about Philadelphia?

The first thing I need? Data!

Oddly enough, it’s actually pretty challenging to get access to high quality, current beer data.

I chose to use RateBeer’s data, mostly because they have an easily accessible API that meets my needs better than anyone else’s. They also disclose how they come to their average beer rating, allowing me to see what’s under the hood. In the footnotes, I briefly explain some alternatives.¹

Collecting Data

I want to look at breweries in the area. Sadly, the RateBeer API doesn’t have a feature to search for breweries in the area. There is, however, a way to query what beers a brewery makes. To get a list of beers a brewery makes, though, I need to know that brewery’s unique ID. That’s easy enough to find. The URL for a brewery on RateBeer is of the form:

https://www.ratebeer.com/brewers/<BREWERY_NAME_HERE>/<BREWER_ID_HERE>

So, as an example, 166 is the ID for Yards Brewing Company. The url is:

https://www.ratebeer.com/brewers/yards-brewing-company/166/

RateBeer’s API uses the language of GraphQL. It’s beyond the scope of this post to dive into GraphQL, so instead, I’ll explain how it’s implemented in regards to the queries that I make.

GraphQL queries are written in a format that looks a lot like JSON without the values. Basically, I specify that I am making a query. Nested in that call is the type of query I want to make (as well as arguments to that query), and nested within that are the fields I want in the response.

So, a query for the name of the beer with the ID number 4934 looks like this:

query {
  beer(id: 4934) {
    name
  }
}

Similarly, my query for the beers made by Yards will look like this:

query{
  beersByBrewer(brewerId: 166) { # 166 is Yards
    totalCount # This gives the total number of beers in the beer list
    items{ # For each beer, I want these items...
      name # the name of the beer
      abv # the beer's ABV
      averageRating # the average rating of the beer
      ratingCount # the number of ratings the beer has
      isRetired # is the beer retired?
      style{
        # Style needs to be jumped into one level because when I query style, I
        # can also ask for a description of the style, and can even jump into
        # recommended glassware. That said, all I want is the style name.
        name
      }
    }
  }
}

Now that I know what format to make my request in, it’s time to actually get my first bit of data off the API!

Hello, httr

httr is a package developed by Hadley Wickham of RStudio to make it easy to make HTTP requests. I won’t go into the package and the nuances of HTTP here, but some good resources for httr include the quickstart vignette and Bradley Boehmke’s post on using httr. The httr syntax is quite simple. The main functions are named after HTTP verbs such as GET and POST (httr is a wrapper for curl functions), and the function arguments all start with the URL, followed by things that modify the URL or get sent along with the request.

RateBeer’s API is at https://api.ratebeer.com/v1/api/graphql. In the header of the request, content type and response type are required along with an API key. The query itself goes in the body of the request. So, my call to the RateBeer API to get the beers made by Yards looks like this:

library(httr)

API_key <- Sys.getenv("rateBeer_API_key")

URL <- "https://api.ratebeer.com/v1/api/graphql"

beers_by_yards <- POST(URL,
         body = list(
           query =
"query{
  beersByBrewer(brewerId: 166) {
    totalCount
    items{
      name
      abv
      averageRating
      ratingCount
      isRetired
      style{
        name
      }
    }
  }
}",
           variables = "{}",
           operationName = NULL),
         encode = "json", # tells httr to encode the body of the request as json
         add_headers("content-type" = "application/json",
                     "Accept" = "application/json",
                     "x-api-key" = API_key))

Parsing the Response

content() is httr’s function for extracting content from a response. Using the type = argument, we can have the function give us the body of the response as text, which here is a JSON string. Then we can use the jsonlite package to make the data easier to work with.

library(jsonlite)

json <- content(beers_by_yards, type = "text")

parsed_json <- fromJSON(json, flatten = TRUE)

Recall that the response to the request is JSON. This means that all of our items are nested in the same way as we requested them. So, to get the number of beers that Yards makes, as well as a data frame with those beers, we work through those levels.

beer_count <- parsed_json$data$beersByBrewer$totalCount

beer_df <- parsed_json$data$beersByBrewer$items

beer_count
kable(beer_df, "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

So, where are all 111 beers? The API gives us 10 beers at a time. When we make the request to the API, we can tell it where to start that list of 10 beers, alongside our request to look at beers from a certain brewery.

In my next post, I’ll show how we can get all 111 of those beers and beers from other breweries, programmatically.

Beer Advocate expressly forbids scraping and does not have an official API.
Untappd has an API but they don’t give out API keys to people that are just interested in data. If I build an app, maybe my decision will change, but in the meantime, no using their API. It looks like they may not expressly forbid scraping or crawling on the site, but scraping has its own challenges. I may cover it in the future, but in the meantime, I want to just use an API.
BeerDB looks like an awesome idea: beer data for developers! Yet the API doesn’t show ratings, and you can only get ABV if you are a premium user. I can get all the information that I am looking for from other APIs, so there’s no need to use and pay for this one.
Open Beer Database hasn’t updated the database since 2011. That’s a solid no.
The Beer Spot looks like it could be a fun community, but considering that A) they may not have very many users and B) no one has reviewed Yuengling (beer geek or not, a Philadelphia area staple), I’m not going to use their API.

A mission statement

Wed, 29 Nov 2017 00:00:00 GMT

On the left hand side of this page (or when you click the “About” button), I start with one sentence that gets at why I do many things I do. I love going on adventures and I love learning from data.

I think that to adventure means to do much more than to go to an exotic locale. To me, an adventure is to find myself somewhere I’ve never been. Sometimes, adventure takes me to the top of Yosemite Falls, to my local climbing gym, or even around the block.

I’m certainly on an adventure, putting these words to screen.

Which brings me to why I am here, rebuilding my website for the umpteenth time.

I’m here to learn something new about the topics that interest me.

I’m here to get data, analyze it, and communicate it so that others can learn alongside me.

My mission statement?

I’m here to adventure.