<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Chris Friedman]]></title><description><![CDATA[Pellentesque odio nisi, euismod in, pharetra a, ultricies in, diam. Sed arcu.]]></description><link>https://chris-s-friedman.com/</link><generator>GatsbyJS</generator><lastBuildDate>Sun, 24 May 2020 15:15:14 GMT</lastBuildDate><item><title><![CDATA[Point-Density Plots - My New Favorite Dot Plot]]></title><description><![CDATA[Experimenting with a new way to plot points to get a view into overplotting.]]></description><link>https://chris-s-friedman.com//posts/point-density-plots-my-new-favorite-dot-plot</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/point-density-plots-my-new-favorite-dot-plot</guid><pubDate>Fri, 20 Sep 2019 22:40:32 GMT</pubDate><content:encoded>&lt;p&gt;A few weeks ago the wonderful &lt;a href=&quot;https://rweekly.org/2019-36.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;RWeekly&lt;/a&gt; mailing list introduced me to a new type of plot - the point density plot. Wonderfully, the ability to make this plot has been added to the R community in the form of a new package, &lt;a href=&quot;https://github.com/LKremer/ggpointdensity&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;ggpointdensity&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From the top line description, it’s a cross between a scatter plot and a 2D density plot. The motivation for creating the package and using this new plot is that the points in scatter plots can overlap one another, while the alternative, density plots, lose the resolution given by plotting individual points.&lt;/p&gt;
&lt;p&gt;Before I get ahead of myself, the plots look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1569029031/point-density-plots/pointdensity.png&quot; alt=&quot;Plot from the package documentation on Github&quot;&gt;&lt;/p&gt;
&lt;p&gt;If it isn’t obvious from the package name (or the image above), this is made to work with ggplot2 and simply adds a new geom, &lt;code class=&quot;language-text&quot;&gt;geom_pointdensity()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Each point is colored by how many points are around it.&lt;/p&gt;
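&lt;p&gt;As a rough sketch of what that looks like in code (the simulated data frame &lt;code class=&quot;language-text&quot;&gt;sim&lt;/code&gt; and the viridis color scale are assumptions for illustration, not the data or styling behind the image above), something like this should give a similar plot:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(ggplot2)
library(ggpointdensity)

# made-up data: a dense cluster sitting inside a sparse cloud of points
set.seed(42)
sim &amp;lt;- data.frame(
  x = c(rnorm(2000, sd = 0.3), rnorm(500, sd = 1.5)),
  y = c(rnorm(2000, sd = 0.3), rnorm(500, sd = 1.5))
)

# each point is colored by the number of neighboring points
p &amp;lt;- ggplot(sim, aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_color_viridis_c()

p&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;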
&lt;p&gt;To further illustrate it, let’s look at a data set I’ve been playing with at work. I’ve been examining school districts around Philadelphia and getting information about each one from the U.S. Census American Community Survey. After querying both the Google Maps API and the U.S. Census API, I found 1,074 school districts within three hours of Philly.&lt;/p&gt;
&lt;p&gt;Before using this new geom, I would have probably made this plot:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1569028882/point-density-plots/household_income_vs_percent_poverty-point.png&quot; alt=&quot;geom_point()&quot;&gt;&lt;/p&gt;
&lt;p&gt;While there are a bunch of districts surrounding it, there’s a central feature in the bottom left of the plot where many points are overplotted. To produce that plot I used &lt;code class=&quot;language-text&quot;&gt;geom_point()&lt;/code&gt;. What if instead I were to use &lt;code class=&quot;language-text&quot;&gt;geom_density_2d()&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1569028882/point-density-plots/household_income_vs_percent_poverty-density.png&quot; alt=&quot;geom_density_2d() + stat_density_2d()&quot;&gt;&lt;/p&gt;
&lt;p&gt;As the package author mentions in their description, you lose the ability to see outliers. You can certainly see that here, as we lost a lot of resolution. We could then try to plot both the points and the density at the same time…&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1569028882/point-density-plots/household_income_vs_percent_poverty-point_and_density.png&quot; alt=&quot;geom_point() + geom_density_2d() + stat_density_2d()&quot;&gt;&lt;/p&gt;
&lt;p&gt;I showed my coworker that one and they said they liked it because it looked kind of like a &lt;a href=&quot;https://en.wikipedia.org/wiki/Papaya&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;papaya&lt;/a&gt;. I don’t really see it but 🤷‍♂️…&lt;/p&gt;
&lt;p&gt;Anyway, let’s now visualize this with our brand new tool - &lt;code class=&quot;language-text&quot;&gt;geom_pointdensity()&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1569028882/point-density-plots/household_income_vs_percent_poverty-pointdensity.png&quot; alt=&quot;geom_pointdensity()&quot;&gt;&lt;/p&gt;
&lt;p&gt;I like this because you get the high resolution of the dot plot but can also see where the density of points is highest.&lt;/p&gt;
&lt;p&gt;Also, I think these plots just look really really cool.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Happy Little Accidents]]></title><description><![CDATA[Exploring Data on Bob Ross paintings for Tidy Tuesday.]]></description><link>https://chris-s-friedman.com//posts/happy-little-accidents</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/happy-little-accidents</guid><pubDate>Mon, 05 Aug 2019 00:00:00 GMT</pubDate><content:encoded>&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;knitr::opts_chunk$set(message = FALSE, warning = FALSE)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This morning, I just found out about &lt;a href=&quot;https://github.com/rfordatascience/tidytuesday&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;#tidytuesday&lt;/a&gt; and I figured it would be a fun thing to play with.&lt;/p&gt;
&lt;p&gt;For my first foray into tidytuesday, we have data on Bob Ross’s paintings during his show. The data were compiled by fivethirtyeight and reported &lt;a href=&quot;https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The data are available &lt;a href=&quot;https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-08-06&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;here&lt;/a&gt;. On the info page for the data, they show how to load the data and give an example of some basic tidying. I’ll do that below:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(dplyr)
library(tidyr)
library(stringr)

bob_ross &amp;lt;-
  readr::read_csv(&amp;quot;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-06/bob-ross.csv&amp;quot;)

# to clean up the episode information
bob_ross &amp;lt;-
  bob_ross %&amp;gt;%
  janitor::clean_names() %&amp;gt;%
  separate(episode, into = c(&amp;quot;season&amp;quot;, &amp;quot;episode&amp;quot;), sep = &amp;quot;E&amp;quot;) %&amp;gt;%
  mutate(season = str_extract(season, &amp;quot;[:digit:]+&amp;quot;)) %&amp;gt;%
  mutate_at(vars(season, episode), as.integer)

head(bob_ross)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are a couple of paintings that are named the same thing.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;bob_ross &amp;lt;-
  bob_ross %&amp;gt;%
  group_by(title) %&amp;gt;%
  # number the paintings within each title (1, 2, ...) so duplicates can be told apart
  mutate(title_count = row_number()) %&amp;gt;%
  ungroup() %&amp;gt;%
  mutate(title = if_else(title_count &amp;gt; 1,
                         paste(title, title_count),
                         title)) %&amp;gt;%
  select(-title_count)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are some columns that relate to the frame the painting got put into and some columns that relate to elements inside each painting. Like the fivethirtyeight crew, I’m more interested in the elements inside the paintings than in the frames, so I’ll go ahead and drop those columns.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;painting_data &amp;lt;-
  bob_ross %&amp;gt;%
  select(-contains(&amp;quot;frame&amp;quot;), -steve_ross, -guest, -diane_andre)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In my professional work, I perform social network analysis, so let’s go ahead and look at networks of elements in Bob Ross’s paintings!&lt;/p&gt;
&lt;h1 id=&quot;networks-of-bob-ross-paintings&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#networks-of-bob-ross-paintings&quot; aria-label=&quot;networks of bob ross paintings permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Networks of Bob Ross Paintings&lt;/h1&gt;
&lt;p&gt;To get us looking at social networks, we first need to take the data from this wide format and turn it into an edge list. The edge list will connect each painting to every element that is inside it. From there, we can get a picture of what the network of paintings looks like!&lt;/p&gt;
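&lt;p&gt;To make the idea of an edge list concrete, here is a minimal sketch on a made-up wide table. The paintings, the elements, and the names &lt;code class=&quot;language-text&quot;&gt;toy_paintings&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;toy_edges&lt;/code&gt; are hypothetical and not part of the Bob Ross data:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(dplyr)
library(tidyr)

# hypothetical paintings and elements, just for illustration
toy_paintings &amp;lt;- tibble(
  title = c(&amp;quot;Painting A&amp;quot;, &amp;quot;Painting B&amp;quot;),
  tree  = c(1, 0),
  lake  = c(1, 1)
)

# one row per painting-element pair that is actually present
toy_edges &amp;lt;- toy_paintings %&amp;gt;%
  pivot_longer(-title, names_to = &amp;quot;element&amp;quot;, values_to = &amp;quot;present&amp;quot;) %&amp;gt;%
  filter(present == 1) %&amp;gt;%
  select(title, element)

toy_edges&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each row of the result is one edge connecting a painting to an element it contains.&lt;/p&gt;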
&lt;h2 id=&quot;organizing-the-edge-list&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#organizing-the-edge-list&quot; aria-label=&quot;organizing the edge list permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Organizing the Edge List&lt;/h2&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(igraph)
library(ggnetwork)

titles &amp;lt;- painting_data[[&amp;quot;title&amp;quot;]]
incidence_mat &amp;lt;-
  painting_data %&amp;gt;%
  select(-season, -episode, -title)

incidence_mat &amp;lt;- as.matrix(incidence_mat)
rownames(incidence_mat) &amp;lt;- titles

incidence_graph &amp;lt;-
  graph_from_incidence_matrix(incidence_mat)

ggplot(incidence_graph, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges() +
  # type is FALSE for paintings (matrix rows) and TRUE for elements (matrix columns)
  geom_nodes(aes(color = type)) +
  theme_blank()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That’s real busy! It looks like there are a few episodes that only have a few elements and some episodes that have many elements in them. There’s also that one episode that shares three elements with another episode and no others.&lt;/p&gt;
&lt;p&gt;Let’s see if we can clean this up a bit! First, I’ll connect episodes by how many elements they share.&lt;/p&gt;
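&lt;p&gt;The trick for this is a matrix product: multiplying the painting-by-element matrix by its own transpose counts, for every pair of paintings, the elements they have in common. Here is a toy sketch with two made-up paintings and three made-up elements (none of these names come from the real data):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# rows are hypothetical paintings, columns are hypothetical elements
toy_mat &amp;lt;- matrix(
  c(1, 1, 0,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c(&amp;quot;Painting A&amp;quot;, &amp;quot;Painting B&amp;quot;),
                  c(&amp;quot;tree&amp;quot;, &amp;quot;lake&amp;quot;, &amp;quot;cabin&amp;quot;))
)

# entry [i, j] is the number of elements that paintings i and j share;
# the diagonal holds the element count of each painting itself
toy_shared &amp;lt;- toy_mat %*% t(toy_mat)
toy_shared&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both toy paintings have a lake, so the off-diagonal entries are 1, while the diagonal entries are 2 because each painting has two elements.&lt;/p&gt;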
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;episode_x_episode &amp;lt;- incidence_mat %*% t(incidence_mat)

ep_x_ep_graph &amp;lt;-
  graph_from_adjacency_matrix(episode_x_episode,
                              # only look at the upper part of the matrix since it is symmetric
                              mode = &amp;quot;upper&amp;quot;,
                              # the connections are weighted
                              weighted = TRUE,
                              # don&amp;#39;t count self-loops
                              diag = FALSE)

ggnetwork(ep_x_ep_graph, weight = &amp;quot;weight&amp;quot;) %&amp;gt;%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges() +
  geom_nodes() +
  theme_blank()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Not much to look at. Or, put more charitably, Ross’s paintings tend to have something in common with one another.&lt;/p&gt;
&lt;p&gt;Instead of looking at all the features at once, why don’t we look at groups of features? I’ve gone ahead and grouped each feature into different categories. I’ll load that up and then split up the painting df into different categories.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;feature_categories &amp;lt;-
  readr::read_csv(&amp;quot;../../data/Bob Ross/ross_painting_features.csv&amp;quot;)

painting_categories &amp;lt;-
  painting_data %&amp;gt;%
  gather(feature, value, -season, -episode, -title) %&amp;gt;%
  left_join(feature_categories, by = &amp;quot;feature&amp;quot;) %&amp;gt;%
  filter(value &amp;gt; 0) %&amp;gt;%
  count(season, episode, title, category) %&amp;gt;%
  spread(category, n) %&amp;gt;%
  mutate_at(vars(-season, -episode, -title), ~if_else(is.na(.), 0, 1))

titles &amp;lt;- painting_categories[[&amp;quot;title&amp;quot;]]


incidence_mat &amp;lt;-
  painting_categories %&amp;gt;%
  select(-season, -episode, -title) %&amp;gt;%
  as.matrix()
rownames(incidence_mat) &amp;lt;- titles

episode_x_episode &amp;lt;- incidence_mat %*% t(incidence_mat)


ep_x_ep_graph &amp;lt;-
  graph_from_adjacency_matrix(
    episode_x_episode,
    # only look at the upper part of the matrix since it is symmetric
    mode = &amp;quot;upper&amp;quot;,
    # the connections are weighted
    weighted = TRUE,
    # don&amp;#39;t count self-loops
    diag = FALSE)

# add in season as an attribute of each episode
vertex_attr(ep_x_ep_graph, &amp;quot;season&amp;quot;) &amp;lt;- painting_categories[[&amp;quot;season&amp;quot;]]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now to plot! I’m going to build these plots with a cutpoint though, because otherwise they become very unwieldy.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;ep_x_ep_graph %&amp;gt;%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &amp;lt;= 2)) %&amp;gt;%
  delete_vertices(., which(igraph::degree(.) == 0)) %&amp;gt;%  
  ggnetwork(weight = &amp;quot;weight&amp;quot;) %&amp;gt;%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = &amp;quot;gray&amp;quot;) +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE) +
  labs(title = &amp;quot;Paintings that share 3, 4, or 5 feature categories&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Above, I’ve colored nodes by season and only shown connections between episodes if those episodes share more than two classes of feature (e.g. two episodes have a sky feature, tree and plant feature, and a man-made feature).&lt;/p&gt;
&lt;p&gt;What happens if we filter to only show edges if two episodes share 4 features? 5?&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;ep_x_ep_graph %&amp;gt;%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &amp;lt;= 3)) %&amp;gt;%
  delete_vertices(., which(igraph::degree(.) == 0)) %&amp;gt;%  
  ggnetwork(weight = &amp;quot;weight&amp;quot;) %&amp;gt;%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = &amp;quot;gray&amp;quot;) +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE)  +
  labs(title = &amp;quot;Paintings that share 4 or 5 feature categories&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Looking at the paintings that share more than 3 features really brings out that there is a central group of paintings that all share a lot in common and then a few different groups of paintings that all have different things in common.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;ep_x_ep_graph %&amp;gt;%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &amp;lt;= 4)) %&amp;gt;%
  delete_vertices(., which(igraph::degree(.) == 0)) %&amp;gt;%  
  ggnetwork(weight = &amp;quot;weight&amp;quot;) %&amp;gt;%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = &amp;quot;gray&amp;quot;) +
  geom_nodes(aes(color = season)) +
  theme_blank() +
  Friedman::scale_color_drexel(discrete = FALSE) +
  labs(title = &amp;quot;Paintings that share 5 feature categories&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now looking just at paintings that share 5 feature categories, it can really be seen that there is a central set of themes that is very common across seasons. What are they?&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;episodes_of_interest &amp;lt;-
  ep_x_ep_graph %&amp;gt;%
  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &amp;lt;= 4)) %&amp;gt;%
  delete_vertices(., which(igraph::degree(.) == 0)) %&amp;gt;%
  vertex_attr(&amp;quot;name&amp;quot;)

painting_categories %&amp;gt;%
  filter(title %in% episodes_of_interest) %&amp;gt;%
  mutate(feature_sum = aquatic + clouds + `general nature` +
           `man made` + nature + sky + `trees and plants`) %&amp;gt;%
  filter(feature_sum &amp;gt; 4) %&amp;gt;%
  summarize_at(vars(-season, -episode, -title, -feature_sum), sum)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All (or near all) of these paintings have an aquatic element, a general nature element, and trees and plants. So the real defining factor between the three groups in the plot above probably has to do with the other four categories. Next time I play with these data, I’ll look at that and do a deeper dive on components of the network of Bob Ross paintings.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How often does the Senate Vote In Palindromes?]]></title><description><![CDATA[Every Friday, five thirty eight comes out with two logic, math, or probability based puzzles - one quick to solve and one that takes a long time to solve. Although I don't always get a chance to partake, thinking about them is always fun.]]></description><link>https://chris-s-friedman.com//posts/how-often-does-the-senate-vote-in-palindromes</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/how-often-does-the-senate-vote-in-palindromes</guid><pubDate>Wed, 31 Jan 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Every Friday, &lt;a href=&quot;https://fivethirtyeight.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;five thirty eight&lt;/a&gt; comes out with two logic, math, or probability based puzzles - one quick to solve and one that takes a long time to solve. Although I don’t always get a chance to partake, thinking about them is always fun.&lt;/p&gt;
&lt;p&gt;This week’s quick puzzle related to a problem I’m trying to solve in the office, so I thought I would give this one a try. Taken from &lt;a href=&quot;https://fivethirtyeight.com/features/how-often-does-the-senate-vote-in-palindromes/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;their site&lt;/a&gt;, the puzzle goes like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On Monday, &lt;a href=&quot;https://www.nytimes.com/interactive/2018/01/22/us/politics/live-senate-vote-government-shutdown2.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;the Senate voted&lt;/a&gt; 81-18 to end the government shutdown. This naturally grabbed the Riddler’s attention: It’s a palindrome! The vote tally reads the same forward and backward. This specific tally was made possible by the absence of John McCain. But do senators need to be absent to create palindrome tallies? If so, what numbers of absences will do the trick?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Extra credit:&lt;/em&gt; How many palindromic Senate votes have occurred in the &lt;a href=&quot;https://www.senate.gov/legislative/votes.htm&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;past three decades&lt;/a&gt;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In addition to potentially helping solve a problem in the office, I figure this is a good excuse to experiment with &lt;a href=&quot;https://en.wikipedia.org/wiki/Object-oriented_programming&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;object oriented programming&lt;/a&gt; and writing functions to work with &lt;a href=&quot;http://adv-r.had.co.nz/OO-essentials.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;S3 methods&lt;/a&gt;. As you will see, the function I wrote is more for fun than anything else, but I think it does the trick in showing how S3 methods work. In addition, the first solution to the puzzle I propose is verbose. At the end I show how I solved the problem in an even quicker way.&lt;/p&gt;
&lt;h2 id=&quot;lets-begin&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#lets-begin&quot; aria-label=&quot;lets begin permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Let’s begin!&lt;/h2&gt;
&lt;center&gt;
&lt;p&gt;&lt;img src=&quot;https://res.cloudinary.com/chrissfriedman/image/upload/v1517458326/drwho_allons_y_ruxxvw.gif&quot;&gt;&lt;/p&gt;
&lt;/center&gt;
&lt;p&gt;The first thing I did when approaching this puzzle was to re-frame my question into something I can write code for.&lt;/p&gt;
&lt;h5 id=&quot;what-are-the-possible-combinations-of-votes-that-can-result-in-a-palindrome&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#what-are-the-possible-combinations-of-votes-that-can-result-in-a-palindrome&quot; aria-label=&quot;what are the possible combinations of votes that can result in a palindrome permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;What are the possible combinations of votes that can result in a palindrome?&lt;/h5&gt;
&lt;p&gt;When thinking about palindromes, the first thing I need to do is think about what the reversible part of the palindrome is. With senate votes, the first part of the palindrome can be considered the number of “yea” votes while the second part of it can be considered the number of “nay” votes.&lt;/p&gt;
&lt;p&gt;Considering the number of seats in the senate, the number of yea votes can be any number between 0 and 100.&lt;/p&gt;
&lt;p&gt;Considering the problem of palindromes will narrow down &lt;code class=&quot;language-text&quot;&gt;0:100&lt;/code&gt; in two ways.&lt;/p&gt;
&lt;p&gt;First, a unanimous vote in the senate is impossible! Kidding aside, I also know that a unanimous vote can’t result in a palindrome, so the case where there are 100 yea votes can be discarded. In addition, we won’t look at cases where no senators vote!&lt;/p&gt;
&lt;p&gt;Second, looking at votes between 1 and 99 is doubling the amount of work. Votes in the yea column greater than or equal to 50 are palindromes of the number of votes below 50. So it’s only necessary to look at yea votes where the number of votes is between 1 and 49.&lt;/p&gt;
&lt;p&gt;For illustrative purposes, let’s put that vector in a data frame:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(dplyr)

votes &amp;lt;- tibble(yea = 1:49)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that we have a vector of all of the numbers between 1 and 49, we’ll build a vector of their palindromes and put them in a new column for all of the “nay” votes.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(stringr)
library(purrr)

votes &amp;lt;- votes %&amp;gt;%
  mutate(nay = formatC(yea, width = 2, format = &amp;quot;d&amp;quot;, flag = &amp;quot;0&amp;quot;) %&amp;gt;%
           str_split(&amp;quot;&amp;quot;) %&amp;gt;%
           map(rev) %&amp;gt;%
           map_chr(paste, collapse = &amp;quot;&amp;quot;) %&amp;gt;%
           as.numeric())&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the above code chunk, I introduced a function it seems a lot of people don’t get to play with very often, &lt;a href=&quot;https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/formatC&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;formatC&lt;/code&gt;&lt;/a&gt;, which formats values using C-style format specifications, much like the C function &lt;code class=&quot;language-text&quot;&gt;printf&lt;/code&gt;. Basically, it allows users to control how numbers are formatted as text. Here, it’s used to add leading zeros.&lt;/p&gt;
&lt;p&gt;Above, there’s also a call to &lt;a href=&quot;https://www.rdocumentation.org/packages/stringr/versions/1.1.0/topics/str_split&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;str_split&lt;/code&gt;&lt;/a&gt; from the &lt;a href=&quot;http://stringr.tidyverse.org/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;stringr&lt;/code&gt;&lt;/a&gt; package. It’s used here to split the two digits so that they can then be reversed.&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;str_split&lt;/code&gt; outputs a list of character vectors, where each element holds the two characters of one number. To operate on each element of this list, I use &lt;a href=&quot;https://www.rdocumentation.org/packages/purrr/versions/0.2.4/topics/map&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt;&lt;/a&gt; from the &lt;a href=&quot;http://purrr.tidyverse.org/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;purrr&lt;/code&gt;&lt;/a&gt; package. In this instance &lt;code class=&quot;language-text&quot;&gt;map&lt;/code&gt; applies the &lt;a href=&quot;https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/rev&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;rev&lt;/code&gt;&lt;/a&gt; function to reverse the items in each element of the list so that &lt;code class=&quot;language-text&quot;&gt;&amp;quot;8&amp;quot; &amp;quot;1&amp;quot;&lt;/code&gt; becomes &lt;code class=&quot;language-text&quot;&gt;&amp;quot;1&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Similarly, &lt;a href=&quot;http://purrr.tidyverse.org/reference/index.html#section-map-family&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;map_chr&lt;/code&gt;&lt;/a&gt; pastes the items in each element together (changing &lt;code class=&quot;language-text&quot;&gt;&amp;quot;1&amp;quot; &amp;quot;8&amp;quot;&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;&amp;quot;18&amp;quot;&lt;/code&gt;) and uses the &lt;code class=&quot;language-text&quot;&gt;_chr&lt;/code&gt; modifier to return a character vector that is then converted into a numeric one.&lt;/p&gt;
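&lt;p&gt;To see those steps on a single number, here is a small base R sketch of the same idea. The helper name &lt;code class=&quot;language-text&quot;&gt;reverse_two_digits&lt;/code&gt; is hypothetical, and it swaps in &lt;code class=&quot;language-text&quot;&gt;strsplit&lt;/code&gt; for &lt;code class=&quot;language-text&quot;&gt;str_split&lt;/code&gt; so that it runs without any packages:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# hypothetical helper: reverse the digits of one number below 100
reverse_two_digits &amp;lt;- function(n) {
  # pad to two characters, e.g. 7 becomes &amp;quot;07&amp;quot;
  padded &amp;lt;- formatC(n, width = 2, format = &amp;quot;d&amp;quot;, flag = &amp;quot;0&amp;quot;)
  # split into single characters, e.g. &amp;quot;0&amp;quot; &amp;quot;7&amp;quot;
  digits &amp;lt;- strsplit(padded, &amp;quot;&amp;quot;)[[1]]
  # reverse, glue back together, and convert to a number
  as.numeric(paste(rev(digits), collapse = &amp;quot;&amp;quot;))
}

reverse_two_digits(81L)  # 18
reverse_two_digits(7L)   # 70&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;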
&lt;p&gt;After computing the nay votes, we need to narrow down our list of yea’s and nay’s to votes that are possible in the 100 seat senate.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;votes &amp;lt;- votes %&amp;gt;%
  mutate(vote_count = yea + nay) %&amp;gt;%
  # Filter out impossible vote counts
  filter_at(vars(vote_count), all_vars(. &amp;lt;= 100))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you’ve been playing along at home, you will notice that 10 of the vote counts were dropped.&lt;/p&gt;
&lt;p&gt;Now we have all of the possible vote tallies that are palindromic!&lt;/p&gt;
&lt;h2 id=&quot;do-senators-need-to-be-absent-to-create-palindrome-tallies&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#do-senators-need-to-be-absent-to-create-palindrome-tallies&quot; aria-label=&quot;do senators need to be absent to create palindrome tallies permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Do senators need to be absent to create palindrome tallies?&lt;/h2&gt;
&lt;p&gt;This question is answered by seeing if any of the items in &lt;code class=&quot;language-text&quot;&gt;votes$vote_count&lt;/code&gt; equal 100. If they do, then the answer to the question is no.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;any(votes$vote_count == 100)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;YES! Senators DO need to be absent for a palindromic vote count!&lt;/p&gt;
&lt;p&gt;Which numbers of absences will do the trick?&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;100 - votes$vote_count %&amp;gt;%
  unique()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Above, I show the absences needed for a vote to be palindromic. That said, if 50 senators are absent, a quorum is not present, so business isn’t being conducted.&lt;/p&gt;
&lt;p&gt;In actuality, the number of absences can be 49, 34, 23, 12, or 1.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Importing Social Network Data into R]]></title><description><![CDATA[In my professional life, I manage and analyze data on a team that studies the social networks surrounding children with autism. The purpose of this post is not to discuss that work in depth, but rather to show how to quickly and easily import one type of data I work with into R. ]]></description><link>https://chris-s-friedman.com//posts/importing-social-network-data-into-r</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/importing-social-network-data-into-r</guid><pubDate>Tue, 19 Dec 2017 00:00:00 GMT</pubDate><content:encoded>&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(knitr)
library(printr)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In my professional life, I manage and analyze data on a team that studies the social networks surrounding children with autism. The purpose of this post is not to discuss that work in depth, but rather to show how to quickly and easily import one type of data I work with into R. For social network analysis, I use the package &lt;a href=&quot;http://igraph.org/r/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;igraph&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The type of data I’m going to talk about importing today is egocentric network data. For those that don’t know, egocentric network data involves asking a single person about the makeup of an entire network.&lt;/p&gt;
&lt;h2 id=&quot;collecting-egocentric-network-data&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#collecting-egocentric-network-data&quot; aria-label=&quot;collecting egocentric network data permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Collecting Egocentric Network Data&lt;/h2&gt;
&lt;p&gt;To demonstrate, imagine that I’m interested in social networks at the gym and that you go to the gym with three friends every week. I approach you and, after asking you to join my study, ask you who you go to the gym with.&lt;/p&gt;
&lt;p&gt;You say “James, Jen, and Rene.”&lt;/p&gt;
&lt;p&gt;Then I ask some questions about the group in relation to James:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How many times a week do you see James?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;How many times a week does James see Jen?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;How many times a week does James see Rene?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We then do the same for Jen and Rene: I ask how often you see Jen and how often she sees Rene, and then I ask how often you see Rene.&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;My survey here came in two parts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;name generator&lt;/strong&gt; where I ask you who you interact with&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;name modifier&lt;/strong&gt; where I ask you how you interact with each person&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After the data are collected, I log it all in my spreadsheet where each row corresponds to a data collection instance.&lt;/p&gt;
&lt;p&gt;So, I have data that look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(dplyr)

gym_networks &amp;lt;- tibble( # tibble() replaces the deprecated data_frame()
  participant = c(&amp;quot;You&amp;quot;, &amp;quot;Bart&amp;quot;, &amp;quot;Lisa&amp;quot;),
  p1 = c(&amp;quot;James&amp;quot;, &amp;quot;Milhouse&amp;quot;, &amp;quot;Sherry&amp;quot;),
  p2 = c(&amp;quot;Jen&amp;quot;, &amp;quot;Nelson&amp;quot;, &amp;quot;Terry&amp;quot;),
  p3 = c(&amp;quot;Rene&amp;quot;, &amp;quot;Martin&amp;quot;, &amp;quot;Ralph&amp;quot;),
  participant_x_p1 = sample(1:10, 3),
  p1_x_p2 = sample(1:10, 3),
  p1_x_p3 = sample(1:10, 3),  
  participant_x_p2 = sample(1:10, 3),
  p2_x_p3 = sample(1:10, 3),
  participant_x_p3 = sample(1:10, 3)
)
print(gym_networks)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;getting-the-data-ready-for-analysis&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#getting-the-data-ready-for-analysis&quot; aria-label=&quot;getting the data ready for analysis permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Getting the data ready for analysis&lt;/h2&gt;
&lt;p&gt;The data here are represented in an adjacency list with weights attached. Although igraph has a function for importing adjacency lists, it isn’t configured to handle weights, so we will take our adjacency list and convert it into an edge list, which igraph can handle with weights.&lt;/p&gt;
&lt;p&gt;To accomplish this, we’ll use the package &lt;a href=&quot;http://tidyr.tidyverse.org/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;tidyr&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(tidyr)

el &amp;lt;- gym_networks %&amp;gt;%
  # Step 1: Make each row a single edge
  gather(key, value = &amp;quot;weight&amp;quot;, -(participant:p3)) %&amp;gt;%
  # Step 2: Configure two new columns, an ego, and an alter
  mutate(ego = case_when(grepl(&amp;quot;participant&amp;quot;, key) ~ participant,
                         grepl(&amp;quot;p1_&amp;quot;, key) ~ p1,
                         grepl(&amp;quot;p2_&amp;quot;, key) ~ p2,
                         grepl(&amp;quot;p3_&amp;quot;, key) ~ p3),
         alter = case_when(grepl(&amp;quot;_p1&amp;quot;, key) ~ p1,
                           grepl(&amp;quot;_p2&amp;quot;, key) ~ p2,
                           grepl(&amp;quot;_p3&amp;quot;, key) ~ p3)) %&amp;gt;%
  # Step 3: Clean up the data frame
  select(ego, alter, weight)

print(head(el))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In three steps, we go from adjacency list to edge list. In one more step, we have an igraph object to analyse and plot!&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(igraph)

graph &amp;lt;- graph_from_data_frame(el)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
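&lt;p&gt;One thing worth knowing: &lt;code class=&quot;language-text&quot;&gt;graph_from_data_frame()&lt;/code&gt; builds a directed graph by default and stores any extra columns as edge attributes. Since each pair was only asked about once, the ties can reasonably be treated as undirected. Here is a minimal sketch of that, assuming a toy edge list with made-up weights (&lt;code class=&quot;language-text&quot;&gt;toy_el&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;toy_graph&lt;/code&gt; are illustrative names, not part of the data above):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(igraph)

# Toy edge list with made-up weights, shaped like `el`
toy_el &amp;lt;- data.frame(
  ego    = c(&amp;quot;You&amp;quot;, &amp;quot;James&amp;quot;, &amp;quot;James&amp;quot;, &amp;quot;You&amp;quot;, &amp;quot;Jen&amp;quot;, &amp;quot;You&amp;quot;),
  alter  = c(&amp;quot;James&amp;quot;, &amp;quot;Jen&amp;quot;, &amp;quot;Rene&amp;quot;, &amp;quot;Jen&amp;quot;, &amp;quot;Rene&amp;quot;, &amp;quot;Rene&amp;quot;),
  weight = c(3, 2, 1, 4, 5, 2)
)

# The first two columns define the edges; `weight` becomes an edge attribute
toy_graph &amp;lt;- graph_from_data_frame(toy_el, directed = FALSE)

is_weighted(toy_graph)  # TRUE when a `weight` edge attribute is present
E(toy_graph)$weight     # the weights, in the order of the rows in toy_el
strength(toy_graph)     # weighted degree of each person&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Whether directed or undirected ties are the right choice depends on how the survey questions were asked, so treat &lt;code class=&quot;language-text&quot;&gt;directed = FALSE&lt;/code&gt; as an assumption to revisit for your own data.&lt;/p&gt;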
&lt;p&gt;Happy analyzing, friends!&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I don’t ask how often Jen sees James and how often Rene sees James or Jen because we assume that it takes two to tango in this respect.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Getting RateBeer Data…Programmatically]]></title><description><![CDATA[I show how we can get more beers at once. After that, I'm going to show how we can use the API, `rvest`, and `purrr` to get beers from all the brewers around me.]]></description><link>https://chris-s-friedman.com//posts/getting-ratebeer-data-programmatically</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/getting-ratebeer-data-programmatically</guid><pubDate>Thu, 07 Dec 2017 00:00:00 GMT</pubDate><content:encoded>&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(httr)
library(jsonlite)
library(purrr)
library(rvest)
knitr::opts_chunk$set(eval = FALSE)
API_key &amp;lt;- Sys.getenv(&amp;quot;rateBeer_API_key&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In my &lt;a href=&quot;../collecting-beer-data-using-ratebeer&quot;&gt;last post&lt;/a&gt; I showed how, using the package &lt;code class=&quot;language-text&quot;&gt;httr&lt;/code&gt;, you can access the RateBeer API to get information about beers made by a brewery. When I left off, I showed a problem: the API only shows 10 beers at a time.&lt;/p&gt;
&lt;p&gt;Today, I’m going to show how we can get more beers at once. After that, I’m going to show how we can use the API, &lt;code class=&quot;language-text&quot;&gt;rvest&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;purrr&lt;/code&gt; to get beers from all the brewers around me.&lt;/p&gt;
&lt;h2 id=&quot;updating-the-call-to-the-api&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#updating-the-call-to-the-api&quot; aria-label=&quot;updating the call to the api permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Updating the call to the API&lt;/h2&gt;
&lt;h3 id=&quot;setting-the-first-argument&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#setting-the-first-argument&quot; aria-label=&quot;setting the first argument permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Setting the &lt;code class=&quot;language-text&quot;&gt;first:&lt;/code&gt; argument&lt;/h3&gt;
&lt;p&gt;In the last post, I didn’t mention one of the arguments that can be used when making a &lt;code class=&quot;language-text&quot;&gt;beersByBrewer&lt;/code&gt; query. Besides the argument for the brewerID, we can also use the &lt;code class=&quot;language-text&quot;&gt;first&lt;/code&gt; argument to specify how many beers we want to see from a brewer. As you will see below, one of the changes I make to the call to the API is setting the value for the argument &lt;code class=&quot;language-text&quot;&gt;first&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;999&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Ninety-nine beers on the wall? Let’s make it nine hundred and ninety-nine.&lt;/p&gt;
&lt;h3 id=&quot;turning-the-call-to-the-api-into-a-function&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#turning-the-call-to-the-api-into-a-function&quot; aria-label=&quot;turning the call to the api into a function permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Turning the call to the API into a function.&lt;/h3&gt;
&lt;p&gt;As you will see later on, it will be handy to have the call to the API as a function. Below, I declare that function:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;get_beers_from_brewer &amp;lt;- function(brewer_id, api_key) {
  URL &amp;lt;- &amp;quot;https://api.ratebeer.com/v1/api/graphql&amp;quot;
  POST(URL,
       body = list(
         query = paste0(
&amp;quot;query{
  beersByBrewer(brewerId: &amp;quot;, brewer_id, &amp;quot;, first: 999) {
    totalCount
      items{
        name
        abv
        averageRating
        ratingCount
        isRetired
        style{
          name
        }
        brewer {
          id
          name
          streetAddress
          city
          state {
            name
          }
          zip
        }
      }
     }
}&amp;quot;),
       variables = &amp;quot;{}&amp;quot;,
       operationName = NULL),
       encode = &amp;quot;json&amp;quot;, # tells httr to encode the body of the request as json
       add_headers(&amp;quot;content-type&amp;quot; = &amp;quot;application/json&amp;quot;,
                   &amp;quot;Accept&amp;quot; = &amp;quot;application/json&amp;quot;,
                   &amp;quot;x-api-key&amp;quot; = api_key))
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
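&lt;p&gt;To check how &lt;code class=&quot;language-text&quot;&gt;paste0()&lt;/code&gt; splices the brewer ID into the query without touching the API, the string-building step can be tried on its own. This is only a sketch: &lt;code class=&quot;language-text&quot;&gt;build_beers_query()&lt;/code&gt; is a hypothetical helper (not part of httr or of the function above), and it asks for fewer fields to keep the output short.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# Hypothetical helper: builds only the GraphQL query string, sends nothing
build_beers_query &amp;lt;- function(brewer_id, first = 999) {
  paste0(
    &amp;quot;query{ beersByBrewer(brewerId: &amp;quot;, brewer_id,
    &amp;quot;, first: &amp;quot;, first, &amp;quot;) { totalCount items{ name } } }&amp;quot;
  )
}

# Print the query for Yards (brewer ID 166) to inspect it before sending
cat(build_beers_query(166))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Printing the string first is a cheap way to catch a misplaced comma or brace before spending a request on it.&lt;/p&gt;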
&lt;h2 id=&quot;finding-breweries&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#finding-breweries&quot; aria-label=&quot;finding breweries permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Finding Breweries&lt;/h2&gt;
&lt;p&gt;Okay, so now, I have an easy way to get information about the beers that breweries make. All I need to do now is point the API to the brewer ID of each brewery I want information on.&lt;/p&gt;
&lt;p&gt;Remember, the brewer id can be found by looking at the url for that brewery and is in the form:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;https://www.ratebeer.com/brewers/&amp;lt;BREWERY_NAME_HERE&amp;gt;/&amp;lt;BREWER_ID_HERE&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The thing is, I like data and want A LOT of it. There’s no way I’m going to hand sort through urls to try to find breweries. Can’t this be automated?&lt;/p&gt;
&lt;p&gt;YES!&lt;/p&gt;
&lt;p&gt;RateBeer maintains lists of breweries by state. For example, the breweries in Pennsylvania can be found &lt;a href=&quot;https://www.ratebeer.com/breweries/pennsylvania/38/213/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;here&lt;/a&gt;. Using the package &lt;a href=&quot;https://github.com/hadley/rvest&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;rvest&lt;/code&gt;&lt;/a&gt;, we can pull information about all of the breweries in the state, and then use &lt;a href=&quot;https://github.com/tidyverse/purrr&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;purrr&lt;/code&gt;&lt;/a&gt; to iterate over that list, using the function, &lt;code class=&quot;language-text&quot;&gt;get_beers_from_brewer()&lt;/code&gt;.   &lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;rvest&lt;/code&gt; is a package to make harvesting information from the web easy. Below, you see how, in three steps, I have a list of urls that point to all of the breweries in the state.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(rvest)

brewery_list_url &amp;lt;- &amp;quot;https://www.ratebeer.com/breweries/pennsylvania/38/213/&amp;quot;

brewery_ids &amp;lt;-  read_html(brewery_list_url) %&amp;gt;%
  html_nodes(&amp;quot;#brewerTable a:nth-child(1)&amp;quot;) %&amp;gt;%
  html_attr(&amp;#39;href&amp;#39;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To explain what happened in the previous code chunk:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;read_html()&lt;/code&gt; loads the url for the list of breweries in PA. This is the same thing that happens if you click this &lt;a href=&quot;https://www.ratebeer.com/breweries/pennsylvania/38/213/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;link&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;html_nodes()&lt;/code&gt; searches the html we just loaded for every element that links to a brewery. The string passed to the function is the css selector for where breweries can be found on the page. I found this selector using &lt;a href=&quot;http://selectorgadget.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;selectorgadget&lt;/a&gt;. The function returns a list of every element that the selector matches.&lt;/li&gt;
&lt;li&gt;&lt;code class=&quot;language-text&quot;&gt;html_attr()&lt;/code&gt; extracts the named attribute from each element in that list. In this case I asked for &lt;code class=&quot;language-text&quot;&gt;href&lt;/code&gt;, the target of each hyperlink.&lt;/li&gt;
&lt;/ol&gt;
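&lt;p&gt;The three steps above can be tried offline on a small piece of HTML. This is a sketch under assumptions: the markup below is made up to stand in for the real page (whose structure may differ), and the second brewery and its ID are hypothetical.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(rvest)

# Made-up HTML standing in for the brewery list page
mock_page &amp;lt;- read_html(paste0(
  &amp;#39;&amp;lt;div id=&amp;quot;brewerTable&amp;quot;&amp;gt;&amp;#39;,
  &amp;#39;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;/brewers/yards-brewing-company/166/&amp;quot;&amp;gt;Yards&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&amp;#39;,
  &amp;#39;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;/brewers/example-brewery/99999/&amp;quot;&amp;gt;Example&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;&amp;#39;,
  &amp;#39;&amp;lt;/div&amp;gt;&amp;#39;
))

mock_links &amp;lt;- mock_page %&amp;gt;%
  # Step 2: keep links that are the first child of their parent element
  html_nodes(&amp;quot;#brewerTable a:nth-child(1)&amp;quot;) %&amp;gt;%
  # Step 3: pull the href attribute out of each matched link
  html_attr(&amp;quot;href&amp;quot;)

print(mock_links)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;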
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;yards_url &amp;lt;- grep(&amp;quot;yards&amp;quot;, brewery_ids, value = TRUE)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As I said, this outputs a list of urls. As an example, the url for Yards Brewing looks like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;/brewers/yards-brewing-company/166/&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now, I can take this list of urls, and use the function &lt;code class=&quot;language-text&quot;&gt;purrr::map_chr()&lt;/code&gt; to get a list of brewer IDs. As a note, I use &lt;code class=&quot;language-text&quot;&gt;map_chr&lt;/code&gt; because it flattens the list of IDs into a single character vector. I highly suggest that you check out the rest of the &lt;code class=&quot;language-text&quot;&gt;map_*&lt;/code&gt; functions.&lt;/p&gt;
&lt;p&gt;Below, I take each item in the list of urls, split each url at any &lt;code class=&quot;language-text&quot;&gt;&amp;quot;/&amp;quot;&lt;/code&gt; and then use &lt;code class=&quot;language-text&quot;&gt;map_chr()&lt;/code&gt; to select the fourth element, the brewer ID.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;brewery_ids &amp;lt;- brewery_ids %&amp;gt;%
  strsplit(&amp;quot;/&amp;quot;) %&amp;gt;%
  map_chr(4)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now I have a list of brewery IDs that I can feed into &lt;code class=&quot;language-text&quot;&gt;get_beers_from_brewer()&lt;/code&gt;.&lt;/p&gt;
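&lt;p&gt;Why the fourth element? Each url starts with a slash, so &lt;code class=&quot;language-text&quot;&gt;strsplit()&lt;/code&gt; returns an empty string as the first piece. A small sketch using the Yards url from above plus one hypothetical url (&lt;code class=&quot;language-text&quot;&gt;example_urls&lt;/code&gt; and the second brewery are made up for illustration):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(purrr)

example_urls &amp;lt;- c(
  &amp;quot;/brewers/yards-brewing-company/166/&amp;quot;,
  &amp;quot;/brewers/example-brewery/99999/&amp;quot;  # hypothetical brewery
)

# The leading slash produces an empty first piece:
# &amp;quot;&amp;quot;, &amp;quot;brewers&amp;quot;, &amp;quot;yards-brewing-company&amp;quot;, &amp;quot;166&amp;quot;
pieces &amp;lt;- strsplit(example_urls, &amp;quot;/&amp;quot;)

# map_chr(4) plucks the fourth piece of each url into a character vector
example_ids &amp;lt;- map_chr(pieces, 4)
print(example_ids)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that the IDs come back as character strings, which is fine here because they are pasted straight into the query.&lt;/p&gt;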
&lt;h2 id=&quot;getting-all-the-beers&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#getting-all-the-beers&quot; aria-label=&quot;getting all the beers permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Getting ALL THE BEERS!&lt;/h2&gt;
&lt;p&gt;To iterate through the list, we’ll use &lt;code class=&quot;language-text&quot;&gt;map()&lt;/code&gt; and some functions from &lt;code class=&quot;language-text&quot;&gt;jsonlite&lt;/code&gt;, a package that can parse JSON. Then we’ll use &lt;code class=&quot;language-text&quot;&gt;map()&lt;/code&gt; to work through the levels of the response, from its highest level (“data”) down to the actual data frame (held in the named object, “items”).&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;brewery_beer_df &amp;lt;- brewery_ids %&amp;gt;%
  map(function(brewer_id){
    Sys.sleep(1) # the API restricts to 1 request per second.
    get_beers_from_brewer(brewer_id, API_key)
  }) %&amp;gt;%
  # httr::content() extracts the body of each response as a JSON string
  map(content, as = &amp;quot;text&amp;quot;, encoding = &amp;quot;UTF-8&amp;quot;) %&amp;gt;%
  # jsonlite::fromJSON() then turns that string into an r object which has
  # dfs in it from each brewery.
  map(fromJSON, flatten = TRUE) %&amp;gt;%
  # working down through the response levels.
  map(&amp;quot;data&amp;quot;) %&amp;gt;% map(&amp;quot;beersByBrewer&amp;quot;) %&amp;gt;% map_dfr(&amp;quot;items&amp;quot;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
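&lt;p&gt;Before running this against the live API, the parsing half of the pipeline can be tried on mock data. This is a sketch under assumptions: &lt;code class=&quot;language-text&quot;&gt;mock_response()&lt;/code&gt; is a hypothetical helper that builds a JSON string shaped like the response described above, and the beer names and ABVs are made up.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(jsonlite)
library(purrr)

# Hypothetical helper: a JSON string with the data / beersByBrewer / items nesting
mock_response &amp;lt;- function(beer_names, abvs) {
  toJSON(
    list(data = list(beersByBrewer = list(
      totalCount = length(beer_names),
      items = data.frame(name = beer_names, abv = abvs)
    ))),
    auto_unbox = TRUE
  )
}

# One mock response per brewery, with made-up beers
mock_json &amp;lt;- list(
  mock_response(c(&amp;quot;Beer A&amp;quot;, &amp;quot;Beer B&amp;quot;), c(5.1, 6.5)),
  mock_response(&amp;quot;Beer C&amp;quot;, 4.2)
)

mock_beer_df &amp;lt;- mock_json %&amp;gt;%
  map(fromJSON, flatten = TRUE) %&amp;gt;%
  # pluck one level at a time, then row-bind the per-brewery data frames
  map(&amp;quot;data&amp;quot;) %&amp;gt;% map(&amp;quot;beersByBrewer&amp;quot;) %&amp;gt;% map_dfr(&amp;quot;items&amp;quot;)

print(mock_beer_df)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that &lt;code class=&quot;language-text&quot;&gt;map_dfr()&lt;/code&gt; needs the dplyr package installed, since it binds the rows with dplyr under the hood.&lt;/p&gt;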
&lt;p&gt;The object &lt;code class=&quot;language-text&quot;&gt;brewery_beer_df&lt;/code&gt; is the data frame with all the beers from breweries we requested.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Accessing the RateBeer API with R Using httr]]></title><description><![CDATA[A few months ago, I was talking with a friend of mine about the idea for this blog and how I wanted to use data science to explore beer. He suggested that I use the blog as well as beer to learn something new about where I live. So I ask, what can beer teach me about Philadelphia?]]></description><link>https://chris-s-friedman.com//posts/collecting-beer-data-using-ratebeer</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/collecting-beer-data-using-ratebeer</guid><pubDate>Tue, 05 Dec 2017 00:00:00 GMT</pubDate><content:encoded>&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(knitr)
library(kableExtra)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A few months ago, I was talking with a friend of mine about the idea for this blog and how I wanted to use data science to explore beer. He suggested that I use the blog as well as beer to learn something new about where I live. So I ask, what can beer teach me about Philadelphia?&lt;/p&gt;
&lt;h2 id=&quot;the-first-thing-i-need-data&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#the-first-thing-i-need-data&quot; aria-label=&quot;the first thing i need data permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;The first thing I need? Data!&lt;/h2&gt;
&lt;p&gt;Oddly enough, it’s actually pretty challenging to get access to high quality, current beer data.&lt;/p&gt;
&lt;p&gt;I chose to use &lt;a href=&quot;https://www.ratebeer.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;RateBeer’s&lt;/a&gt; data, mostly because they have an easily accessible &lt;a href=&quot;https://www.ratebeer.com/api.asp&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;API&lt;/a&gt; and meet my needs better than anyone else. They also &lt;a href=&quot;https://www.ratebeer.com/ratingsqa.asp&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;disclose&lt;/a&gt; how they come to their average beer rating, allowing me to see what’s under the hood. In the footnotes, I briefly explain some alternatives.&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=&quot;collecting-data&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#collecting-data&quot; aria-label=&quot;collecting data permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Collecting Data&lt;/h2&gt;
&lt;p&gt;I want to look at breweries in the area. Sadly, the RateBeer API doesn’t have a feature to search for breweries in the area. There is, however, a way to query what beers a brewery makes.
To get a list of beers a brewery makes, though, I need to know that brewery’s unique ID. That’s easy enough to find: the URL for a brewery on RateBeer is of the form:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;https://www.ratebeer.com/brewers/&amp;lt;BREWERY_NAME_HERE&amp;gt;/&amp;lt;BREWER_ID_HERE&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So, as an example, &lt;code class=&quot;language-text&quot;&gt;166&lt;/code&gt; is the ID for &lt;a href=&quot;https://www.ratebeer.com/brewers/yards-brewing-company/166/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;Yards Brewing Company&lt;/a&gt;. The url is:&lt;/p&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;https://www.ratebeer.com/brewers/yards-brewing-company/166/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;RateBeer’s API uses &lt;a href=&quot;http://graphql.org/learn/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;GraphQL&lt;/a&gt;. It’s beyond the scope of this post to dive into GraphQL, so instead, I’ll explain how it’s used in the queries that I make.&lt;/p&gt;
&lt;p&gt;GraphQL queries are written in a syntax that looks a lot like JSON, and the responses come back as JSON. Basically, I specify that I’m making a query. Nested in that call is the type of query I want to make (as well as arguments to that query), and nested within that are the fields I want returned.&lt;/p&gt;
&lt;p&gt;So, a query for the name of the beer with the ID number &lt;code class=&quot;language-text&quot;&gt;4934&lt;/code&gt; looks like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;query {
  beer(id: 4934) {
    name
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Similarly, my query for the beers made by Yards will look like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;query{
  beersByBrewer(brewerId: 166) { # 166 is Yards
    totalCount # This gives the total number of beers in the beer list
    items{ # For each beer, I want these items...
      name # the name of the beer
      abv # the beer&amp;#39;s ABV
      averageRating # the average rating of the beer
      ratingCount # the number of ratings the beer has
      isRetired # is the beer retired?
      style{
        # Style needs to be jumped into one level because when I query style, I
        # can also ask for a description of the style, and can even jump into
        # recommended glassware. That said, all I want is the style name.
        name
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that I know what format to make my request in, it’s time to actually get my first bit of data off the API!&lt;/p&gt;
&lt;h3 id=&quot;hello-httr&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#hello-httr&quot; aria-label=&quot;hello httr permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Hello, httr&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/r-lib/httr&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;httr&lt;/code&gt;&lt;/a&gt; is a package developed by Hadley Wickham of RStudio to make it easy to make HTTP requests. I won’t go into the package or the nuances of HTTP here, but some good resources for httr include the &lt;a href=&quot;https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;quickstart vignette&lt;/a&gt;, and Bradley Boehmke’s &lt;a href=&quot;http://bradleyboehmke.github.io/2016/01/scraping-via-apis.html#httr_api&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;post&lt;/a&gt; on using httr.
The httr syntax is quite simple. The main functions are named after HTTP verbs (httr is a wrapper around &lt;a href=&quot;https://github.com/jeroen/curl&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;curl&lt;/a&gt;), and each function takes the URL first, followed by arguments that modify the URL or specify what to send with the request.&lt;/p&gt;
&lt;p&gt;RateBeer’s API is at &lt;code class=&quot;language-text&quot;&gt;https://api.ratebeer.com/v1/api/graphql&lt;/code&gt;. The request headers must include the content type and the accepted response type, along with an API key. The query itself is sent in the body of the request. So, my call to the RateBeer API to get the beers made by Yards looks like this:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(httr)

API_key &amp;lt;- Sys.getenv(&amp;quot;rateBeer_API_key&amp;quot;)

URL &amp;lt;- &amp;quot;https://api.ratebeer.com/v1/api/graphql&amp;quot;

beers_by_yards &amp;lt;- POST(URL,
         body = list(
           query =
&amp;quot;query{
  beersByBrewer(brewerId: 166) {
    totalCount
    items{
      name
      abv
      averageRating
      ratingCount
      isRetired
      style{
        name
      }
    }
  }
}&amp;quot;,
           variables = &amp;quot;{}&amp;quot;,
           operationName = NULL),
         encode = &amp;quot;json&amp;quot;, # tells httr to encode the body of the request as json
         add_headers(&amp;quot;content-type&amp;quot; = &amp;quot;application/json&amp;quot;,
                     &amp;quot;Accept&amp;quot; = &amp;quot;application/json&amp;quot;,
                     &amp;quot;x-api-key&amp;quot; = API_key))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
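&lt;p&gt;Before parsing the response, it can be worth confirming that the request actually succeeded. Here is a minimal sketch of one way to do that. &lt;code class=&quot;language-text&quot;&gt;check_response()&lt;/code&gt; is a hypothetical helper, not part of httr, while &lt;code class=&quot;language-text&quot;&gt;stop_for_status()&lt;/code&gt;, &lt;code class=&quot;language-text&quot;&gt;status_code()&lt;/code&gt;, and &lt;code class=&quot;language-text&quot;&gt;http_type()&lt;/code&gt; are httr functions.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(httr)

# Hypothetical helper (not part of httr): stop on an HTTP error, otherwise
# report the status code and content type of the response.
check_response &amp;lt;- function(resp) {
  stop_for_status(resp) # raises an R error for 4xx and 5xx responses
  message(&amp;quot;Status: &amp;quot;, status_code(resp), &amp;quot;, type: &amp;quot;, http_type(resp))
  invisible(resp)       # hand the response back unchanged
}

# Usage, assuming the response object created above:
# check_response(beers_by_yards)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run right after the &lt;code class=&quot;language-text&quot;&gt;POST()&lt;/code&gt; call, a check like this would stop the script early if the server answered with an HTTP error, rather than letting it fail later during parsing.&lt;/p&gt;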
&lt;h3 id=&quot;parsing-the-response&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#parsing-the-response&quot; aria-label=&quot;parsing the response permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Parsing the Response&lt;/h3&gt;
&lt;p&gt;&lt;code class=&quot;language-text&quot;&gt;content()&lt;/code&gt; is httr’s function for extracting the content of a response. Using the &lt;code class=&quot;language-text&quot;&gt;as =&lt;/code&gt; argument, we can have the function return the body of the response as a single string of JSON text. Then we can use the &lt;code class=&quot;language-text&quot;&gt;jsonlite&lt;/code&gt; package to make the data easier to work with.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(jsonlite)

# as = &amp;quot;text&amp;quot; returns the response body as one character string
json &amp;lt;- content(beers_by_yards, as = &amp;quot;text&amp;quot;, encoding = &amp;quot;UTF-8&amp;quot;)

# flatten = TRUE turns nested objects into regular data frame columns
parsed_json &amp;lt;- fromJSON(json, flatten = TRUE)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
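&lt;p&gt;To see what &lt;code class=&quot;language-text&quot;&gt;fromJSON()&lt;/code&gt; does with nested JSON without calling the API, here is a small sketch. The JSON string is made up to mimic the shape of the query, and the two beers in it are placeholders, not real data from RateBeer.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(jsonlite)

# Made-up JSON shaped like the query (placeholder beers, not real data)
sample_json &amp;lt;- '{&amp;quot;data&amp;quot;: {&amp;quot;beersByBrewer&amp;quot;: {&amp;quot;totalCount&amp;quot;: 2, &amp;quot;items&amp;quot;: [
  {&amp;quot;name&amp;quot;: &amp;quot;Beer A&amp;quot;, &amp;quot;abv&amp;quot;: 5.0, &amp;quot;style&amp;quot;: {&amp;quot;name&amp;quot;: &amp;quot;Pale Ale&amp;quot;}},
  {&amp;quot;name&amp;quot;: &amp;quot;Beer B&amp;quot;, &amp;quot;abv&amp;quot;: 6.5, &amp;quot;style&amp;quot;: {&amp;quot;name&amp;quot;: &amp;quot;Porter&amp;quot;}}
]}}}'

parsed_sample &amp;lt;- fromJSON(sample_json, flatten = TRUE)

# Scalars sit at the level where they were requested
parsed_sample$data$beersByBrewer$totalCount

# The items array becomes a data frame; flatten = TRUE turns the nested
# style object into a regular column called style.name
names(parsed_sample$data$beersByBrewer$items)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;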
&lt;p&gt;Recall that the response comes back as JSON. This means that all of our items are nested in the same way as we requested them. So, to get the number of beers that Yards makes, as well as a data frame with those beers, we work through those levels.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;library(knitr)      # provides kable()
library(kableExtra) # provides kable_styling() and the %&amp;gt;% pipe

beer_count &amp;lt;- parsed_json$data$beersByBrewer$totalCount

beer_df &amp;lt;- parsed_json$data$beersByBrewer$items

beer_count
kable(beer_df, &amp;quot;html&amp;quot;) %&amp;gt;%
  kable_styling(bootstrap_options = c(&amp;quot;striped&amp;quot;, &amp;quot;hover&amp;quot;))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;So, where are all 111 beers? The API gives us 10 beers at a time. When we make the request to the API, we can tell it where to start that list of 10 beers, alongside our request to look at beers from a certain brewery.&lt;/p&gt;
&lt;p&gt;In my next post, I’ll show how we can get all 111 of those beers and beers from other breweries, programmatically.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://www.beeradvocate.com/community/threads/terms-of-service.101118/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;Beer Advocate&lt;/a&gt;&lt;/strong&gt; expressly forbids scraping and does not have an official API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://untappd.com/terms/api&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;Untappd&lt;/a&gt;&lt;/strong&gt; has an API but they don’t give out API keys to people who are just interested in data. If I build an app, maybe my decision will change, but in the meantime, no using their API. It looks like they may not expressly &lt;a href=&quot;https://untappd.com/terms/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;forbid&lt;/a&gt; scraping or crawling on the site, but scraping has its own challenges. I may cover it in the future, but in the meantime, I want to just use an API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://www.brewerydb.com/developers/docs&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;BreweryDB&lt;/a&gt;&lt;/strong&gt; looks like an awesome idea - beer data for developers! Yet, the API doesn’t show ratings and you can only get ABV if you are a premium user. I can get all the information that I am looking for from other APIs, so no need to use &lt;em&gt;and&lt;/em&gt; pay for this one.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://openbeerdb.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;Open Beer Database&lt;/a&gt;&lt;/strong&gt; hasn’t updated the database since 2011. That’s a solid no.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;http://www.thebeerspot.com/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;The Beer Spot&lt;/a&gt;&lt;/strong&gt; looks like it could be a fun community, but considering that A) &lt;a href=&quot;http://www.thebeerspot.com/forum/index.php/topic,17296.msg728792.html#msg728792&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;they may not have very many users&lt;/a&gt; and B) no one has &lt;a href=&quot;http://www.thebeerspot.com/beer/info/yuengling-brewery/yuengling-traditional-lager&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;reviewed Yuengling&lt;/a&gt; (beer geek or not, a Philadelphia area staple), I’m not going to use their API.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[A mission statement]]></title><description><![CDATA[On the left-hand side of this page (or when you click the “About” button), I start with one sentence that gets at why I do many of the things I do. I love going on adventures and I love learning from data. I think that to adventure means to do much more than to go to an exotic locale. To me...]]></description><link>https://chris-s-friedman.com//posts/a-mission-statement</link><guid isPermaLink="false">https://chris-s-friedman.com//posts/a-mission-statement</guid><pubDate>Wed, 29 Nov 2017 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;On the left-hand side of this page (or when you click the “About” button), I start with one sentence that gets at why I do many of the things I do. I love going on adventures and I love learning from data.&lt;/p&gt;
&lt;p&gt;I think that to adventure means to do much more than to go to an exotic locale. To me, an adventure is to find myself somewhere I’ve never been. Sometimes, adventure takes me to the &lt;a href=&quot;https://www.instagram.com/p/BcGek6DFIEe/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;top of Yosemite Falls&lt;/a&gt;, to my local climbing gym, or even around the block.&lt;/p&gt;
&lt;p&gt;I’m certainly on an adventure, putting these words to screen.&lt;/p&gt;
&lt;p&gt;Which brings me to why I am here, rebuilding my website for the umpteenth time.&lt;/p&gt;
&lt;p&gt;I’m here to learn something new about the topics that interest me.&lt;/p&gt;
&lt;p&gt;I’m here to get data, analyze it, and communicate it so that others can learn alongside me.&lt;/p&gt;
&lt;p&gt;My mission statement?&lt;/p&gt;
&lt;p&gt;I’m here to adventure.&lt;/p&gt;</content:encoded></item></channel></rss>