{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/getting-ratebeer-data-programmatically","result":{"data":{"markdownRemark":{"id":"e5794674-a6d4-5fed-bac8-a81b5d64ee5c","html":"<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(httr)\nlibrary(jsonlite)\nlibrary(purrr)\nlibrary(rvest)\nknitr::opts_chunk$set(eval = FALSE)\nAPI_key &lt;- Sys.getenv(&quot;rateBeer_API_key&quot;)</code></pre></div>\n<p>In my <a href=\"../collecting-beer-data-using-ratebeer\">last post</a> I showed how, using the package <code class=\"language-text\">httr</code>, you can access the RateBeer API to get information about beers made by a brewery. When I left off, I had run into a problem: the API only returns 10 beers at a time.</p>\n<p>Today, I’m going to show how we can get more beers at once. After that, I’m going to show how we can use the API, <code class=\"language-text\">rvest</code>, and <code class=\"language-text\">purrr</code> to get beers from all the brewers around me.</p>\n<h2 id=\"updating-the-call-to-the-api\" style=\"position:relative;\"><a href=\"#updating-the-call-to-the-api\" aria-label=\"updating the call to the api permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Updating the call to the API</h2>\n<h3 id=\"setting-the-first-argument\" style=\"position:relative;\"><a href=\"#setting-the-first-argument\" aria-label=\"setting the first argument permalink\" class=\"anchor before\"><svg 
aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Setting the <code class=\"language-text\">first:</code> argument</h3>\n<p>In the last post, I didn’t mention one of the arguments that can be used when making a <code class=\"language-text\">beersByBrewer</code> query. Besides the argument for the brewer ID, we can also use the <code class=\"language-text\">first</code> argument to specify how many beers we want to see from a brewer. As you will see below, one of the changes I make to the call to the API is setting the value of the argument <code class=\"language-text\">first</code> to <code class=\"language-text\">999</code>.</p>\n<p>Ninety-nine beers on the wall? 
Let’s make it Nine hundred and ninety-nine.</p>\n<h3 id=\"turning-the-call-to-the-api-into-a-function\" style=\"position:relative;\"><a href=\"#turning-the-call-to-the-api-into-a-function\" aria-label=\"turning the call to the api into a function permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Turning the call to the API into a function.</h3>\n<p>As you will see later on, it will be handy to have the call to the API as a function. Below, I declare that function:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">get_beers_from_brewer &lt;- function(brewer_id, api_key) {\n  URL &lt;- &quot;https://api.ratebeer.com/v1/api/graphql&quot;\n  POST(URL,\n       body = list(\n         query = paste0(\n&quot;query{\n  beersByBrewer(brewerId: &quot;, brewer_id, &quot;, first: 999) {\n    totalCount\n      items{\n        name\n        abv\n        averageRating\n        ratingCount\n        isRetired\n        style{\n          name\n        }\n        brewer {\n          id\n          name\n          streetAddress\n          city\n          state {\n            name\n          }\n          zip\n        }\n      }\n     }\n}&quot;),\n       variables = &quot;{}&quot;,\n       operationName = NULL),\n       encode = &quot;json&quot;, # tells httr to encode the body of the request as json\n       add_headers(&quot;content-type&quot; = &quot;application/json&quot;,\n                   &quot;Accept&quot; = &quot;application/json&quot;,\n  
                 &quot;x-api-key&quot; = api_key))\n}</code></pre></div>\n<h2 id=\"finding-breweries\" style=\"position:relative;\"><a href=\"#finding-breweries\" aria-label=\"finding breweries permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Finding Breweries</h2>\n<p>Okay, so now, I have an easy way to get information about the beers that breweries make. All I need to do now is point the API to the brewer ID of each brewery I want information on.</p>\n<p>Remember, the brewer id can be found by looking at the url for that brewery and is in the form:</p>\n<p><code class=\"language-text\">https://www.ratebeer.com/brewers/&lt;BREWERY_NAME_HERE&gt;/&lt;BREWER_ID_HERE&gt;</code></p>\n<p>The thing is, I like data and want A LOT of it. There’s no way I’m going to hand sort through urls to try to find breweries. Can’t this be automated?</p>\n<p>YES!</p>\n<p>RateBeer maintains lists of breweries by state. For example, the breweries in Pennsylvania can be found <a href=\"https://www.ratebeer.com/breweries/pennsylvania/38/213/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here</a>. 
Using the package <a href=\"https://github.com/hadley/rvest\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><code class=\"language-text\">rvest</code></a>, we can pull information about all of the breweries in the state, and then use <a href=\"https://github.com/tidyverse/purrr\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><code class=\"language-text\">purrr</code></a> to iterate over that list, calling the function <code class=\"language-text\">get_beers_from_brewer()</code> on each brewery.</p>\n<p><code class=\"language-text\">rvest</code> is a package to make harvesting information from the web easy. Below, you can see how, in three steps, I get a list of urls that point to all of the breweries in the state.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(rvest)\n\nbrewery_list_url &lt;- &quot;https://www.ratebeer.com/breweries/pennsylvania/38/213/&quot;\n\nbrewery_ids &lt;-  read_html(brewery_list_url) %&gt;%\n  html_nodes(&quot;#brewerTable a:nth-child(1)&quot;) %&gt;%\n  html_attr(&#39;href&#39;)</code></pre></div>\n<p>To explain what happened in the previous code chunk:</p>\n<ol>\n<li><code class=\"language-text\">read_html()</code> loads the url for the list of breweries in PA. This is the same thing that happens if you click this <a href=\"https://www.ratebeer.com/breweries/pennsylvania/38/213/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">link</a>.</li>\n<li><code class=\"language-text\">html_nodes()</code> searches for every place in the html file we navigated to that has a link to a brewery. The text in the argument for the function is the css selector for where breweries can be found on the page. I found this selector using <a href=\"http://selectorgadget.com/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">selectorgadget</a>. 
This function gives me a list of every element that matches that selector.</li>\n<li><code class=\"language-text\">html_attr()</code> pulls the value of the specified attribute from each element in that list. In this case I specified <code class=\"language-text\">href</code>, the target of a hyperlink.</li>\n</ol>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">yards_url &lt;- grep(&quot;yards&quot;, brewery_ids, value = TRUE)</code></pre></div>\n<p>As I said, this outputs a list of urls. As an example, the url for Yards Brewing looks like this:</p>\n<blockquote>\n<p><code class=\"language-text\">/brewers/yards-brewing-company/166/</code></p>\n</blockquote>\n<p>Now, I can take this list of urls, and use the function <code class=\"language-text\">purrr::map_chr()</code> to get a list of brewer IDs. As a note, I use <code class=\"language-text\">map_chr</code> because it flattens the list of IDs into a single character vector. I highly suggest that you check out the rest of the <code class=\"language-text\">map_*</code> functions.</p>\n<p>Below, I take each item in the list of urls, split each url at every <code class=\"language-text\">&quot;/&quot;</code> and then use <code class=\"language-text\">map_chr()</code> to select the fourth element, the brewer ID. It is the fourth element rather than the third because the leading slash produces an empty first element.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">brewery_ids &lt;- brewery_ids %&gt;%\n  strsplit(&quot;/&quot;) %&gt;%\n  map_chr(4)</code></pre></div>\n<p>Now I have a list of brewery IDs that I can feed into <code class=\"language-text\">get_beers_from_brewer()</code>.</p>\n<h2 id=\"getting-all-the-beers\" style=\"position:relative;\"><a href=\"#getting-all-the-beers\" aria-label=\"getting all the beers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 
2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Getting ALL THE BEERS!</h2>\n<p>To iterate through the list, we’ll use <code class=\"language-text\">map()</code> together with <code class=\"language-text\">content()</code> from <code class=\"language-text\">httr</code> and <code class=\"language-text\">fromJSON()</code> from <code class=\"language-text\">jsonlite</code>, a package that can parse JSON. Then we’ll use <code class=\"language-text\">map()</code> to work through the levels of the response, from its highest level (“data”) down to the actual data frame (held in the named object, “items”).</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">brewery_beer_df &lt;- brewery_ids %&gt;%\n  map(function(brewer_id){\n    Sys.sleep(1) # the API restricts to 1 request per second.\n    get_beers_from_brewer(brewer_id, API_key)\n  }) %&gt;%\n  # Here we use httr::content() to pull the body of each response out as\n  # JSON text\n  map(content, type = &quot;text&quot;) %&gt;%\n  # and then use jsonlite::fromJSON() to turn that into an r object which\n  # has dfs in it from each brewery.\n  map(fromJSON, flatten = TRUE) %&gt;%\n  # working down through the response levels.\n  map(&quot;data&quot;) %&gt;% map(&quot;beersByBrewer&quot;) %&gt;% map_dfr(&quot;items&quot;)</code></pre></div>\n<p>The object <code class=\"language-text\">brewery_beer_df</code> is the data frame with all the beers from the breweries we requested.</p>","fields":{"slug":"/posts/getting-ratebeer-data-programmatically","tagSlugs":["/tag/api/","/tag/beer/","/tag/data-collection/","/tag/httr/","/tag/jsonlite/","/tag/r/","/tag/ratebeer/","/tag/scraping/","/tag/rvest/"]},"frontmatter":{"date":"2017-12-07","description":"I show how we can get more beers at once. 
After that, I'm going to show how we can use the API, `rvest`, and `purrr` to get beers from all the brewers around me.","tags":["API","beer","data_collection","httr","jsonlite","r","ratebeer","scraping","rvest"],"title":"Getting RateBeer Data…Programmatically","socialImage":null}}},"pageContext":{"slug":"/posts/getting-ratebeer-data-programmatically"}}}