{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/collecting-beer-data-using-ratebeer","result":{"data":{"markdownRemark":{"id":"f37e7c7d-3bf2-5b7a-b066-3ffc24694d0b","html":"<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(knitr)\nlibrary(kableExtra)</code></pre></div>\n<p>A few months ago, I was talking with a friend of mine about the idea for this blog and how I wanted to use data science to explore beer. He suggested that I use the blog as well as beer to learn something new about where I live. So I ask, what can beer teach me about Philadelphia?</p>\n<h2 id=\"the-first-thing-i-need-data\" style=\"position:relative;\"><a href=\"#the-first-thing-i-need-data\" aria-label=\"the first thing i need data permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>The first thing I need? Data!</h2>\n<p>Oddly enough, it’s actually pretty challenging to get access to high-quality, current beer data.</p>\n<p>I chose to use <a href=\"https://www.ratebeer.com/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">RateBeer’s</a> data, mostly because they have an easily accessible <a href=\"https://www.ratebeer.com/api.asp\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">API</a> and meet my needs better than anyone else. 
They also <a href=\"https://www.ratebeer.com/ratingsqa.asp\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">disclose</a> how they arrive at their average beer rating, allowing me to see what’s under the hood. In the footnotes, I briefly explain some alternatives.<sup id=\"fnref-1\"><a href=\"#fn-1\" class=\"footnote-ref\">1</a></sup></p>\n<h2 id=\"collecting-data\" style=\"position:relative;\"><a href=\"#collecting-data\" aria-label=\"collecting data permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Collecting Data</h2>\n<p>I want to look at breweries in the area. Sadly, the RateBeer API doesn’t have a feature to search for breweries by location. There is, however, a way to query which beers a brewery makes.\nTo get a list of the beers a brewery makes, though, I need to know that brewery’s unique ID. That is easy enough to find. The URL for a brewery on RateBeer is of the form:</p>\n<p><code class=\"language-text\">https://www.ratebeer.com/brewers/&lt;BREWERY_NAME_HERE&gt;/&lt;BREWER_ID_HERE&gt;</code></p>\n<p>So, as an example, <code class=\"language-text\">166</code> is the ID for <a href=\"https://www.ratebeer.com/brewers/yards-brewing-company/166/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Yards Brewing Company</a>. 
The URL is:</p>\n<p><code class=\"language-text\">https://www.ratebeer.com/brewers/yards-brewing-company/166/</code></p>\n<p>RateBeer’s API uses <a href=\"http://graphql.org/learn/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">GraphQL</a>. It’s beyond the scope of this post to dive into GraphQL, so instead, I’ll explain only how it applies to the queries that I make.</p>\n<p>GraphQL queries are written in a syntax that resembles JSON. Basically, I specify that I am making a query. Nested in that call is the type of query I want to make (as well as arguments to that query), and nested within that are the fields I want in the response.</p>\n<p>So, a query for the name of the beer with the ID number <code class=\"language-text\">4934</code> looks like this:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">query {\n  beer(id: 4934) {\n    name\n  }\n}</code></pre></div>\n<p>Similarly, my query for the beers made by Yards will look like this:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">query{\n  beersByBrewer(brewerId: 166) { # 166 is Yards\n    totalCount # This gives the total number of beers in the beer list\n    items{ # For each beer, I want these items...\n      name # the name of the beer\n      abv # the beer&#39;s ABV\n      averageRating # the average rating of the beer\n      ratingCount # the number of ratings the beer has\n      isRetired # is the beer retired?\n      style{\n        # Style needs to be jumped into one level because when I query style, I\n        # can also ask for a description of the style, and can even jump into\n        # recommended glassware. 
That said, all I want is the style name.\n        name\n      }\n    }\n  }\n}</code></pre></div>\n<p>Now that I know how to format my request, it’s time to get my first bit of data from the API!</p>\n<h3 id=\"hello-httr\" style=\"position:relative;\"><a href=\"#hello-httr\" aria-label=\"hello httr permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Hello, httr</h3>\n<p><a href=\"https://github.com/r-lib/httr\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><code class=\"language-text\">httr</code></a> is a package developed by Hadley Wickham of RStudio to make it easy to make HTTP requests. I won’t go into the package or the nuances of HTTP here, but some good resources for httr include the <a href=\"https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">quickstart vignette</a> and Bradley Boehmke’s <a href=\"http://bradleyboehmke.github.io/2016/01/scraping-via-apis.html#httr_api\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">post</a> on using httr.\nThe httr syntax is quite simple. The main functions are named after HTTP verbs (httr is a wrapper for <a href=\"https://github.com/jeroen/curl\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">curl</a> functions), and each call starts with the URL, followed by arguments that modify the URL or specify what to send with the request.</p>\n<p>RateBeer’s API is at <code class=\"language-text\">https://api.ratebeer.com/v1/api/graphql</code>. 
In the header of the request, the content type and the accepted response type are required along with an API key. The query itself is sent in the body of the request. So, my call to the RateBeer API to get the beers made by Yards looks like this:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(httr)\n\nAPI_key &lt;- Sys.getenv(&quot;rateBeer_API_key&quot;)\n\nURL &lt;- &quot;https://api.ratebeer.com/v1/api/graphql&quot;\n\nbeers_by_yards &lt;- POST(URL,\n         body = list(\n           query =\n&quot;query{\n  beersByBrewer(brewerId: 166) {\n    totalCount\n    items{\n      name\n      abv\n      averageRating\n      ratingCount\n      isRetired\n      style{\n        name\n      }\n    }\n  }\n}&quot;,\n           variables = &quot;{}&quot;,\n           operationName = NULL),\n         encode = &quot;json&quot;, # tells httr to encode the body of the request as json\n         add_headers(&quot;content-type&quot; = &quot;application/json&quot;,\n                     &quot;Accept&quot; = &quot;application/json&quot;,\n                     &quot;x-api-key&quot; = API_key))</code></pre></div>\n<h3 id=\"parsing-the-response\" style=\"position:relative;\"><a href=\"#parsing-the-response\" aria-label=\"parsing the response permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Parsing the Response</h3>\n<p><code class=\"language-text\">content()</code> is httr’s function for extracting content from a response. 
Using the <code class=\"language-text\">as =</code> argument, we can have the function give us the body of the response as text, which here is valid JSON. Then we can use the <code class=\"language-text\">jsonlite</code> package to make the data easier to work with.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(jsonlite)\n\njson &lt;- content(beers_by_yards, as = &quot;text&quot;, encoding = &quot;UTF-8&quot;)\n\nparsed_json &lt;- fromJSON(json, flatten = TRUE)</code></pre></div>\n<p>Recall that the response was requested as JSON. This means that all of our items are nested in the same way as we requested them. So, to get the number of beers that Yards makes, as well as a data frame with those beers, we work through those levels.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">beer_count &lt;- parsed_json$data$beersByBrewer$totalCount\n\nbeer_df &lt;- parsed_json$data$beersByBrewer$items\n\nbeer_count\nkable(beer_df, &quot;html&quot;) %&gt;%\n  kable_styling(bootstrap_options = c(&quot;striped&quot;, &quot;hover&quot;))</code></pre></div>\n<p>So, where are all 111 beers? The API gives us 10 beers at a time. When we make the request to the API, we can tell it where to start that list of 10 beers, alongside our request to look at beers from a certain brewery.</p>\n<p>In my next post, I’ll show how we can get all 111 of those beers, and beers from other breweries, programmatically.</p>\n<ul>\n<li><strong><a href=\"https://www.beeradvocate.com/community/threads/terms-of-service.101118/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Beer Advocate</a></strong> expressly forbids scraping and does not have an official API.</li>\n<li><strong><a href=\"https://untappd.com/terms/api\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Untappd</a></strong> has an API, but they don’t give out API keys to people who are just interested in data. 
If I build an app, maybe my decision will change, but in the meantime, I can’t use their API. It looks like they may not expressly <a href=\"https://untappd.com/terms/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">forbid</a> scraping or crawling on the site, but scraping has its own challenges. I may cover it in the future, but in the meantime, I want to just use an API.</li>\n<li><strong><a href=\"http://www.brewerydb.com/developers/docs\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">BreweryDB</a></strong> looks like an awesome idea - beer data for developers! Yet, the API doesn’t show ratings and you can only get ABV if you are a premium user. I can get all the information that I am looking for from other APIs, so no need to use <em>and</em> pay for this one.</li>\n<li><strong><a href=\"https://openbeerdb.com/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Open Beer Database</a></strong> hasn’t been updated since 2011. That’s a solid no.</li>\n<li><strong><a href=\"http://www.thebeerspot.com/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">The Beer Spot</a></strong> looks like it could be a fun community, but considering that A) <a href=\"http://www.thebeerspot.com/forum/index.php/topic,17296.msg728792.html#msg728792\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">they may not have very many users</a> and B) no one has <a href=\"http://www.thebeerspot.com/beer/info/yuengling-brewery/yuengling-traditional-lager\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">reviewed Yuengling</a> (beer geek or not, it’s a Philadelphia-area staple), I’m not going to use their API.</li>\n</ul>\n<div class=\"footnotes\">\n<hr>\n<ol>\n<li id=\"fn-1\">\n<a href=\"#fnref-1\" 
class=\"footnote-backref\">↩</a>\n</li>\n</ol>\n</div>","fields":{"slug":"/posts/collecting-beer-data-using-ratebeer","tagSlugs":["/tag/beer/","/tag/data-collection/","/tag/scraping/","/tag/httr/","/tag/r/","/tag/jsonlite/","/tag/ratebeer/","/tag/api/"]},"frontmatter":{"date":"2017-12-05","description":"A few months ago, I was talking with a friend of mine about the idea for this blog and how I wanted to use data science to explore beer. He suggested that I use the blog as well as beer to learn something new about where I live. So I ask, what can beer teach me about Philadelphia?","tags":["beer","data_collection","scraping","httr","r","jsonlite","ratebeer","API"],"title":"Accessing the RateBeer API with R Using httr","socialImage":null}}},"pageContext":{"slug":"/posts/collecting-beer-data-using-ratebeer"}}}