{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/happy-little-accidents","result":{"data":{"markdownRemark":{"id":"0524bda2-6ffa-59d9-9f98-ca2d0be18577","html":"<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">knitr::opts_chunk$set(message = FALSE, warning = FALSE)</code></pre></div>\n<p>This morning, I just found out about <a href=\"https://github.com/rfordatascience/tidytuesday\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">#tidytuesday</a> and I figured it would be a fun thing to play with.</p>\n<p>For my first foray into tidytuesday, we have data on Bob Ross’s paintings during his show. The data were compiled by fivethirtyeight and reported <a href=\"https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here</a>.</p>\n<p>The data are available <a href=\"https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-08-06\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here</a>. On the info page for the data, they show how to load the data and give an example of some basic tidying. I’ll do that below:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(dplyr)\nlibrary(tidyr)\nlibrary(stringr)\n\nbob_ross &lt;-\n  readr::read_csv(&quot;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-06/bob-ross.csv&quot;)\n\n# to clean up the episode information\nbob_ross &lt;-\n  bob_ross %&gt;%\n  janitor::clean_names() %&gt;%\n  separate(episode, into = c(&quot;season&quot;, &quot;episode&quot;), sep = &quot;E&quot;) %&gt;%\n  mutate(season = str_extract(season, &quot;[:digit:]+&quot;)) %&gt;%\n  mutate_at(vars(season, episode), as.integer)\n\nhead(bob_ross)</code></pre></div>\n<p>There are a couple of paintings that are named the same thing.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">bob_ross &lt;-\n  bob_ross %&gt;%\n  group_by(title) %&gt;%\n  mutate(title_count = 1:group_size(.)) %&gt;%\n  ungroup() %&gt;%\n  mutate(title = if_else(title_count &gt; 1,\n                         paste(title, title_count),\n                         title)) %&gt;%\n  select(-title_count)</code></pre></div>\n<p>There are some columns that are relate to the frame the painting got put into and some columns that relate to elements inside each painting. Like the fivethirtyeight crew, I’m more interested in the elements inside the paintings as opposed to the frames, so i’ll go ahead and drop those columns</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">painting_data &lt;-\n  bob_ross %&gt;%\n  select(-contains(&quot;frame&quot;), -steve_ross, -guest, -diane_andre)</code></pre></div>\n<p>In my professional work, I perform social network analysis, so let’s go ahead and look at networks of elements in Bob Ross’s paintings!</p>\n<h1 id=\"networks-of-bob-ross-paintings\" style=\"position:relative;\"><a href=\"#networks-of-bob-ross-paintings\" aria-label=\"networks of bob ross paintings permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Networks of Bob Ross Paintings</h1>\n<p>To get us looking at social networks, we first need to take the data from this wide format and turn it into an edge list. The edge list will connect each painting to every element that is inside it. From there, we can get a picture of what the network of paintings looks like!</p>\n<h2 id=\"organizing-the-edge-list\" style=\"position:relative;\"><a href=\"#organizing-the-edge-list\" aria-label=\"organizing the edge list permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Organizing the Edge List</h2>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">library(igraph)\nlibrary(ggnetwork)\n\ntitles &lt;- painting_data[[&quot;title&quot;]]\nincidence_mat &lt;-\n  painting_data %&gt;%\n  select(-season, -episode, -title)\n\nincidence_mat &lt;- as.matrix(incidence_mat)\nrownames(incidence_mat) &lt;- titles\n\nincidence_graph &lt;-\n  graph_from_incidence_matrix(incidence_mat, )\n\nggplot(incidence_graph, aes(x = x, y = y, xend = xend, yend = yend)) +\n  geom_edges() +\n  # type is TRUE if a node is an episode and FALSE if it&#39;s an element\n  geom_nodes(aes(color = type)) +\n  theme_blank()</code></pre></div>\n<p>That’s real busy! It looks like there are a few episodes that only have a few elements and some episodes that have many elements in it. There’s also that one episode that shares three elements with another episode and no others.</p>\n<p>Let’s see if we can clean this up a bit! First, i’ll connect episodes by how many elements they share.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">episode_x_episode &lt;- incidence_mat %*% t(incidence_mat)\n\nep_x_ep_graph &lt;-\n  graph_from_adjacency_matrix(episode_x_episode,\n                              # only look at the upper part of the matrix since it is symetrical\n                              mode = &quot;upper&quot;,\n                              # the connections are weighted\n                              weighted = TRUE,\n                              # don&#39;t count self-loops\n                              diag = FALSE)\n\nggnetwork(ep_x_ep_graph, weight = &quot;weight&quot;) %&gt;%\n  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +\n  geom_edges() +\n  geom_nodes() +\n  theme_blank()</code></pre></div>\n<p>Not much to look at. Or better yet, Ross’s paintings tend to share something in common with other paintings.</p>\n<p>Instead of looking at all the features at once, why don’t we look at groups of features. I’ve gone ahead and grouped each feature into different categories. I’ll load that up and then split up the painting df into different categories.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">feature_categories &lt;-\n  readr::read_csv(&quot;../../data/Bob Ross/ross_painting_features.csv&quot;)\n\npainting_categories &lt;-\n  painting_data %&gt;%\n  gather(feature, value, -season, -episode, -title) %&gt;%\n  left_join(feature_categories, by = &quot;feature&quot;) %&gt;%\n  filter(value &gt; 0) %&gt;%\n  count(season, episode, title, category) %&gt;%\n  spread(category, n) %&gt;%\n  mutate_at(vars(-season, -episode, -title), ~if_else(is.na(.), 0, 1))\n\ntitles &lt;- painting_categories[[&quot;title&quot;]]\n\n\nincidence_mat &lt;-\n  painting_categories %&gt;%\n  select(-season, -episode, -title) %&gt;%\n  as.matrix()\nrownames(incidence_mat) &lt;- titles\n\nepisode_x_episode &lt;- incidence_mat %*% t(incidence_mat)\n\n\nep_x_ep_graph &lt;-\n  graph_from_adjacency_matrix(\n    episode_x_episode,\n    # only look at the upper part of the matrix since it is symetrical\n    mode = &quot;upper&quot;,\n    # the connections are weighted\n    weighted = TRUE,\n    # don&#39;t count self-loops\n    diag = FALSE)\n\n# add in season as an attribute of each episode\nvertex_attr(ep_x_ep_graph, &quot;season&quot;) &lt;- painting_categories[[&quot;season&quot;]]</code></pre></div>\n<p>Now to plot! I’m going to build these plots with a cutpoint though, because otherwise they become very unweildy.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ep_x_ep_graph %&gt;%\n  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &lt;= 2)) %&gt;%\n  delete_vertices(., which(igraph::degree(.) == 0)) %&gt;%  \n  ggnetwork(weight = &quot;weight&quot;) %&gt;%\n  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +\n  geom_edges(color = &quot;gray&quot;) +\n  geom_nodes(aes(color = season)) +\n  theme_blank() +\n  Friedman::scale_color_drexel(discrete = FALSE) +\n  labs(title = &quot;Paintings that share 3, 4, or 5 feature categories&quot;)</code></pre></div>\n<p>Above, I’ve colored nodes by season and only shown connections between episodes if those episodes share more than two classes of feature (e.g. two episodes have a sky feature, tree and plant feature, and a man-made feature).</p>\n<p>What happens if we filter to only show edges if two espodes share 4 features? 5?</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ep_x_ep_graph %&gt;%\n  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &lt;= 3)) %&gt;%\n  delete_vertices(., which(igraph::degree(.) == 0)) %&gt;%  \n  ggnetwork(weight = &quot;weight&quot;) %&gt;%\n  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +\n  geom_edges(color = &quot;gray&quot;) +\n  geom_nodes(aes(color = season)) +\n  theme_blank() +\n  Friedman::scale_color_drexel(discrete = FALSE)  +\n  labs(title = &quot;Paintings that share 4 or 5 feature categories&quot;)</code></pre></div>\n<p>Looking at the paintings that share more than 3 features really brings about that there is a central group of paintings that all share a lot in common and then a few different groups of paintings that all have different things in common.</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">ep_x_ep_graph %&gt;%\n  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &lt;= 4)) %&gt;%\n  delete_vertices(., which(igraph::degree(.) == 0)) %&gt;%  \n  ggnetwork(weight = &quot;weight&quot;) %&gt;%\n  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +\n  geom_edges(color = &quot;gray&quot;) +\n  geom_nodes(aes(color = season)) +\n  theme_blank() +\n  Friedman::scale_color_drexel(discrete = FALSE) +\n  labs(title = &quot;Paintings that share 5 feature categories&quot;)</code></pre></div>\n<p>Now looking just at paintings that share 5 feature categories, it can really be seen that there is a central set of themes that is very common accross seasons. What are they?</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">episodes_of_interest &lt;-\n  ep_x_ep_graph %&gt;%\n  delete_edges(which(edge_attr(ep_x_ep_graph)$weight &lt;= 4)) %&gt;%\n  delete_vertices(., which(igraph::degree(.) == 0)) %&gt;%\n  vertex_attr(&quot;name&quot;)\n\npainting_categories %&gt;%\n  filter(title %in% episodes_of_interest) %&gt;%\n  mutate(feature_sum = aquatic + clouds + `general nature` +\n           `man made` + nature + sky + `trees and plants`) %&gt;%\n  filter(feature_sum &gt; 4) %&gt;%\n  summarize_at(vars(-season, -episode, -title, -feature_sum), sum)</code></pre></div>\n<p>All (or near all) of these paintings have an aquatic element, a general nature element, and trees and plants. So the real defining factor between the three groups in the plot above probably has to do with the other four categories. Next time I play with these data, I’ll look at that and do a deeper dive on components of the network of Bob Ross paintings.</p>","fields":{"slug":"/posts/happy-little-accidents","tagSlugs":["/tag/tidytuesday/","/tag/r-4-ds/","/tag/five-thirty-eight/","/tag/dplyr/","/tag/tidyr/"]},"frontmatter":{"date":"2019-08-05","description":"Exploring Data on Bob Ross paintings for Tidy Tuesday.","tags":["tidytuesday","r4ds","five_thirty_eight","dplyr","tidyr"],"title":"Happy Little Accidents","socialImage":null}}},"pageContext":{"slug":"/posts/happy-little-accidents"}}}