Clay’s Midterm

This is a graph of artwork area by medium. I could not embed it without spending money. The Y axis is in square millimeters (mm²).

The Tate, one of the world’s foremost art institutions, holds around 70,000 artworks, including those jointly owned with the National Galleries of Scotland as part of ARTIST ROOMS. It’s no surprise that they have a wealth of data related to these artworks. However, making sense of this data is not always an easy task, which is where technology and data science come in.

As an artist myself, I was curious to explore the relationship between an artwork’s medium and its size. I wanted to use data to answer the question of whether an artist’s chosen medium influenced the size of their work. To do this, I used a combination of R, OpenRefine, and Excel to clean the data and create a proxy that could be used to explore the relationship between medium and area.

Initially, I ran into the problem that the majority of the artworks did not include depth data. Therefore, I chose to focus solely on the area (height times width) of each artwork, ignoring depth altogether. I used Excel to delete any rows that were missing the data needed for the area (height or width) or the medium, so that every remaining artwork had both values to compare.
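The same row filter can be sketched in a few lines of Python instead of Excel. This is only an illustration, not what I actually ran: the sample rows are made up, and the column names (`medium`, `width`, `height`) are assumptions about the layout of artwork_data.csv.

```python
import csv
import io

# Made-up sample rows; the column names are assumed, not checked
# against the real artwork_data.csv.
SAMPLE = """id,medium,width,height
1,"Oil paint on canvas",500,400
2,,300,200
3,"Graphite on paper",,150
4,"Bronze",120,310
"""

def drop_incomplete(rows, required=("medium", "width", "height")):
    """Keep only rows where every required field is non-empty."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() for col in required)
    ]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean_rows = drop_incomplete(rows)

# Rows 2 and 3 are dropped because each is missing one required value.
print([row["id"] for row in clean_rows])
```

Filtering rows rather than columns keeps every field for the artworks that remain, which matters because the area is calculated from two of those fields.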

To make the medium data more manageable, I used Excel to extract only the first word from the medium column. That first word worked better as a proxy than the original, more descriptive medium labels, because it collapsed many near-unique descriptions into a few broad categories. I also used OpenRefine to group together similar mediums, dropping the comma at the end of words in the process. The result was a smaller set of categories that was easier to analyze.
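As a rough sketch of the first-word proxy, here is how the same idea could look in Python. The example labels are invented, and normalizing the capitalization is my own stand-in for the grouping I did by hand in OpenRefine, not a copy of its clustering.

```python
from collections import Counter

def first_word(medium):
    """Return the first word of a medium label without trailing punctuation."""
    words = medium.strip().split()
    if not words:
        return ""
    # Drop a trailing comma (or similar) and normalize case so that
    # "oil" and "Oil," end up in the same group.
    return words[0].rstrip(",.;:").capitalize()

# Invented example labels, not taken from the Tate data.
labels = [
    "Oil paint on canvas",
    "Graphite, watercolour on paper",
    "oil on board",
    "Bronze",
]

proxies = [first_word(label) for label in labels]
counts = Counter(proxies)
print(counts.most_common())
```

The trade-off is visible even in this tiny example: "Oil paint on canvas" and "oil on board" land in the same group, which makes counting easier but hides the difference in support material.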

I then used Excel to create an area column by multiplying the height and width columns. With that column in place, I made a scatter plot in RawGraphs.io showing the most common mediums and the areas of their artworks. Interestingly, I found that there was less of a relationship between an artwork’s medium and its size than I had initially anticipated. This could be because I did not spend as much time sifting through the mediums as I could have, and because I did not use a larger graph to get a more comprehensive count of each type. Additionally, plotting everything in square millimeters produced very large numbers on the Y axis, which may have made the graph difficult to interpret. A larger unit, such as square meters, might have been easier to read.
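The area calculation and the unit conversion I mention are simple arithmetic. A minimal sketch, assuming the height and width are recorded in millimeters (the function names here are just for illustration):

```python
# 1 m = 1000 mm, so 1 square meter = 1000 * 1000 square millimeters.
MM2_PER_M2 = 1_000_000

def area_mm2(width_mm, height_mm):
    """Area in square millimeters from width and height in millimeters."""
    return float(width_mm) * float(height_mm)

def mm2_to_m2(area):
    """Convert an area from square millimeters to square meters."""
    return area / MM2_PER_M2

# A 500 mm x 400 mm work: a six-digit number in mm², a small decimal in m².
example_area = area_mm2("500", "400")
print(example_area, "mm² =", mm2_to_m2(example_area), "m²")
```

Because area scales with the square of the length unit, the conversion factor is one million rather than one thousand, which is why the numbers on my Y axis were so large.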

Despite these limitations, I was able to gain some interesting insights from my data analysis. For example, I noticed that the relationship between medium and size might be more pronounced in artworks that are primarily sculptures rather than drawings. I also found that including the data related to vehicles, despite there being relatively few examples, helped to illustrate the idea I was exploring.

Overall, this project highlights the power of data science to make sense of large, complex datasets. By using a combination of tools and techniques, I was able to explore an interesting question about the relationship between an artwork’s medium and its size. While my analysis was not without limitations, it provides a starting point for further exploration of this topic. This was very different from my data science class, as I also had to use digital humanities tools to get a grasp of the larger picture. Looking back, maybe I picked something that needed more manual work, but I think trying things out is part of the process, and I was able to get my idea across.

Sources: https://github.com/tategallery/collection/blob/master/artwork_data.csv