Running in patterns

Heatmaps are an incredibly useful and ingenious way of visualising data and the relationships between rows and columns, whatever they may represent. Put simply, “a heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors”.
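To make that definition concrete, here is a minimal Python sketch of the core mapping, in which every value in a matrix becomes a colour. It is not the R code used later in this post. It assumes a simple linear scale from white (lowest value) to red (highest value), scaled to the matrix's own minimum and maximum. The function names are hypothetical, and real plotting libraries offer far richer colour scales.

```python
def lerp(a: int, b: int, t: float) -> int:
    """Linearly interpolate between two 0-255 colour channel values."""
    return round(a + (b - a) * t)


def value_to_hex(value, lo, hi, low_rgb=(255, 255, 255), high_rgb=(255, 0, 0)):
    """Map a value in [lo, hi] onto a linear colour scale from low_rgb to high_rgb.

    The white-to-red default is an illustrative assumption, not a fixed rule.
    """
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp anything outside the range
    r, g, b = (lerp(c0, c1, t) for c0, c1 in zip(low_rgb, high_rgb))
    return f"#{r:02x}{g:02x}{b:02x}"


def matrix_to_colours(matrix):
    """Colour every cell of a matrix, scaled to the matrix's own min and max."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    return [[value_to_hex(v, lo, hi) for v in row] for row in matrix]


print(matrix_to_colours([[0, 5], [10, 2.5]]))
# → [['#ffffff', '#ff8080'], ['#ff0000', '#ffbfbf']]
```

Changing the two end colours, or replacing the linear scale with a diverging one, is where most of the creative choices in a heatmap live.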

I first came across heatmaps during my studies in Molecular Biology at the University of Cape Town. There they are commonly used to show gene expression from microarray data under various conditions and to help identify clusters and patterns. For example, see the image below, which uses the typical red and green colour scale.

The graph was generated by Miguel Andrade with data extracted from the StemBase database of gene expression data.

The term “heatmap” is fairly new, coined in 1991 to describe a 2D display of financial market data, but the idea of “shading matrices” has been around for over a century. I was interested in these earlier uses and did a bit of digging (with the help of Sci-Hub to get access to knowledge that should be freely and publicly accessible!). I found the references in Wilkinson & Friendly’s “The history of the cluster heatmap”.

One of the oldest examples is by Loua (1873), showing social statistics (such as national origin, profession, age and social class) across the districts of Paris. It was coloured by hand, using a colour scale from white (low) through yellow and blue to red (high). How beautiful!

Shaded matrix from Loua (1873).

Another example from the same paper that I found particularly interesting was the “Ten tests for efficiency” from Brinton (1914). It measured and ranked the school systems of the U.S. states in 1910 on ten educational features. It is a sombre reminder that, despite the rapid advancement of everything else, our educational systems and constructs, and the ways we measure them, are mostly as they were over 100 years ago!

Ranked, shaded display from Brinton (1914).

I have used heatmaps extensively in my role in Learning Research and Analytics at Siyavula Education. And unlike the previous plots, I didn’t make these by hand! There are now many ways to generate heatmaps automatically and efficiently, but they still require some creativity, especially when choosing colour scales and ranges. I really enjoy experimenting with these, depending on what I want to highlight and draw attention to. I predominantly use the R programming language and several packages, mostly ggplot2, but also base R’s heatmap() function.

Some of the ways in which I’ve used heatmaps are:

  • to investigate and understand patterns in user behaviour, for example to track the developmental process of learners and how their use of mobile learning technology across the hours of the day and days of the week changes as they get older;
  • for large-scale monitoring of policy implementation, for example to assess curriculum progression over the course of the year, checking whether learners and teachers are following the curriculum as it’s laid out and, if not, where they deviate from it;
  • for business intelligence, such as monitoring and visualising trends in account creation and use of the service;
  • to assess the impact of certain campaigns, for example to get a high-level overview of whether targeted text messages have an impact on users’ behaviour.

Something else I love to do is to run 🙂

And so I recently decided to download all my data from Garmin for the last 3 years (since I’ve had a GPS watch) and do some exploratory data analysis and visualisation of my own. Mostly I wanted to make some heatmaps for myself and see what patterns showed up in my running behaviour!

I also used this analysis as a case study for the last R-Ladies Cape Town meetup that I ran. The meetup was mostly a tutorial about working with dates in R using the very useful package lubridate, which I use in the data manipulation for these heatmaps. Here is a link to the tutorial, which I recently published as an R Markdown notebook on RPubs. I also wanted to show some real-world examples of working with date data and creating visualisations, so I used my running data and created this notebook, published here. This is a snapshot in time of what I’ve done so far and still a work in progress; I definitely want to do more soon! All of the code and source data can be found in a repo on my GitHub account here.

First up, when do I like to run during the week? In the following graph, darker purple-red shows more runs in that hour of the day. During the week I don’t seem to have a particular preference for morning or evening runs, but weekend mornings are definitely time for a run. The two hot spots are Tuesday and Friday mornings at 6am, probably my most regularly attended sessions over the last few years. These are a Tuesday track session at Green Point Athletics Stadium with the Atlantic Triathlon Club, and my social Friday morning run along Sea Point Promenade with our Red Sock Friday group. Find out more about the Red Sock story and Shoooops! here. If you ever happen to be in Cape Town on a Friday, come join us here, followed by a coffee at Caffe Neo 🙂

Weekly heatmap: number of runs by day of the week and hour of the day.
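The counting behind a chart like this can be sketched in a few lines. This is a Python illustration of the idea, not the R and lubridate code from the notebook referenced above. It assumes each run is available as a start time in ISO 8601 format, and the actual Garmin export format may differ. The helper name `weekly_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekly_counts(start_times):
    """Count runs per (weekday, hour) slot.

    start_times: ISO 8601 strings such as "2019-06-04T06:02:00" (assumed format).
    Returns a 7 x 24 matrix: rows are Monday..Sunday, columns are hours 0..23.
    """
    counts = Counter()
    for ts in start_times:
        dt = datetime.fromisoformat(ts)
        counts[(dt.weekday(), dt.hour)] += 1  # weekday(): Monday == 0
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


# Made-up start times, purely for illustration.
runs = [
    "2019-06-04T06:02:00",  # Tuesday
    "2019-06-11T06:05:00",  # Tuesday
    "2019-06-07T06:10:00",  # Friday
    "2019-06-08T07:30:00",  # Saturday
]
matrix = weekly_counts(runs)
for name, row in zip(DAYS, matrix):
    busy = {hour: n for hour, n in enumerate(row) if n}
    if busy:
        print(name, busy)
# Tue {6: 2}
# Fri {6: 1}
# Sat {7: 1}
```

Each cell of the resulting 7 × 24 matrix is the number of runs started in that weekday and hour slot. That count is what the colour in a weekly heatmap encodes.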

What about a weekly pattern in how far my average run is at different times of the day and week? No surprises here. As the following graph shows, I do longer runs on the weekend, which is probably a very common pattern for most runners! During the week I generally run in the suburbs. Saturdays and Sundays are for heading out on a long run, either on the road or somewhere on the trails in our beautiful Table Mountain National Park, depending on what I’m training for.

Weekly heatmap: average run distance by day of the week and hour of the day.
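Averaging distances per weekday and hour slot, rather than counting runs, can be sketched in Python as follows. Again, this is an illustration rather than the original R code. It assumes each run is a (start time in ISO 8601 format, distance in km) pair, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def weekly_mean_distance(runs):
    """Average distance per (weekday, hour) slot.

    runs: iterable of (ISO 8601 start time, distance in km) pairs (assumed format).
    Returns a 7 x 24 matrix (Monday..Sunday by hour 0..23); slots with no runs
    are None rather than 0, so "no run" is not confused with "a 0 km run".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, km in runs:
        dt = datetime.fromisoformat(ts)
        slot = (dt.weekday(), dt.hour)
        totals[slot] += km
        counts[slot] += 1
    return [
        [totals[(d, h)] / counts[(d, h)] if counts.get((d, h)) else None
         for h in range(24)]
        for d in range(7)
    ]


# Made-up runs, purely for illustration.
runs = [
    ("2019-06-08T07:00:00", 21.0),  # Saturday
    ("2019-06-15T07:15:00", 15.0),  # Saturday
    ("2019-06-04T06:00:00", 8.0),   # Tuesday
]
matrix = weekly_mean_distance(runs)
print(matrix[5][7], matrix[1][6], matrix[0][0])  # → 18.0 8.0 None
```

Leaving empty slots as None lets a plotting library draw them blank. A zero would otherwise be coloured like a very short run.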

The last heatmap I’ve got here is displayed across the calendar months. This is a really useful display, and you can see the script that generates it in my R notebook referenced earlier. I colour-coded it according to the distance of the run on each day. So far, my longest run was the Two Oceans Ultra Marathon in March this year. When I first saw this, what also struck me was how easily I could see the 4 periods in the last 3 years when I was injured and had to take some time off! Twice I sprained my left ankle out on the trails (mid 2016 and the beginning of this year), and the other two periods were due to Achilles and calf issues.

Calendar heatmap: distance run on each day, by month.
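A calendar heatmap needs each day placed in a grid of weeks (rows) by weekdays (columns) for every month. The minimal Python sketch below uses the standard library's `calendar.Calendar.monthdatescalendar`. It assumes runs are given as (date, distance in km) pairs and that several runs on the same day are summed; the notebook may handle this differently. The function names are hypothetical.

```python
import calendar
from collections import defaultdict
from datetime import date


def daily_distance(runs):
    """Total km per calendar day; runs is an iterable of (date, km) pairs."""
    per_day = defaultdict(float)
    for day, km in runs:
        per_day[day] += km  # several runs on one day are summed (an assumption)
    return per_day


def month_grid(year, month, per_day):
    """Lay out one month as a list of weeks, each Monday..Sunday.

    Each cell is the km run that day, 0.0 for a day without a run,
    or None for padding days that belong to the previous or next month.
    """
    weeks = calendar.Calendar(firstweekday=0).monthdatescalendar(year, month)
    return [
        [per_day.get(d, 0.0) if d.month == month else None for d in week]
        for week in weeks
    ]


# Made-up runs, purely for illustration.
runs = [(date(2019, 3, 5), 10.0), (date(2019, 3, 5), 2.0), (date(2019, 3, 9), 21.1)]
for week in month_grid(2019, 3, daily_distance(runs)):
    print(week)
```

A plotting library can then draw one small grid per month and colour each cell by distance, leaving the None padding cells blank. Long stretches of 0.0 cells are exactly what make injury breaks stand out.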

I really enjoyed doing this analysis and will follow up with more in future; it’s incredible how much data the device on your wrist can collect now. And whilst Strava does a pretty good job of visualising our activities and data (and adding that social, competitive edge!), it’s definitely fun to do some yourself and see if you run in patterns!

Photos from my recent trip to run the Tour du Mont Blanc: magical!

“Running is the greatest metaphor for life, because you get out of it what you put into it.”
— Oprah Winfrey
