## 50 Shades of Grey According to R

I’ve been joking about R’s “200 shades of grey” on training courses for a long time. The popularity of the book “50 Shades of Grey” has changed the meaning of this statement somewhat. As the film is due to be released on Valentine’s Day I thought this might be worth a quick blog post.

Firstly, where did I get “200 shades of grey” from? This statement originally came from the roughly 200 named colours containing either “grey” or “gray” in the vector returned by the colours function. As you will see, there are in fact 224 shades of grey in R.

```
greys <- grep("gr[ea]y", colours(), value = TRUE)
length(greys)

[1] 224
```

This is because there are also colours such as slategrey, darkgrey and even dimgrey! So let’s now remove anything that is more than just “grey” or “gray”.

```
greys <- grep("^gr[ea]y", colours(), value = TRUE)
length(greys)

[1] 204
```

So in fact there are 204 colours classified as “grey” or “gray”. If we take a closer look, though, it’s clear that there are not 204 unique shades of grey in R: we are doubling up so that we can use both the British “grey” and the US “gray”. This is really useful, as R users don’t have to remember to change the way they usually spell grey/gray (you might also notice that I have used the function colours rather than colors), but when it comes to counting unique greys it means we have to be a little more specific in our search pattern. So, stripping back to just shades of “grey”:

```
greys <- grep("^grey", colours(), value = TRUE)
length(greys)

[1] 102
```

we find we are actually down to just 102. Interestingly, we don’t double up on all grey/gray colours: slategrey4 doesn’t exist but slategray4 does!
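We can check that asymmetry directly:

```
"slategray4" %in% colours()

[1] TRUE

"slategrey4" %in% colours()

[1] FALSE
```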

So really we have 102 shades of grey in R. Of course this is only using the named colours; if we define the colour using rgb we can make use of all 256 values per channel!
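For example, a grey is just equal red, green and blue components, so any of the 256 values per channel gives a grey via the rgb function:

```
midGrey <- rgb(128, 128, 128, maxColorValue = 255)
midGrey

[1] "#808080"
```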

So how can we get 50 shades of grey? Well the colorRampPalette function can help us out by allowing us to generate new colour palettes based on colours we give it. So a palette that goes from grey0 (black) to grey100 (white) can easily be generated.

```
shadesOfGrey <- colorRampPalette(c("grey0", "grey100"))
shadesOfGrey(2)

[1] "#000000" "#FFFFFF"
```

```
fiftyGreys <- shadesOfGrey(50)
mat <- matrix(rep(1:50, each = 50))
image(mat, axes = FALSE, col = fiftyGreys)
box()
```

I hear the film is not as “graphic” as the book – but I hope this fits the bill!

## Christmas starts earlier every year… right?

It is possible to download this information as a CSV file, which could then be read into R for us to have some fun with. Luckily for us, there is an R package which makes getting this information into R even easier! gtrendsR provides an interface for retrieving the information provided by Google Trends. All the flexibility offered by the website is available via the package: we can look at trends for different areas, change the time range we see data for, even compare up to 5 search terms at once.

Sadly, as you can probably imagine, Google Trends can only give us data going back so far – in this case to 2004. I’ve decided that we’ll focus on the last 10 years of data we can get from Google Trends; hopefully any patterns will present themselves over this period.

## Some Trendy Data Science

Let’s start by loading the gtrendsR package and using its main function, `gtrends`, to pull the search pattern for ‘Christmas’ over the last 10 years. We pass the search term as a string to the argument `keyword`, and define the time range we want data for with the argument `time`. There are lots of predefined strings you can pass to `time`, whether you want the last hour, last day or previous week. If none of these capture what you’re after, you can define your own time range or set `time = "all"` to get all available data.

```
library(gtrendsR)
xmas <- gtrends(keyword = "Christmas",
                time = "2009-01-01 2018-12-01")
class(xmas)
```
```
## [1] "gtrends" "list"
```

We see that `xmas` is an object of class `"gtrends"`. It’s actually a list made up of seven data frames.

```
names(xmas)
```
```
## [1] "interest_over_time"  "interest_by_country" "interest_by_region"
## [4] "interest_by_dma"     "interest_by_city"    "related_topics"
## [7] "related_queries"
```

If we look at the `related_topics` data frame we see that these results are not just for explicitly searching “Christmas”. It also takes into account related topics, such as ‘Christmas Day’, ‘Gift’ and ‘Tree’.

```
library(dplyr)
head(xmas$related_topics)
```
```
##   subject related_topics            value   keyword
## 1     100            top    Christmas Day Christmas
## 2       8            top   Christmas tree Christmas
## 3       8            top             Tree Christmas
## 4       5            top             Gift Christmas
## 5       5            top  Christmas music Christmas
## 6       3            top Christmas lights Christmas
```

At the moment we’re interested in the `interest_over_time` data frame. `interest_over_time` provides us with the hits over time for the keyword provided. Remember, this is all relative to the peak popularity over the ten years we’ve pulled data for. The information from this data frame is what we see when we pass the `gtrends` object through the `plot` function.

```
plot(xmas)
```

Note that the interval between observations depends on the size of the time range you provide. Here we’ve provided a large range of 10 years, which gives us a hits value for every month. That’s fine for an overview, but we want to compare the trend within each year in more detail, to see if it has changed over the years. Hence we will pull the data for each year separately, so that we get a hits value for each week. The downside is that, instead of being relative to the most popular point over the 10 years, the hits value will only be relative to the most popular point within each year – which is always the observation closest to Christmas Day. We can still use this to get a good idea of the trend over each year, in particular the run-up to Christmas.

So does Christmas start earlier every year? First let’s pull the data for each full year individually, going back to 2009. We’ll do this using a nice for loop that calls `gtrends` for each time frame, then `rbind`s the results together.

```
dates <- c("2017-01-01 2017-12-31", "2016-01-01 2016-12-31", "2015-01-01 2015-12-31",
           "2014-01-01 2014-12-31", "2013-01-01 2013-12-31", "2012-01-01 2012-12-31",
           "2011-01-01 2011-12-31", "2010-01-01 2010-12-31", "2009-01-01 2009-12-31")

allXmas <- data.frame(date = character(0),
                      hits = numeric(0),
                      keyword = character(0),
                      geo = character(0),
                      gprop = character(0),
                      category = character(0))

for(i in dates) {
  trendData <- gtrends(keyword = "Christmas",
                       time = i)

  allXmas <- rbind(allXmas,
                   trendData$interest_over_time)
}
```

We now need to create a couple of columns: one which simply gives the year of the observation, and another which details how far into the year we are – the day of the year. The `year` function from lubridate makes extracting the year easy enough, while after a bit of Googling of my own I discovered the function `strftime`, which does the job for the day of the year.

```
library(lubridate)
allXmas <- mutate(allXmas, year = year(date),
                  doy = as.numeric(strftime(date, format = "%j")))

head(allXmas)
```
```
##                  date hits   keyword   geo gprop category year doy
## 1 2017-01-01 01:00:00   12 Christmas world   web        0 2017   1
## 2 2017-01-08 01:00:00    5 Christmas world   web        0 2017   8
## 3 2017-01-15 01:00:00    3 Christmas world   web        0 2017  15
## 4 2017-01-22 01:00:00    3 Christmas world   web        0 2017  22
## 5 2017-01-29 01:00:00    2 Christmas world   web        0 2017  29
## 6 2017-02-05 01:00:00    2 Christmas world   web        0 2017  36
```

Just what we were after.

Now we get to the visualisation; we’ll use ggplot for a bit more flexibility.

```
library(ggplot2)
ggplot(data = allXmas,
       mapping = aes(x = doy, y = hits, colour = factor(year))) +
  geom_line(size = 0.5)
```

Ah…. There doesn’t seem to be much difference there, let’s focus on the latter part of the year.

```
ggplot(data = allXmas,
       mapping = aes(x = doy, y = hits, colour = factor(year))) +
  geom_line(size = 0.5) +
  xlim(200, 365)
```

We can see slightly more in this plot, but it’s still hard to discern any differences in the escalation of Christmas hype from 2009 up to 2016. The only clear difference is between 2017 and the other years: the search trend for Christmas clearly increased sooner in 2017 than in any other year. The fact that this is the most recent year may just be coincidence; we don’t see a clear pattern where, year upon year, the searches for Christmas increase earlier.

With 2018’s festivities still in their early stages, it doesn’t make sense to pull its Christmas trend data on its own: the hits value of 100 would simply be the most recent observation. Instead, we can pull it alongside the 2017 trend, so we get information relative to the volume of Christmas-related searches in 2017.

```
twenty1718 <- gtrends(keyword = "Christmas",
                      time = "2017-01-01 2018-12-01")
overTime <- mutate(twenty1718$interest_over_time,
                   year = year(date),
                   doy = as.numeric(strftime(date, format = "%j")))

ggplot(data = overTime,
       mapping = aes(x = doy, y = hits, colour = factor(year))) +
  geom_line(size = 1)
```

The popularity of Christmas this year appears to be increasing at an almost identical rate to last year up to the start of December.

So unfortunately, just looking at the visualisations of the trends, we can’t deduce much about whether Christmas does come earlier every year. When we looked at the trends from 2009 up to 2017 it did appear as if 2017 had an earlier build-up to Christmas, and 2018 so far is following the same pattern. If you felt like Christmas started earlier this year and last, then you were right! Maybe 2017 was the tipping point, and every year from now on will either follow the same pattern or start the trend even earlier? We’ll have to check back in a few years to see how future Christmases panned out.

Luckily we don’t have to come away from this analysis empty-handed, we can have a mess around with a few other features gtrendsR has to offer!

## Location, Location, Location

Another great feature of Google Trends is the ability to see how the popularity of a search term varies by location, as well as to compare across multiple locations. For example, let’s compare the relative popularity of Christmas in the UK and the US. To select a particular location we need its `country_code`, often a shortening of its name, which we pass to the `gtrends` argument `geo`. To obtain this code we can look it up in the `countries` dataset that comes with gtrendsR.

```
data("countries")

filter(countries, sub_code == "") %>%
  head()
```
```
##   country_code sub_code           name
## 1           AF             AFGHANISTAN
## 2           AL                 ALBANIA
## 3           DZ                 ALGERIA
## 4           AS          AMERICAN SAMOA
## 6           AO                  ANGOLA
```
```
filter(countries, name %in% c("UNITED KINGDOM", "UNITED STATES"))
```
```
##   country_code sub_code           name
## 1           GB          UNITED KINGDOM
## 2           US           UNITED STATES
```

The code for the United States is “US”, as expected, while the code for the UK is actually “GB”. The next job is to pull the data for both countries at the same time so we have a direct comparison. We’re going to look at the trends over 2017; as we’re only looking at one year now, we don’t need to create a ‘day of the year’ variable this time.

```
byCountry <- gtrends(keyword = "Christmas",
                     time = "2017-01-01 2017-12-31",
                     geo = c("US", "GB"))

ggplot(data = byCountry$interest_over_time,
       mapping = aes(x = date, y = hits, colour = factor(geo))) +
  geom_line(size = 1)
```

Looks like Christmas is a bigger thing here in the UK than over in America. Weirdly when the search frequency for ‘Christmas’ peaks in the US, its popularity in the UK has already started to decrease. Not only is there a higher peak in the use of ‘Christmas’ as a search term but the increase in popularity also begins sooner in the year.

It’s easy to see how knowing this could make a marketing campaign for Christmas much more efficient. For example, if an ad campaign based around Christmas was released at the start of October, then in theory, this should receive a lot more interest in the UK than in the US. A staggered release could mean that the campaign is most prominent when the sharpest increase in interest in Christmas is happening in both the UK and America.

One possible explanation for the difference in Christmas search trends on either side of the Atlantic could be other holidays: Halloween and Thanksgiving are more popular in America, as we see here.

```
otherHols <- gtrends(keyword = c("Halloween", "Halloween", "Thanksgiving", "Thanksgiving"),
                     time = "2017-01-01 2017-12-31",
                     geo = c("US", "GB", "US", "GB"))

otherHols <- otherHols$interest_over_time %>%
  mutate(hits = as.numeric(ifelse(hits == "<1", 0, hits)))

ggplot(data = otherHols,
       mapping = aes(x = date, y = hits, colour = geo,
                     linetype = keyword)) +
  geom_line(size = 1)
```

Note here that we have to do a bit of juggling: anything with a hits value of less than 1 is reported as “<1”, but as this is a string, R takes `hits` to be a categorical variable rather than a numerical one. So we convert all instances of “<1” to 0 and explicitly tell R that `hits` is numeric.

Halloween is only marginally more searched in the US than the UK; Thanksgiving is where the main difference lies. Happening on the fourth Thursday of November every year, Thanksgiving could be a large part of the reason for the delay in interest in Christmas in the States. Another event which isn’t exactly a holiday but goes hand in hand with Thanksgiving is ‘Black Friday’. Taking place the Friday after Thanksgiving, this shopping spectacular is growing in popularity (you only have to look at its Google Trend to see that).

```
bf <- gtrends(keyword = "Black Friday")

bf <- bf$interest_over_time %>%
  mutate(hits = as.numeric(ifelse(hits == "<1", 0, hits)))

ggplot(data = bf,
       mapping = aes(x = date, y = hits)) +
  geom_line(size = 1, colour = "red")
```

Searches for ‘Black Friday’ doubled from 2014 to 2017, but is this a distraction from Christmas or an event which makes us think about our Christmas shopping earlier? Let’s compare the search trends for Christmas, Thanksgiving, Halloween and Black Friday, splitting by location.

```
# First pull the data for the UK
allEventsUK <- gtrends(keyword = c("Christmas", "Thanksgiving",
                                   "Halloween", "Black Friday"),
                       time = "2017-01-01 2017-12-31",
                       geo = "GB")

allEventsUK <- allEventsUK$interest_over_time %>%
  mutate(hits = as.numeric(ifelse(hits == "<1", 0, hits)))

# Then for the US
allEventsUS <- gtrends(keyword = c("Christmas", "Thanksgiving",
                                   "Halloween", "Black Friday"),
                       time = "2017-01-01 2017-12-31",
                       geo = "US")

allEventsUS <- allEventsUS$interest_over_time %>%
  mutate(hits = as.numeric(ifelse(hits == "<1", 0, hits)))

# Combine UK and US data
allEvents <- rbind(allEventsUK,
                   allEventsUS)

# Plot
ggplot(data = allEvents,
       mapping = aes(x = date, y = hits, colour = keyword)) +
  geom_line(size = 1) +
  facet_wrap(~ geo) +
  scale_x_datetime(limits = c(as.POSIXct("2017-07-01"), NA))
```

There are lots of differences between search patterns in the UK and US. The main difference in a single holiday is Thanksgiving: more searched than Halloween in America, but barely featuring on the UK plot. Surprisingly, Black Friday was searched more at its peak than Christmas in the US, although it shows a sharper increase and decrease than the gradual rise of Christmas. In both plots we find an increase in Christmas searches as the popularity of Halloween decreases. This is one of the two sharpest increases in Christmas searches in the UK, with the other occurring as searches for Black Friday decreased.

The UK data seems to show that we don’t focus on more than one holiday at once, with Christmas searches plateauing during peak periods for Halloween and Black Friday, only to rise again once those holidays were over. If the popularity of Black Friday continues to increase at its current pace, it will be interesting to see the effect this has on Christmas. From an earlier plot we know that Christmas is less popular at its peak in America than in the UK; maybe this is related to the difference in Black Friday popularity in each country. Will Black Friday continue its rapid rise in the UK and result in the overall popularity of Christmas decreasing?

## All I want for Christmas is… some more data!

Currently, it’s hard to determine whether any of these possible patterns will continue; at the moment, 2017 appears to be the anomaly. Confirming any of our hunches will take a few more years of closely monitoring Google Trends.

I would definitely recommend checking out Google Trends and gtrendsR, it’s not just for Christmas (data). Maybe you’ll be able to make some more solid conclusions!

Written in 2018

## Strictly hypothesis testing

I’ll get straight to it. I love Strictly. You may know it by “Dancing with the Stars”, “Bailando por un Sueño”, or “Danse avec les stars”. I resisted for a decade, but for the last two years I’ve been fully seduced by the big glitter ball and haven’t missed an episode.

Now as a fully qualified arm-chair critic, I think I’ve noticed that to get a good score on Strictly you just need to do the Charleston. It hides a multitude of sins. You sort of jump about a bit, do something called “swivel”, and fall over at the end. Then you get a minimum 35. Job done.

My family disagree, so I’ve set out to test my hypothesis. The great news is that this is really easy to do with R. In fact, this whole post uses no more R than you’d find in an introduction to R course. My family can’t wait to find out the results. Wherever they went.

So let’s get started and finally put this to rest. Base R has everything we need but we’ll get there quicker with the tidyverse. As most of my notebooks now start, let’s load the main tidyverse packages.

```
library(tidyverse) # I've set message=FALSE everywhere
```

## The Data

We’re lucky that someone at www.ultimatestrictly.com has been keeping track of every dance that’s ever happened on Strictly. And we’re even luckier that they have made it available as a CSV download. It only goes up to 2016, but that’s still 14 series so it should be enough.

With the URL, all we need are a couple of tweaks to the default `read_csv` to account properly for an unusual NA string and some sparse columns that would otherwise have their data types guessed incorrectly.

```
url <- "https://www.ultimatestrictly.com/s/SCD-Results-S14.csv"
raw_results <- read_csv(file = url, na = c("-"), guess_max = 10000)

raw_results %>%
  head() %>%
  knitr::kable() # pretty markdown tables
```
| Couple | Dance | Song | Series | Week | Order | Craig | Arlene | Len | Bruno | Alesha | Darcey | Jennifer | Donny | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Natasha & Brendan | Cha cha cha | Chain Of Fools | 1 | 1 | 1 | 5 | 7 | 8 | 7 | NA | NA | NA | NA | 27 |
| Lesley & Anton | Waltz | He Was Beautiful | 1 | 1 | 2 | 6 | 8 | 8 | 7 | NA | NA | NA | NA | 29 |
| Chris & Hanna | Cha cha cha | Lady Marmalade | 1 | 1 | 3 | 4 | 4 | 7 | 4 | NA | NA | NA | NA | 19 |
| Jason & Kylie | Waltz | Three Times A Lady | 1 | 1 | 4 | 5 | 5 | 6 | 5 | NA | NA | NA | NA | 21 |
| Verona & Paul | Cha cha cha | R.E.S.P.E.C.T | 1 | 1 | 5 | 7 | 6 | 7 | 7 | NA | NA | NA | NA | 27 |
| Claire & John | Waltz | Unchained Melody | 1 | 1 | 6 | 7 | 7 | 8 | 5 | NA | NA | NA | NA | 27 |

We get every dance, for every week in every series, and individual judges scores as well as the total score.

Let’s start with a little light cleaning. The formatting of the dance names is a bit inconsistent. For example, we have `Cha cha cha` and `Cha Cha Cha`. Most of it can be fixed by forcing the dance names to a consistent case. I like the look of title case, so we’ll use stringr’s `str_to_title`.

```
dances <- raw_results %>%
  mutate(Dance = str_to_title(Dance))

dances %>%
  select(Dance) %>%
  head() %>%
  knitr::kable()
```
| Dance |
|---|
| Cha Cha Cha |
| Waltz |
| Cha Cha Cha |
| Waltz |
| Cha Cha Cha |
| Waltz |

My next step would usually be to gather this into a tidy long data frame, maybe separate the couples into celebrity and professional. But in this instance we only want the Dance and the Total, so we’ll leave it like it is.
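For the curious, that tidying step might look something like the following – a sketch on a toy row with the same column layout, using tidyr’s `separate` and `pivot_longer` (the toy data here is illustrative, not the real download):

```
library(dplyr)
library(tidyr)

# A single toy row mirroring the real columns
mini <- tibble(Couple = "Natasha & Brendan", Dance = "Cha Cha Cha",
               Craig = 5, Arlene = 7, Len = 8, Bruno = 7)

mini %>%
  # split the couple into celebrity and professional
  separate(Couple, into = c("Celebrity", "Professional"), sep = " & ") %>%
  # one row per judge's score
  pivot_longer(Craig:Bruno, names_to = "Judge", values_to = "Score")
```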

## Top Dances

I’d like to know which dances get the best `Total` score. We can do this with a `group_by`/`summarise` combination.

```
dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            Count = n())
```
```
## # A tibble: 3 x 3
##   Dance           MeanScore Count
##   <chr>               <dbl> <int>
## 1 American Smooth      31.6   102
## 2 Argentine Tango      35.7    43
## 3 Cha Cha              32       1
```

I’m removing anything with fewer than 10 scores, because these tend to be hybrid dances and one-offs.

`  filter(Count >= 10) %>%`

The Show Dance only appears in the final episode and is multi-style. Out it goes.

` filter(Dance != "Show Dance") %>%`

I don’t want any “hops” in there. The hops are those ones where everyone’s on the floor at the same time and you have no idea what’s going on but they all seem to be enjoying themselves. There’s only a Lindy Hop in this data but I don’t want any hops. Ever. No Hops!

`  filter(!str_detect(Dance, " Hop")) %>%`

Putting it all together we get our top dances.

```
top_dances <- dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            Count = n()) %>%
  filter(Count >= 10) %>%
  filter(Dance != "Show Dance") %>%
  filter(!str_detect(Dance, " Hop")) %>%
  arrange(desc(MeanScore))

knitr::kable(top_dances)
```
| Dance | MeanScore | Count |
|---|---|---|
| Argentine Tango | 35.67442 | 43 |
| Viennese Waltz | 32.10000 | 90 |
| Charleston | 31.74324 | 74 |
| American Smooth | 31.57843 | 102 |
| Quickstep | 30.71538 | 130 |
| Foxtrot | 30.05738 | 122 |
| Samba | 29.46296 | 108 |
| Paso Doble | 29.21552 | 116 |
| Tango | 28.89552 | 134 |
| Salsa | 28.72549 | 102 |
| Jive | 28.26016 | 123 |
| Rumba | 28.17273 | 110 |
| Waltz | 27.93233 | 133 |
| Cha Cha Cha | 25.44805 | 154 |

And there we have it. I’m sort of right. The Charleston is in the top three. And who doesn’t like an Argentine Tango? But there’s something about that count… What if they tend to only do the Charleston in later weeks, when they’re a bit better?

We know that the scores go up by week. In fact it’s pretty linear.

```
dances %>%
  ggplot(aes(x = Week, y = Total, group = Week)) +
  geom_boxplot()
```

Let’s do the same again and add the average week. This gives the mean week in which each dance appears; a bigger number means the dance tends to appear later.

```
top_dances_week <- dances %>%
  group_by(Dance) %>%
  summarise(MeanScore = mean(Total, na.rm = TRUE),
            MeanWeek = mean(Week, na.rm = TRUE),
            Count = n()) %>%
  filter(Count >= 10) %>%
  filter(Dance != "Show Dance") %>%
  filter(!str_detect(Dance, " Hop")) %>%
  arrange(desc(MeanScore))

knitr::kable(head(top_dances_week, 5))
```

| Dance | MeanScore | MeanWeek | Count |
|---|---|---|---|
| Argentine Tango | 35.67442 | 10.023256 | 43 |
| Viennese Waltz | 32.10000 | 6.911111 | 90 |
| Charleston | 31.74324 | 6.891892 | 74 |
| American Smooth | 31.57843 | 7.333333 | 102 |
| Quickstep | 30.71538 | 5.823077 | 130 |

Rats. My hypothesis is in trouble. The Charleston does indeed have a high `MeanWeek` which means it appears later in the series when the scores are higher. My children have long since gone to bed so let’s take it up a notch and build a statistical model that can account for `Week` and `Dance` at the same time.

## Linear Model

First, we’ll take the individual dance level data and reuse the filtering for the `top_dances`.

```
main_dances <- dances %>%
  filter(Dance %in% top_dances$Dance)
```

We want a simple linear model that accounts for the increasing score by week and the effect we’re interested in – the dance.

```
fit <- lm(Total ~ Week + Dance, data = main_dances)
summary(fit)
```
```
##
## Call:
## lm(formula = Total ~ Week + Dance, data = main_dances)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -29.9515  -3.4593   0.4912   3.5637  19.4965
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          23.37653    0.59660  39.183  < 2e-16 ***
## Week                  1.11844    0.04051  27.607  < 2e-16 ***
## DanceArgentine Tango  1.08747    0.95628   1.137  0.25564
## DanceCha Cha Cha     -1.87934    0.68459  -2.745  0.00612 **
## DanceCharleston       0.65854    0.79808   0.825  0.40941
## DanceFoxtrot          0.16272    0.70368   0.231  0.81716
## DanceJive            -1.25416    0.70372  -1.782  0.07492 .
## DancePaso Doble      -1.22840    0.71043  -1.729  0.08399 .
## DanceQuickstep        0.82609    0.69385   1.191  0.23400
## DanceRumba           -1.78227    0.72064  -2.473  0.01350 *
## DanceSalsa           -1.36169    0.73365  -1.856  0.06364 .
## DanceSamba           -1.83586    0.72150  -2.545  0.01104 *
## DanceTango           -0.13999    0.69274  -0.202  0.83988
## DanceViennese Waltz   0.99380    0.75585   1.315  0.18877
## DanceWaltz           -0.03570    0.70003  -0.051  0.95934
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.225 on 1526 degrees of freedom
## Multiple R-squared:  0.3969, Adjusted R-squared:  0.3914
## F-statistic: 71.75 on 14 and 1526 DF,  p-value: < 2.2e-16
```

After removing the strong effect of week, there’s not much left of significance. It can be tempting to cherry pick individual levels but the only one with much signal is the Cha Cha Cha. Ultimate Strictly describes it as “A cheeky, fun dance, but rarely a show stopper”.
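A complementary way to read the same output, rather than scanning the significance stars, is to look at confidence intervals for the dance effects (here relative to the baseline level, American Smooth):

```
# 95% confidence intervals for the week and dance coefficients
confint(fit)
```

Any dance whose interval comfortably spans zero is indistinguishable from the baseline once the week effect is accounted for.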

## Conclusion

Unfortunately, it looks like the evidence doesn’t support my hypothesis that the Charleston is more favourably marked. Maybe all that swivelling is harder than it looks. Faye and Giovanni undeniably did an excellent job with their perfect-scoring Sound of Music Charleston. What we did find is that the Cha Cha Cha might score lower, although that’s most likely because it’s seen as an easier dance for, well, people like Quentin.

I’m sorry Quentin. I would fare no better.

Well seeing as I’m wrong, and everyone else is asleep. Let’s just pretend this never happened.

## Encouraging Women in Data

In support of encouraging diversity within data science, we proudly profiled three of our own team members at Mango in the lead-up to the Women in Data conference last week. All from different backgrounds, they each arrived at analytics with varying skillsets and experience, yet all are incredibly inspiring in their own way.

Last week I had the pleasure of attending the very busy and successful Women in Data conference. As resourcing manager for Mango I was keen to learn more about like-minded women in this sector and their reasons for attending this conference.

Through my series of chats, I had the pleasure of meeting the best of womankind. An incredibly inclusive bunch who were there to inspire and be inspired, to encourage and be encouraged, to learn, to teach, to listen and to talk.

Amongst the most popular reasons to attend was finding out what other women in the field were doing and how they were managing their own learning and development journeys: having the headspace to take a step back, evaluate or benchmark their work, and discuss common themes or concerns. Some were working within smaller organisations or teams without role models, and so came looking for them.

The highlight of the event for many seemed to be Helen Hunter’s talk. It was her natural humanity and humility that inspired so many. Despite giving a hugely motivational talk, we loved her honesty in admitting that she faced days when she felt she wasn’t good enough. Hugely resonating for many, this offered such comfort to the audience.

Another highlight was the opportunity to network. Women love to share and compare and Women in Data was no exception to this, offering an inclusive, non-threatening environment. Interestingly, I spoke to three women who told me it was their time to give back. They’d always had advocates in their careers, they’d had to be resilient to get where they were and they started to think about what was important to them. They’d specifically come to try and help the next generation of women, supporting them in their data careers.