
We are excited to announce the speakers for this year’s EARL London Conference!

Every year, we receive an immense number of excellent abstracts and this year was no different – in fact, it’s getting harder to decide. We spent a lot of time deliberating and had to make some tough choices. We would like to thank everyone who submitted a talk – we appreciate the time taken to write and submit; if we could accept every talk, we would.

This year, we have a brilliant lineup, including speakers from Auto Trader, Marks and Spencer, Aviva, Hotels.com, Google, Ministry of Defence and KPMG. Take a look below at our illustrious list of speakers:

Full length talks
Abigail Lebrecht, Abigail Lebrecht Consulting
Alex Lewis, Africa’s Voices Foundation
Alexis Iglauer, PartnerRe
Amanda Lee, Merkle Aquila
Andrie de Vries, RStudio
Catherine Leigh, Auto Trader
Catherine Gamble, Marks and Spencer
Chris Chapman, Google
Chris Billingham, N Brown PLC
Christian Moroy, Edge Health
Christoph Bodner, Austrian Post
Dan Erben, Dyson
David Smith, Microsoft
Douglas Ashton, Mango Solutions
Dzidas Martinaitis, Amazon Web Services
Emil Lykke Jensen, MediaLytic
Gavin Jackson, Screwfix
Ian Jacob, HCD Economics
James Lawrence, The Behavioural Insights Team
Jeremy Horne, MC&C Media
Jobst Löffler, Bayer Business Services GmbH
Jo-fai Chow, H2O.ai
Jonathan Ng, HSBC
Kasia Kulma, Aviva
Leanne Fitzpatrick, Hello Soda
Lydon Palmer, Investec
Matt Dray, Department for Education
Michael Maguire, Tusk Therapeutics
Omayma Said, WUZZUF
Paul Swiontkowski, Microsoft
Sam Tazzyman, Ministry of Justice
Scott Finnie, Hymans Robertson
Sean Lopp, RStudio
Sima Reichenbach, KPMG
Steffen Bank, Ekstra Bladet
Taisiya Merkulova, Photobox
Tim Paulden, ATASS Sports
Tomas Westlake, Ministry of Defence
Victory Idowu, Aviva
Willem Ligtenberg, CZ

Lightning Talks
Agnes Salanki, Hotels.com
Andreas Wittmann, MAN Truck & Bus AG
Ansgar Wenzel, Qbiz UK
George Cushen, Shop Direct
Jasmine Pengelly, DAZN
Matthias Trampisch, Boehringer Ingelheim
Mike K Smith, Pfizer
Patrik Punco, NOZ Medien
Robin Penfold, Willis Towers Watson

Some numbers

We thought we would share some stats from this year’s submission process:


This is based on a combination of titles, photos and pronouns.

Agenda

We’re still putting the agenda together, so keep an eye out for that announcement!

Tickets

Early bird tickets are available until 31 July 2018, so get yours now.


This year at Mango we’re proudly sponsoring the Bath Cats & Dogs Home. To start our fundraising for them, we decided to run a sweepstake on the Grand National. We asked for £2 per horse, which would go to the cats and dogs home and the winner was promised a bottle of wine for their charitable efforts.

Working in a data science company, I knew that I couldn’t simply pick names out of a hat for the sweepstake. ‘That’s not truly random!’ they would cry. So in my hour of need, I turned to our two university placement students, Owen and Caroline, to help me randomise the names in R.

Non-starter

To use an appropriate horse-based metaphor, I would class myself as a ‘non-starter’ in R – I’m not even near the actual race! My knowledge is practically non-existent (‘Do you just type a lot of random letters?’) and up until this blog I didn’t even have RStudio on my laptop.

The first hurdle

We began by creating a list of the people who had entered the sweepstake. Because some people bet on more than one horse, each name was entered once for every bet that person had placed.

people_list <- c("Matt Glover", "Matt Glover", "Ed Gash",
                 "Ed Gash", "Ed Gash", "Lisa S", "Toby",
                 "Jen", "Jen", "Liz", "Liz", "Andrew M",
                 "Nikki", "Chris James", "Yvi", "Yvi",
                 "Yvi", "Beany", "Karina", "Chrissy", "Enrique",
                 "Pete", "Karis", "Laura", "Ryan", "Ryan", "Ryan",
                 "Ryan", "Ryan", "Owen", "Rich", "Rich", "Matt A",
                 "Matt A", "Matt A", "Matt A", "Matt A", "Matt A", 
                 "Matt A", "Matt A")

I had now associated all the names with the object called people_list. Next I created an object that contained numbers 1-40 to represent each horse.

horses_list <- 1:40
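
Typing each name once per bet, as in people_list above, gets tedious as the list grows. As a minimal sketch of an alternative, the same kind of vector can be built from a named vector of bet counts with base R's rep(). The bets and short_list objects and their shortened contents are made up for illustration; they are not the real entry list.

```r
# Hypothetical bet counts (illustration only): name = number of horses backed
bets <- c("Matt Glover" = 2, "Ed Gash" = 3, "Lisa S" = 1, "Toby" = 1)

# Repeat each name once per bet placed
short_list <- rep(names(bets), times = bets)
short_list

# The vector should contain one entry per bet
length(short_list) == sum(bets)
```

Keeping the counts in one place also makes it easy to check that the number of entries matches the number of horses.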

With the two sets of values ready to go, I wanted to display them in a table format to make it easier to match names and numbers.

assign_horses <- data.frame(Runners = horses_list, People = people_list)
head(assign_horses)

##   Runners      People
## 1       1 Matt Glover
## 2       2 Matt Glover
## 3       3     Ed Gash
## 4       4     Ed Gash
## 5       5     Ed Gash
## 6       6      Lisa S

Now the data appeared in a table, but had not been randomised. To do this I used the sample function to jumble up the people_list names.

assign_horses <- data.frame(Runners = horses_list, People = sample(people_list))
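
Called with a single vector and no other arguments, sample() returns a random permutation of that vector, so everybody keeps exactly as many horses as they paid for. Here is a small sketch to check this and to make the draw repeatable. The seed value and the toy entrants vector are assumptions chosen for illustration and were not part of the original draw.

```r
# Toy entry list (illustration only)
entrants <- c("Ann", "Ann", "Bob", "Cat", "Cat", "Cat")

set.seed(2018)  # arbitrary seed so the draw can be re-run and audited
draw <- data.frame(Runners = seq_along(entrants),
                   People  = sample(entrants))

# Shuffling changes the order, not how many horses each person holds
same_counts <- identical(sort(as.character(draw$People)), sort(entrants))
same_counts
```

Fixing the seed before the draw is one way to answer cries of 'FIX!': anyone can re-run the script and get the same allocation.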

Free Rein

Success! I had a list of numbers (1-40) representing the horses and a randomly jumbled up list of those taking part in the sweepstake.

At the time of writing (in RMarkdown!), fate had unfortunately assigned me the favourite to win. As you can imagine, this is not something that makes you popular in the office.

My First Trot

I hope you enjoyed my first attempt in R. I will definitely use it again to randomise our next sweepstake, though under intense supervision. I can still hear the cries of ‘FIX!’ around the office. It’s always an awkward moment when you win your own sweepstake…

Despite the controversy, it was fun to try out R in an accessible way and it helped me understand some of the basic functions available. Perhaps I’ll sit in on the next LondonR workshop and learn some more!

If you’d like to find out more about the Bath Cats & Dogs Home please visit here.


This article was first published on Nic Crane’s Blog and kindly contributed to the Mango Blog.

I’m going to begin this post somewhat backwards, and start with the conclusion: tidy eval is important to anyone who writes R functions and uses dplyr and/or tidyr.

I’m going to load a couple of packages, and then show you exactly why.

library(dplyr)
library(rlang)

Data wrangling with base R

Here’s an example function I have written in base R. Its purpose is to take a data set and extract the values from a single column that match a specific value, with both input and output being in data frame format.

wrangle_data <- function(data, column, val){

  data[data[[column]]==val, column, drop = FALSE]

}

wrangle_data(iris, "Species", "versicolor") %>%
  head()
##       Species
## 51 versicolor
## 52 versicolor
## 53 versicolor
## 54 versicolor
## 55 versicolor
## 56 versicolor

It works, but it’s not great; the code is clunky and hard to decipher at a quick glance. This is where using dplyr can help.

Data wrangling with dplyr

If I were to run the same code outside of the context of a function, I might do something like this:

one_col <- select(iris, Species)
filter(one_col, Species == "versicolor")  %>%
  head()
##      Species
## 1 versicolor
## 2 versicolor
## 3 versicolor
## 4 versicolor
## 5 versicolor
## 6 versicolor

This has worked, but how can we turn this into a function?

I may naively attempt the solution below:

wrangle_data <- function(data, column, val){
  one_col <- select(data, column)
  filter(one_col, column == val)
}

wrangle_data(iris, "Species", "versicolor")  %>%
  head()
## [1] Species
## <0 rows> (or 0-length row.names)

However, this doesn’t work and returns 0 rows. This is due to a special quirk of dplyr that makes typical usage easier, but that we need to be aware of when writing functions. This snippet from the programming vignette in dplyr explains it best:

“Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that means they don’t follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way. This has two main benefits for dplyr code:

  • Operations on data frames can be expressed succinctly because you don’t need to repeat the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3) instead of df[df$x == 1 & df$y == 2 & df$z == 3, ].
  • dplyr can choose to compute results in a different way to base R. This is important for database backends because dplyr itself doesn’t do any work, but instead generates the SQL that tells the database what to do.”

In other words, because dplyr functions evaluate things differently to base R, by using a concept called quoting, we have to work with them a bit differently. I’d recommend checking out this RStudio webinar and the dplyr programming vignette for more detail.
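
To make "capture the expression that you typed and evaluate it in a custom way" concrete, here is a rough sketch using only base R. The helper filter_rows is made up for this illustration and is not how dplyr is implemented; it only mimics the idea of quoting an argument and then evaluating it with the data frame's columns in scope.

```r
# Hypothetical helper (illustration only, not dplyr's implementation)
filter_rows <- function(df, condition) {
  quoted <- substitute(condition)             # capture the typed expression, unevaluated
  keep   <- eval(quoted, df, parent.frame())  # evaluate it with df's columns in scope
  df[keep, , drop = FALSE]
}

# Species is never defined as a variable; it is looked up inside iris
versicolor <- filter_rows(iris, Species == "versicolor")
nrow(versicolor)
```

Because the expression is captured rather than evaluated straight away, Species can be written without quotes or a df$ prefix, which is the convenience the vignette describes.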

Let’s go back to the previous example.

wrangle_data <- function(data, column, val){
  one_col <- select(data, column)
  filter(one_col, column == val)
}

wrangle_data(iris, "Species", "versicolor")  %>%
  head()
## [1] Species
## <0 rows> (or 0-length row.names)

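This doesn’t work because select and filter look for a column literally called “column” in their input. filter does not find one, so it falls back on the function argument column, which holds the string "Species", and compares that string with val. The comparison is FALSE for every row, hence the empty result. Therefore we must use tidy evaluation to override this behaviour.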
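
The empty result is easier to see by evaluating the pieces by hand. This is a minimal sketch in base R, assuming that filter falls back on the function argument when the data has no column of that name: inside the function, column holds the string "Species", so the condition compares two strings rather than a column with a value.

```r
column <- "Species"
val    <- "versicolor"

# What the naive filter() condition boils down to: one string compared with another
naive_condition <- column == val
naive_condition

# What was intended: the contents of the Species column compared with val
intended_condition <- iris[[column]] == val
sum(intended_condition)
```

The first comparison gives a single FALSE, which keeps no rows; the second gives one TRUE or FALSE per row, which is what the function needs.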

Tidy evaluation with dplyr

Using the !! (bang-bang) operator and the sym function from the rlang package, we can write a version of our function that does run.

wrangle_data <- function(x, column, val){

  one_col <- select(x, !!sym(column))
  filter(one_col, !!sym(column) == val)
}

wrangle_data(iris, "Species", "versicolor") %>%
  head()
##      Species
## 1 versicolor
## 2 versicolor
## 3 versicolor
## 4 versicolor
## 5 versicolor
## 6 versicolor

I’m not going to go into detail about these functions here, but if you want more information, check out my blog posts on using tidy eval in dplyr here and here.
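
For completeness, here is a sketch of another spelling of the same function, assuming a recent dplyr (version 1.0 or later). all_of() selects the columns named in a character vector, and the .data pronoun looks a column up by name inside filter. The name wrangle_data_alt is made up here so that it doesn't overwrite the version above.

```r
library(dplyr)

# Hypothetical variant of wrangle_data (illustration only)
wrangle_data_alt <- function(data, column, val) {
  one_col <- select(data, all_of(column))   # column is a string naming the column
  filter(one_col, .data[[column]] == val)   # .data[[...]] looks the column up by name
}

result <- wrangle_data_alt(iris, "Species", "versicolor")
dim(result)
```

Both spellings take the column name as a string; which one reads better is largely a matter of taste.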

In conclusion, whilst tidy eval is not necessary for all uses of dplyr or tidyr, it quickly becomes an extremely handy tool when working with these packages within the context of a function. There are some great resources about tidy eval out there, and as ever, I welcome feedback on this blog post via Twitter.