
A year before I sat down here and started writing this sentence, I was about three months into a year-long work placement at Mango. I loved what I was doing, I loved the people I was doing it with, and I was generally having a great time.

But at some point last spring, people started to ask me when I was leaving. I couldn’t tell whether it was because they knew that at some point I’d have to go back to university to complete my course, or because they’d had enough of me already and wanted me to go away. Either way, whenever I mentioned that I was thinking about taking advantage of what will probably be the last long summer holiday of my life, everyone told me the same thing: I’d be a fool not to.

Therefore, after months of meticulous planning, early one mid-July morning a friend and I – summoning as much 18th-century spirit as possible – set off to complete a Grand Tour of the continent.

Over the course of 55 days, we visited 22 countries, we covered over 6000 miles of European highway, and due to the fact that I was very busy being on holiday, I wrote precisely 0 lines of code.

So I’m not writing about some cool project I’ve done, or some amazing new tech I’ve been researching, or really anything at all to do with code or a computer. Sorry. I suppose this piece should really be called “some things I learned which definitely have completely nothing at all to do with my job”.

Some things I learned which definitely have completely nothing at all to do with my job

1. Head for high ground

This is probably what you learn on day one in Army Commander School.

You’re in charge. The furious battle is raging on all sides. Suddenly, you realise that you are in serious danger of being completely overwhelmed. This is a good time to employ a tactic commonly referred to as “running away”.

But if this is a battle you want (or need!) to win, it’s probably not a good idea to run away forever. Your opponent isn’t going to hang around for a while wondering where you’ve gone, and then just decide “actually, yeah, fair enough, we lost, never mind”.

Instead, you should run away to somewhere nearby, but as high up as possible. This gives you a chance to widen your view: you can see where you’ve been, where you want to go, what’s going on right now, and how those three things relate to each other. After assessing the situation from this elevated position, it is much easier to see what needs to be done and to refocus your efforts accordingly.

If you want to be a top-level Army Commander one day, you can learn more from this. On your next conquest (or, in my case, unfamiliar city) find that high ground and go on a quick reconnaissance mission as early as possible: identify your goals, think about the best way to get to them, and scan the horizon for any threats (or scary grey clouds) which might be on their way towards you.

Once you have the high ground, it’s over

2. Record what you’ve done

On the whole, we humans are pretty smart. We’re good at figuring out how to do stuff, and once we’ve figured out what to do, we’re good at actually doing it.

Having said that, the same is true of other primates. And crows. And dolphins. And beavers. And so on, and so on.

The reason why we are smarter is that we have an awesome thing called “language”. Language lets us share our ideas and our experiences with other humans, so that they don’t have to come up with the same ideas or go through the same experiences in order to have the same knowledge.

Even better: at some point a few thousand years ago, someone figured out how to convert language into something physical. As a result, those of us who are alive right now have access to virtually all the knowledge developed by all of humankind since that point.

SO WHY YOU NO USE IT? Write down everything! Write down what you’ve done, and how you’ve done it, and why you’ve done it, and why you’ve done it like that, and everything that went wrong before you got it right, and everything you think it could lead to.

Do it for yourself, in anticipation of the moment when in six months’ time you realise you’ve forgotten where you were or who you were with or what the name of that street was.

Do it for other people, so that they don’t have to drive round eastern Prague four times trying to find the car park which was marked in the wrong place on the map.

Do it for the people who will stumble across your hastily scrawled notes years from now and, with a sudden flash of inspiration, will use them as the foundation to build myriad new and wonderful things.

My memory is terrible, but I wrote down all the embarrassing stories so that they’ll never be forgotten

3. Respect experience

Asking questions is a really really good thing to do. It’s one of the best ways to learn about things and you should never be afraid to ask about something you don’t understand.

However, it’s important to remember one thing: “always ask” is not the same as “always ask right now”.

If someone with more experience than you tells you to do something, and if you know that there is almost certainly a good reason, then even if you don’t know what that reason is… you should probably do the thing.

Wait until the pressure has eased a bit before demanding an explanation. You should still ask for one, but perhaps when everyone’s a little bit less stressed.

4. Call a spade a spade

Names can be controversial.

Pavement or sidewalk? Biscuit or cookie? Dinner or tea or supper? Bun or bap or roll? GIF or GIF?

But there are some names that virtually everyone agrees on. In particular, this tends to happen if it is important that everyone agrees on the name.

For example, “police” is an important concept: it represents protection, order, assistance, and a bunch of other useful words. Pretty much all European languages have almost exactly the same spelling and pronunciation for “police” as English does.

How to say “police” in the 18 different European languages which we came across during our trip

This means that if you speak any one of these languages, you can travel to any place where they speak any one of the others; and even if you’re in an unfamiliar environment where your understanding is limited, you aren’t completely on your own. If you need help, you can yell “POLICE!”, and someone in a uniform will probably come running.

Unless you’re in Hungary, because Hungarian is very strange.

… actually, someone will come running even in Hungary, because virtually everyone speaks English as a second language. They have to, because very few outsiders choose to learn Hungarian – it is only really spoken in Hungary, and as previously mentioned, it really is very strange. Nevertheless, it is the first language of around 12 million people, so there’s a reasonable chance that at some point you’ll need to find a friendly Hungarian to do some translation for you.

I suppose there are two points to take from this little section. Firstly, if you call things by more or less the same name as everyone else does, then this will usually help to improve shared understanding and will aid communication. Secondly, people who can speak multiple languages – and especially less widely-spoken languages – are super super valuable!

5. Call a spade a spade, but that doesn’t mean you should assume/demand that everyone else is going to call every single item in their toolshed by exactly the same names as you call all the things which you have in YOUR toolshed

Just to add an important caveat to the previous section: sure, it’s helpful if someone speaks the same language as you, and even more exciting if you realise they speak it with the same accent. But once you’ve traded your initial stories, that gets boring quite quickly.

Plus, you’re definitely going to struggle to make new friends if you go around loudly insisting that everyone speaks to you in your language, and getting angry or patronising people if they get something “wrong”. Socialise, compromise, learn.

6. New is often exciting, but exciting doesn’t have to be new

Humans have been around for a while now, which means we’ve already gone to most places. If you want to go somewhere no-one else has been before then your options are already fairly limited. If you add the complication of getting there in the first place, and the fairly high probability that you won’t find anything particularly interesting there anyway, then it begins to look like a bit of a daunting prospect.


You don’t have to go somewhere no-one else has been before. You can go to the same places and do the same things that someone else has already done, and as long as you’re enjoying yourself, it really doesn’t matter that someone has been there and done it before. There’s always a slightly different route to the next place, or a slightly different angle to view something from, or something to take inspiration from when you’re planning your next adventure.

Maybe one day in the future, you’ll decide that you want a bigger challenge. Then you can dust off your old maps and start thinking about making that expedition out into the middle of nowhere. But there are plenty of other wonderful places to go and things to do first – and honestly, some of those places really are well worth a visit.

If you see an awesome thing that someone else has already done, don’t be afraid to recreate it yourself (or to take photos of your friend recreating it)

7. Get out there and do stuff

There is so much out there.

No really, there is SO MUCH out there.

Go to places. Meet people. Talk to those people, then find more people. Read stuff, write stuff, look at things, show your friends, share opinions, debate stuff, be creative, demand feedback, ask questions, learn things, challenge yourself, pass on your passion, and while you’re busy doing all that never let anyone take away the thing that makes you you.

Go right now and carry on being awesome.


Another month, another sweepstake to raise money for the Bath Cats & Dogs Home!

This time, we picked the Eurovision Song Contest as our sweepstake of choice. After enjoying my first experience of using R to randomise the names for the previous sweepstake, I decided to give it another go, but with a few tweaks.


During my first attempt in R, issues arose when I had been (innocently!) allocated the favourite horse to win. I had no way to prove that the R code had made the selection, as my work was not reproducible.

So with the cries of “cheater!” and “fix!” still ringing in my ears, we started by setting a seed. This meant that if someone else were to rerun my code they would get the same results, removing the dark smudge against my good name.

At random I selected the number 6 at which to set my seed.
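The seeding step itself is not shown here. As a minimal sketch, assuming base R's set.seed was used, it might look like the following; the three-item vector is made up purely for illustration:

```r
# Fix the random number generator's state so the draw can be replayed.
set.seed(6)
first_draw <- sample(c("a", "b", "c"))

# Resetting the same seed reproduces exactly the same shuffle.
set.seed(6)
second_draw <- sample(c("a", "b", "c"))

identical(first_draw, second_draw)
```

Anyone who runs the same code with the same seed (and the same R version) should see the same allocation, which is what makes the draw checkable.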


I next compiled my lists of people and Eurovision countries and assigned each list to its own object.

people_list <- c(
    "Andy M",
    "Matty G",
    "Matt A"
    # (list abbreviated here)
)
countries_list <- c(
    "Czech Rep",
    "The Netherlands",
    "United Kingdom"
    # (list abbreviated here)
)

Once I had the lists stored as objects, I followed the same steps as my previous attempt in R. I put both objects into a data frame and used the sample function to jumble up the countries.

assign_countries <- data.frame(people = people_list,
                               countries = sample(countries_list))

Task complete!

Fate had delivered me Denmark, who were nowhere near the favourites at the point of selection. I sighed with relief knowing that I had no chance of winning again and that perhaps maybe now I could start to re-build my reputation as an honest co-worker...


Before I finished my latest foray into R, we decided to create a function for creating sweepstakes in R.

I was talked down from picking the name SweepstakeizzleR and decided upon the slightly more sensible sweepR.

I entered the desired workings of the function, which followed from the above work in R.

sweepR <- function(a, b, seed = 1234){
  set.seed(seed)  # use the seed argument so the draw is reproducible
  data.frame(a, sample(b))
}

Once done, I could use my newly created function to complete the work I had done before but in a much timelier fashion.

sweepR(people_list, countries_list)

My very first function worked! Using a function like sweepR will allow me to reliably reproduce the procedures I need for whatever task I'm working on. In this case it has enabled me to create a successfully random sweepstake mix of names and entries.
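To see what "reliably reproduce" means in practice, here is a small sketch. The function draw_sweepstake is a hypothetical variant of the function above, not the one from the post: it assumes the seed is set inside the function, names the columns, and rejects inputs of different lengths.

```r
# Hypothetical variant of the sweepstake function: sets the seed itself,
# names the columns, and refuses mismatched inputs.
draw_sweepstake <- function(people, entries, seed = 1234) {
  stopifnot(length(people) == length(entries))
  set.seed(seed)
  data.frame(person = people, entry = sample(entries),
             stringsAsFactors = FALSE)
}

people  <- c("Andy M", "Matty G", "Matt A")
entries <- c("Czech Rep", "The Netherlands", "United Kingdom")

draw_one <- draw_sweepstake(people, entries, seed = 6)
draw_two <- draw_sweepstake(people, entries, seed = 6)

# Same seed, same allocation.
identical(draw_one, draw_two)
```

One design caveat: calling set.seed inside a function also resets the random number stream for any code that runs afterwards, so in a larger script it may be preferable to set the seed once at the top instead.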


To my great relief, Israel won Eurovision and I was very happy to hand over the prize to Amanda.

I really enjoyed learning a little more about R and how I can create functions to streamline my work. Hopefully another reason will come up for me to learn even more soon!


This year at Mango we’re proudly sponsoring the Bath Cats & Dogs Home. To start our fundraising for them, we decided to run a sweepstake on the Grand National. We asked for £2 per horse, which would go to the cats and dogs home and the winner was promised a bottle of wine for their charitable efforts.

Working in a Data Science company, I knew that I couldn’t simply pick names out of a hat for the sweepstake. ‘That’s not truly random!’ they would cry. So in my hour of need, I turned to our two university placement students, Owen and Caroline, to help me randomise the names in R.


To use an appropriate horse-based metaphor, I would class myself as a ‘non-starter’ in R – I’m not even near the actual race! My knowledge is practically non-existent (‘Do you just type a lot of random letters?’) and up until this blog I didn’t even have RStudio on my laptop.

The first hurdle

We began by creating a list of the people who had entered the sweepstake. Some people bet on more than one horse, so their names were entered once for each bet they had laid down.

people_list <- c("Matt Glover", "Matt Glover", "Ed Gash",
                 "Ed Gash", "Ed Gash", "Lisa S", "Toby",
                 "Jen", "Jen", "Liz", "Liz", "Andrew M",
                 "Nikki", "Chris James", "Yvi", "Yvi",
                 "Yvi", "Beany", "Karina", "Chrissy", "Enrique",
                 "Pete", "Karis", "Laura", "Ryan", "Ryan", "Ryan",
                 "Ryan", "Ryan", "Owen", "Rich", "Rich", "Matt A",
                 "Matt A", "Matt A", "Matt A", "Matt A", "Matt A", 
                 "Matt A", "Matt A")

I had now associated all the names with the object called people_list. Next I created an object that contained numbers 1-40 to represent each horse.

horses_list <- 1:40

With the two sets of values ready to go, I wanted to display them in a table format to make it easier to match names and numbers.

assign_horses <- data.frame(Runners = horses_list, People = people_list)
head(assign_horses)  # show the first six rows

##   Runners      People
## 1       1 Matt Glover
## 2       2 Matt Glover
## 3       3     Ed Gash
## 4       4     Ed Gash
## 5       5     Ed Gash
## 6       6      Lisa S

Now the data appeared in a table, but had not been randomised. To do this I used the sample function to jumble up the people_list names.

assign_horses <- data.frame(horses_list, sample(people_list))
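A quick sanity check can confirm that the shuffle only reorders the entries and does not change how many horses each person holds. The sketch below uses a made-up miniature entry list rather than the real one:

```r
# Made-up miniature entry list: one name per bet placed.
people <- c("Ann", "Ann", "Bob", "Cat", "Cat", "Cat")
horses <- seq_along(people)

shuffled <- data.frame(Runners = horses, People = sample(people))

# sample() with no size argument returns a permutation of its input,
# so sorting the shuffled names should give back the sorted originals.
same_entries <- identical(sort(as.character(shuffled$People)), sort(people))
same_entries
```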

Free Rein

Success! I had a list of numbers (1-40) representing the horses and a randomly jumbled up list of those taking part in the sweepstake.

At the time of writing (in R Markdown!), unfortunately fate had randomly allocated me the favourite to win. As you can imagine, this is something that will not make you popular in the office.

My First Trot

I hope you enjoyed my first attempt in R. I will definitely use it again to randomise our next sweepstake, though under intense supervision. I can still hear the cries of ‘FIX!’ around the office. It’s always an awkward moment when you win your own sweepstake…

Despite the controversy, it was fun to try out R in an accessible way and it helped me understand some of the basic functions available. Perhaps I’ll sit in on the next LondonR workshop and learn some more!

If you’d like to find out more about the Bath Cats & Dogs Home please visit here.

Love Machine: Automating the romantic songwriting process
Owen Jones, Placement Student

Songwriting is a very mysterious process. It feels like creating something from nothing. It’s something I don’t feel like I really control.

— Tracy Chapman

It is February. The shortest, coldest, wettest, miserablest month of the British year.

Only two things happen in Britain during February. For a single evening, the people refrain from dipping all their food in batter and deep-frying it, and instead save some time by pouring the batter straight into a frying pan and eating it by itself; and for an entire day, the exchange of modest indications of affection between consenting adults is permitted, although the government advises against significant deviation from the actions specified in the state-issued Approved Romantic Gestures Handbook.

In Section 8.4 (Guidelines for Pre-Marital Communication) the following suggestion is made:

"Written expressions of emotion should be avoided where possible. Should it become absolutely necessary to express emotion in a written format, it should be limited to a 'popular' form of romantic lyricism. Examples of such 'popular' forms include 'love poem' and 'love song'."

Thankfully, for those who have not achieved at least a master’s degree in a related field, writing a poem or song is a virtually impossible task. And following the sustained and highly successful effort to persuade the British youth that a career in the arts is a fast-track to unemployment, the number of applications to study non-STEM subjects at British universities has been falling consistently since the turn of the decade. This ensures that only the very best and most talented songwriters, producing the most creatively ingenious work, are able to achieve widespread recognition, and therefore the British public are only exposed to high-quality creative influences.

But to us scientists, the lack of method is disturbing. This “creativity” must have a rational explanation. There must be some pattern.

This is unquestionably a problem which can be solved by machine learning, so let’s take the most obvious approach we can: we’ll train a recurrent neural network to generate song lyrics character by character.

You write down a paragraph or two describing several different subjects creating a kind of story ingredients-list, I suppose, and then cut the sentences into four or five-word sections; mix ’em up and reconnect them. You can get some pretty interesting idea combinations like this. You can use them as is or, if you have a craven need to not lose control, bounce off these ideas and write whole new sections.

— David Bowie

To build our neural network I’m going to be using the Keras machine learning interface (which we’re very excited about here at Mango right now – keep an eye out for workshops in the near future!). I’ve largely followed the steps in this example from the Keras for R website, and I’m going to stick to a high-level description of what’s going on, but if you’re the sort of person who would rather dive head-first into the code, don’t feel like you have to hang around here – go ahead and have a play! And if you want to read more about RNNs, this excellent post by Andrej Karpathy is at least as entertaining and significantly more informative than the one you’re currently reading.

We start by scraping as many love song lyrics as possible from the web – these will form our training material. Here’s the sort of thing we’re talking about:

Well… that’s how they look to us. Actually, after a bit of preprocessing, the computer sees something more like this:

All line breaks are represented by the pair of characters “\n”, and so all the lyrics from all the songs are squashed down into one big long string.
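As a rough sketch of this squashing step in base R (the two lyric lines are made up, and the real preprocessing may well have done more than this):

```r
# Two made-up lyric lines standing in for the scraped songs.
lyrics <- c("I love you", "You love me")

# Join the lines with a newline between them, giving one long string.
# print() displays the newline as the two-character escape "\n".
text <- paste(lyrics, collapse = "\n")

# Split into single characters and list the distinct ones: this is
# the "alphabet" a character-level model chooses from.
chars <- strsplit(text, "")[[1]]
alphabet <- sort(unique(chars))
```

Inside R the newline is stored as a single character; it is only the printed representation that shows it as a backslash followed by an "n".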

Then we use this string to train the network. We show the network a section of the string, and tell it what comes next.

So the network gradually learns which characters tend to follow a given fixed-length “sentence”. The more of these what-comes-next examples it sees, the better it gets at correctly guessing what should follow any sentence we feed in.
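The what-comes-next examples can be pictured as a window sliding along the string. This is a minimal sketch in base R, with a made-up string and an assumed window length of 4 (real training would use a much longer window and step):

```r
text   <- "love me do"
chars  <- strsplit(text, "")[[1]]
maxlen <- 4  # assumed length of each fixed-length "sentence"

# Every position where a full window plus one following character fits.
starts <- seq_len(length(chars) - maxlen)

# The window itself, and the single character that follows it.
sentences <- vapply(
  starts,
  function(i) paste(chars[i:(i + maxlen - 1)], collapse = ""),
  character(1)
)
next_chars <- chars[starts + maxlen]

examples <- data.frame(sentence = sentences, next_char = next_chars)
```

Each row of examples pairs a four-character "sentence" with the character the network is told comes next, so the first row pairs "love" with a space.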

At this point, our network is like a loyal student of a great artist, dutifully copying every brushstroke in minuscule detail and receiving a slap on the wrist and a barked correction every time it slips up. Via this process it appears to have done two things.

Firstly, it seems to have developed an understanding of the “rules” of writing a song. These rules are complex and multi-levelled; the network first had to learn the rules of English spelling and grammar, before it could start to make decisions about when to move to a new line or which rhyming pattern to use.

(Of course, it hasn’t actually “developed an understanding” of these rules. It has no idea what a “word” is, or a “new line”. It just knows that every few characters it should guess " ", and then sometimes it should put in a "\", and whenever it puts in a "\" then it’s got to follow that up with a "n" and then immediately a capital letter. Easy peasy.)

Secondly, and in exactly the same way, the network will have picked up some of the style of the work it is copying. If we were training it on the songs of one specific artist, it would have learned to imitate the style of that particular artist – but we’ve gone one better than that and trained it on all the love songs we could find. So effectively, it’s learned how everyone else writes love songs.

But no-one gets famous by writing songs which have already been written. What we need now is some creativity, some passion, a little bit of je ne sais quoi.

Let’s stop telling our network what comes next. Let’s give it the freedom to write whatever it likes.

I don’t think you can ever do your best. Doing your best is a process of trying to do your best.

— Townes van Zandt

It’s interesting to look at the songwriting attempts of the network in the very early stages of training. At first, it is guessing more or less at random what character should come next, so we end up with semi-structured gobbledegook:

fameliawmalYaws. Boflyi, methabeethirts yt
play3mppioty2=ytrnfuunuiYs blllstis
Byyovcecrowth andtpazo's youltpuduc,s Ijd"a]bemob8b>fiume,;Co
Bliovlkfrenuyione (ju'te,'ve ru t Kis
go arLUUs,k'CaufkfR )s'xCvectdvoldes

Avanrvous Ist'dyMe Dolriri

But notice that even in that example, which was taken from a very early training stage, the network has already nailed the “\n” newline combo and has even started to pick up on other consistent structural patterns like closing a “(” with a “)”. Actually, the jumbled nonsense becomes coherent English (or English-esque) ramblings quite quickly.

There is one interesting parameter to adjust when we ask the model to produce some output: the “diversity” parameter, which determines how adventurous the network should be in its choice of character. The higher we set this parameter, the more the network will favour slightly-less-probable characters over the most obvious choice at each point.
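One common way to implement a parameter like this is to rescale the network's predicted probabilities before sampling from them. The sketch below assumes that approach; rescale_probs is a hypothetical helper and the probabilities are made up:

```r
# Rescale a vector of predicted character probabilities.
# diversity < 1 sharpens the distribution towards the favourite;
# diversity > 1 flattens it, giving unlikely characters more of a chance.
rescale_probs <- function(probs, diversity = 1) {
  scaled <- exp(log(probs) / diversity)
  scaled / sum(scaled)
}

probs <- c(a = 0.7, l = 0.2, z = 0.1)  # made-up network output

low  <- rescale_probs(probs, diversity = 0.2)
high <- rescale_probs(probs, diversity = 2)

# Draw the next character according to the rescaled probabilities.
next_char <- sample(names(probs), size = 1, prob = high)
```

With a low diversity almost all of the probability piles onto the single most likely character, which is one way to end up stuck in a loop of "la"s.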

If we set the diversity parameter too low, we often degenerate into uncontrolled bursts of la-ing:

la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
(... lots more "la"s)

But set it too high and the network decides dictionary English is too limiting.

Oh, this younan every, drock on
Scridh's tty'
Is go only ealled
You could have like the one don'm I dope
Love me
And woment while you all that
Was it statiinc. I living you must?
We dirls anythor

It’s difficult to find the right balance between syllabic repetition and progressive vocabulary, and there’s a surprisingly fine line between the two – this will probably prove to be a fruitful area for further academic research.

I think that identifying the optimal diversity parameter is probably the key to good songwriting.

Songwriting is like editing. You write down all this stuff – all this bad, stupid stuff – and then you have to get rid of everything except the very best.

— Juliana Hatfield

So, that’s what I did.

Here are some particularly “beautiful” passages taken from the huge amount of (largely poor) material the model produced. I haven’t done any editing other than to isolate a few consecutive lines at a time and, in the last few examples, to start the network off with certain sentences.

Automated love

I know your eyes in the morning sun
I feel the name of love
Love is a picked the sun
All my life I can make me wanna be with you
I just give up in your head
And I can stay that you want a life
I’ve stay the more than I do

How long will I love you
As long as there is that songs
All the things that you want to find you
I could say me true
I want to fall in love with you
I want my life
And you’re so sweet
When I see you wanted to that for you
I can see you and thing, baby
I wanna be alone

Oh yeah I tell you somethin’
I think you’ll understand
When I say that somethin’
I thought the dartion hyand
I want me way to hear
All the things what you do

Wise men say
Only fools rush in
But I can hear your love
And I don’t wanna be alone

If I should stay
I would only be in your head
I wanna know that I hope I see the sun
I want a best there for me too
I just see that I can have beautiful
So hold me to you

Wishing you a Happy Valentine’s Day! (I don’t recommend reciting this to your loved one, though: they might run away.)


Data visualisation is a key piece of the analysis process. At Mango, we consider the ability to create compelling visualisations to be sufficiently important that we include it as one of the core attributes of a data scientist on our data science radar.

Although visualisation of data is important in order to communicate the results of an analysis to stakeholders, it also forms a crucial part of the exploratory process. In this stage of analysis, the basic characteristics of the data are examined and explored.

The real value of data analyses lies in accurate insights, and mistakes in this early stage can lead to the realisation of the favourite adage of many statistics and computer science professors: “garbage in, garbage out”.

Whilst it can be tempting to jump straight into fitting complex models to the data, overlooking exploratory data analysis can lead to the violation of the assumptions of the model being fit, and so decrease the accuracy and usefulness of any conclusions to be drawn later.

This point was demonstrated in a beautifully simplified way by statistician Francis Anscombe, who in 1973 designed a set of small datasets, each showing a distinct pattern of results. Although the four datasets comprising Anscombe’s Quartet have identical or near-identical means, variances, correlations between variables, and linear regression lines, plotting them reveals very different patterns, which highlights the inadequacy of relying on simple summary statistics alone in exploratory data analysis.
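The near-identical summaries are easy to check, because the quartet ships with base R as the anscombe data frame (columns x1 to x4 and y1 to y4). This sketch is independent of the Shiny app and simply tabulates the statistics mentioned above:

```r
# One row of summary statistics per dataset in the quartet.
summaries <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))  # intercept and slope of the regression line
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    cor_xy = cor(x, y),
    intercept = fit[[1]], slope = fit[[2]])
}))

round(summaries, 2)
```

The four rows come out almost the same, which is exactly why the numbers alone cannot tell the datasets apart. Plotting each pair, for example with plot(anscombe$x1, anscombe$y1), shows how different they really are.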

The accompanying Shiny app allows you to view various aspects of each of the four datasets. The beauty of Shiny’s interactive nature is that you can quickly change between each dataset to really get an in-depth understanding of their similarities and differences.

The code for the Shiny app is available on GitHub.

Putting the cat in scatterplot
Clara Schartner, Data Scientist

It will come as no surprise that cats and ggplot are among our favourite things here at Mango. Luckily, there is an easy way to combine both.

Using the function annotation_custom in the popular ggplot2 package, it is possible to display images on a plot, for example as the points of a scatterplot. This way data can be displayed in a more fun, creative way.

In keeping with the cat theme I have chosen a data set about cats and a cat icon based on Mango the cat. The MASS package provides a data set called cats which contains the body weight, heart weight and sex of adult cats.

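library(MASS)     # provides the cats data set
library(ggplot2)  # plotting
library(png)      # readPNG
library(grid)     # rasterGrob

cats <- cats[sample(1:144, size = 40),]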

First a normal scatterplot is defined on which the images will be plotted later:

sCATter <- ggplot(data = cats, aes(x = Bwt, y = Hwt)) +
  geom_point(size = 0, aes(group = Sex, colour = Sex)) +
  theme_classic() +
  xlab("Body weight") +
  ylab("Heart weight") +
  ggtitle("sCATterplot") +
  theme(plot.title = element_text(hjust = 0.5)) +
  # create a legend
  scale_color_manual(
    # named by factor level (levels are "F", "M") so labels cannot be swapped
    values = c(M = "#999999", F = "#b35900"),
    breaks = c("M", "F"),
    name = "Cat",
    labels = c("Male cat", "Female cat")
  ) +
  guides(colour = guide_legend(override.aes = list(size = 10)))

Any png image can be used for the plot, however images with a transparent background are preferable.

mCat <- readPNG("MaleCat.png")
feCat<- readPNG("FemaleCat.png")

In the last step the cats are iteratively plotted onto the plot using annotation_custom.

for (i in 1:nrow(cats)) {
  # distinguishing the sex of the cat
  if (cats$Sex[i] == "F") {
    image <- feCat
  } else {
    image <- mCat
  }
  # draw the icon in a small box centred on the data point
  sCATter = sCATter +
    annotation_custom(
      rasterGrob(image),
      xmin = cats$Bwt[i] - 0.6,
      xmax = cats$Bwt[i] + 0.6,
      ymin = cats$Hwt[i] - 0.6,
      ymax = cats$Hwt[i] + 0.6
    )
}

The cat's paw trail displays a linear regression of heart weight on body weight. This can easily be added by fitting a linear regression, defining a grid on which to calculate the expected values, and plotting paw icons on top of this data.

LmCat <- lm(Hwt~Bwt, data = cats)

steps <- 20
Reg <- data.frame(Bwt = 
seq(from = min(cats$Bwt), 
to = max(cats$Bwt), 
length.out = steps))
Reg$Hwt <- predict(LmCat, newdata = Reg)
sCATter <- sCATter + 
geom_point(data = Reg, aes(Bwt, Hwt), size = 0)

paw <- readPNG("paw.png")
for (i in 1:nrow(Reg)) {
  # draw a paw print at each point along the regression line
  sCATter = sCATter +
    annotation_custom(
      rasterGrob(paw),
      xmin = Reg$Bwt[i] - 0.6,
      xmax = Reg$Bwt[i] + 0.6,
      ymin = Reg$Hwt[i] - 0.6,
      ymax = Reg$Hwt[i] + 0.6
    )
}

I hope you have as much fun as I did with this ggplot2 package!