Blogs home

Another month, another sweepstake to raise money for the Bath Cats & Dogs home!

This time, we picked the Eurovision song contest as our sweepstake of choice. After enjoying my first experience of using R to randomise the names for the previous sweepstake I decided to give it another go, but with a few tweaks.

Soundcheck

During my first attempt in R, issues arose when I was (innocently!) allocated the favourite horse to win. I had no way to prove that the R code had made the selection, as my work was not reproducible.

So with the cries of “cheater!” and “fix!” still ringing in my ears, we started by setting a seed. This meant that if someone else were to rerun my code they would get the same results, removing the dark smudge against my good name.

At random I selected the number 6 at which to set my seed.

set.seed(6)
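
To show what the seed buys you, here is a minimal sketch using a small made-up vector (the `entries` values are placeholders, not the real sweepstake names). Resetting the same seed before each shuffle should give the same order both times.

```r
# Hypothetical entries, for illustration only
entries <- c("A", "B", "C", "D", "E")

# First shuffle with the seed fixed
set.seed(6)
first_draw <- sample(entries)

# Second shuffle with the same seed
set.seed(6)
second_draw <- sample(entries)

# TRUE when both shuffles produced the same order
identical(first_draw, second_draw)
```

Without the second call to set.seed, the two shuffles would generally differ, which is exactly the situation that caused the trouble last time.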

I next compiled my lists of people and Eurovision countries and assigned each list to its own object.

people_list <- c(
    "Andy M",
    "Adam",
    "Laura",
    "Rachel",
    "Owen",
    "Yvi",
    "Karis",
    "Toby",
    "Jen",
    "Matty G",
    "Tatiana",
    "Amanda",
    "Chrissy",
    "Lisa",
    "Lisa",
    "Ben",
    "Ben",
    "Robert",
    "Toby",
    "Matt A",
    "Lynn",
    "Ruth",
    "Julian",
    "Karina",
    "Colin",
    "Colin")
countries_list <- c(
    "Albania",
    "Australia",
    "Austria",
    "Bulgaria",
    "Cyprus",
    "Czech Rep",
    "Denmark",
    "Estonia",
    "Finland",
    "France",
    "Germany",
    "Hungary",
    "Ireland",
    "Israel",
    "Italy",
    "Lithuania",
    "Moldova",
    "Norway",
    "Portugal",
    "Serbia",
    "Slovenia",
    "Spain",
    "Sweden",
    "The Netherlands",
    "Ukraine",
    "United Kingdom"
  )

Once I had the lists stored as objects, I followed the same steps as my previous attempt in R. I put both objects into a data frame and used the sample function to jumble up the countries.

assign_countries <- data.frame(people = people_list,
                               countries = sample(countries_list))
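
A couple of quick checks can back up the claim that the draw is fair. The sketch below uses small stand-in vectors (`people` and `countries` are made up for illustration) and assumes one country per entry, as in the lists above.

```r
# Small stand-in lists (hypothetical), same shape as people_list and countries_list
people <- c("Ann", "Bob", "Bob", "Cat")
countries <- c("Italy", "Norway", "Spain", "Sweden")

# One country per entry, so the two lists must be the same length
stopifnot(length(people) == length(countries))

set.seed(6)
draw <- data.frame(people = people, countries = sample(countries))

# sample() with no size argument returns a permutation of its input,
# so every country should appear exactly once
setequal(draw$countries, countries)
anyDuplicated(draw$countries) == 0
```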

Task complete!

Fate had delivered me Denmark, who were nowhere near the favourites at the point of selection. I sighed with relief knowing that I had no chance of winning again and that perhaps now I could start to rebuild my reputation as an honest co-worker...

Encore

Before I finished my latest foray into R, we decided to create a function for creating sweepstakes in R.

I was talked down from picking the name SweepstakeizzleR and decided upon the slightly more sensible sweepR.

I entered the desired workings of the function, which followed from the above work in R.

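sweepR <- function(a, b, seed = 1234) {
  # Fix the seed so the draw can be reproduced
  set.seed(seed)
  # Pair each entry in a with a randomly shuffled item from b
  data.frame(a, sample(b))
}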

Once done, I could use my newly created function to complete the work I had done before but in a much timelier fashion.

sweepR(people_list, countries_list)

My very first function worked! Using a function like sweepR will allow me to reliably reproduce the procedures I need for whatever task I'm working on. In this case it has enabled me to create a successfully random sweepstake mix of names and entries.
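
As a possible next step, here is a sketch of a slightly stricter variant. The name `sweepR_checked` and its argument and column names are hypothetical, not part of the function above. It assumes one item per entrant, stops early if the two lists differ in length, and labels the columns.

```r
# Hypothetical variant of sweepR: same idea, plus a length check and column names
sweepR_checked <- function(entrants, items, seed = 1234) {
  # One item per entrant, so the lengths must match
  stopifnot(length(entrants) == length(items))
  set.seed(seed)
  data.frame(entrant = entrants, item = sample(items))
}

# Made-up inputs, for illustration only
first  <- sweepR_checked(c("Ann", "Bob", "Cat"), c("Italy", "Spain", "Norway"))
second <- sweepR_checked(c("Ann", "Bob", "Cat"), c("Italy", "Spain", "Norway"))

# Same seed, same allocation
identical(first, second)
```

Passing a different `seed` gives a different but equally repeatable draw, so the seed can be agreed in advance by the office sceptics.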

WinneR

To my great relief, Israel won Eurovision and I was very happy to hand over the prize to Amanda.

I really enjoyed learning a little more about R and how I can create functions to streamline my work. Hopefully another reason will come up for me to learn even more soon!


A definition of Data Science

Much of my time is spent talking to organisations looking to build a data science capability, or generally looking to use analytics to drive better decision making. As part of this, I’m often asked to present on a range of topics around data science. The two topics I’m asked to present on most are: ‘What is Data science?’ and ‘What is a Data Scientist?’. I thought I’d share how we at Mango define what Data science is, along with the reasoning behind our definition.

Where did the term Data science come from?

Professor Jeff Wu, the Coca-Cola Chair in Engineering Statistics at Georgia Institute of Technology, popularised the term ‘data science’ during a talk in 1997. Before this, the term statistician was widely used instead. Professor Wu felt that the title ‘Statistician’ no longer covered the array of work being done by statisticians, and that ‘Data Scientist’ better encapsulated the multifaceted role.

So, surely defining what a Data Scientist is and what they do should be a simple task – just bring up an image of Professor Wu and reference his 1997 lecture and ask for questions. However, the original definition has evolved since then and, in fact, most data scientists I meet are unfamiliar with Professor Wu.

What does Data Science mean today?

As mentioned, what ‘Data science’ meant originally and what it means today are two very different things. To develop what Mango’s definition of Data science would be, we looked to the wider community to see what they were saying.

Twitter has given us some great definitions, such as:

One early definition of a data scientist comes from Josh Wills, current Director of Data Engineering at Slack. Back in 2012, Josh described a data scientist as follows:

This speaks more directly to the data scientist being a ‘merging’ of different skillsets – a mix of a ‘statistician’ and ‘software engineer’.

Drew Conway, now CEO of Alluvium, took this concept further with a heavily used Venn diagram:

Beyond these definitions, I’ve heard a range of blunt comments about what a Data Scientist is and isn’t.

For example, at a recent data science event a speaker announced that “if you haven’t got a PhD then you’re not a data scientist”, which, of course, caused a fair amount of upset across the room of non-PhD-data-scientists!

Our interest at Mango in defining and understanding what a Data Scientist is, stems from the need to hire new talent. How do we describe the job? What skills must they have? Are our expectations too high?

We’ve seen some unrealistic job descriptions that say a data scientist should be able to:

  • Understand every analytic algorithm from the statistical or computer science world, including machine learning, deep learning and whatever other algorithm the hiring company has just read about in a blog post
  • Be an expert in a range of technologies including R, Python, Spark, Julia and a veritable zoo-ful of Apache projects
  • Be equally comfortable discussing complex error structures or speaking to the chief execs about analytic strategy

These people just don’t exist.

To me, the trouble with most definitions of a data scientist is that they seem detached from an agreed definition of data science. If a data scientist is someone who does data science, then surely we need to agree on what that is before understanding the skills needed to perform it successfully?

Drumroll please…

As per my earlier statement, it is clear that today data science has come to represent a lot more than Professor Wu’s original definition. At Mango, after countless arguments (sorry, heated discussions), we arrived at the following (very carefully worded) definition:

Data Science is…the proactive use of data and advanced analytics to drive better decision making.

The four key parts

‘Data’

I might be stating the obvious here, but we can’t do data science without the data. What’s interesting is that data science is often associated with the extremes of Doug Laney’s famous ‘3 V’s’:

  • Volume – the size of data to be analysed, driving data science’s ongoing association with the world of ‘big data’
  • Variety – with algorithms focused on analysing a range of structured and unstructured data types (e.g. image, text, video) being developed faster perhaps than the business cases are understood
  • Velocity – the speed at which new data is created and speed of decision therefore required, leading to stream analytics and increased usage of machine learning approaches

However, in my mind, data science is equally applicable to small, rectangular, static datasets.

‘Advanced analytics’

Generally, analytics can be thought of in four categories:

  • Descriptive Analytics: the study of ‘what happened?’ This is largely concerned with the reporting of results and summaries via static or interactive outputs (e.g. dashboards) and is more commonly referred to as ‘Business Intelligence’
  • Diagnostic Analytics: a study of why something happened. This typically involves feature engineering, model development etc.
  • Predictive Analytics: the modelling of what might happen under different circumstances. This is a mechanism for understanding possible outcomes and the certainty (or lack of) with which we can make predictions
  • Prescriptive Analytics: the analysis of ‘optimum’ ways to behave in order to ‘minimise’ or ‘maximise’ a desired outcome.

As we progress through these categories, the complexity increases, and hopefully the value added to the business as well. But this isn’t a list of steps – you could jump straight to predictive or prescriptive analytics without touching on either descriptive or diagnostic.

It’s important to note that data science is focused on advanced analytics. Using the above definitions, this means dealing with everything beyond descriptive analytics.

‘Proactive’

‘Proactive’ was included to distinguish data science from the more traditional ‘statistical analysis’. In my experience, when I started my career as a statistician in industry, an organisation’s analytic function seemed a largely ‘reactive’ practice. Modern data science needs to be an active part of the business function and look for ways to improve the business.

‘To drive better decision making’

I think the last part of the definition is the most important part. If we ignore this, then there’s a danger of doing the expensive cool stuff and not actually adding any value. With organisations investing heavily in data science as an industry, we need to deliver – otherwise we may be in a situation where data science as a phrase becomes associated with high-cost initiatives that never truly add value.

We need to be very clear about something: we can use the best tech, leverage the most clever algorithms, and apply them to the cleanest data, but unless we change the way something is done then we’re not adding value. To move the needle with data science, we need to positively impact the way the business does something.

So, what is a Data Scientist?

Each part of our definition hints at a particular skill that’s needed:

  • Data: ability to manipulate data across a number of dimensions (volume, variety, velocity)
  • Advanced analytics: understanding of a range of analytic approaches
  • Proactive: communication skills that allow us to interact with the business
  • Decision making: the ability to turn analytic thinking (e.g. models) into production code so they can be embedded in systems that deliver insight or action

If data science, as a proactive pursuit, is concerned with the meeting of a range of business challenges, then a data scientist must understand (at least the possibilities of) a wide range of analytic approaches.

So… we just need to hire Unicorns?

From what I’ve said earlier it sounds like you just need to hire people who understand every analytic technique, code in every language, etc.

I’ve been interviewing prospective Data Scientists for more than 15 years and I can safely say that data science ‘unicorns’ don’t exist (unless you know one, and they’re interested in a role – in which case, please contact me!).

The fact that unicorns don’t exist leads to a very important part of data science: Data Science is a Team Sport!

While we can’t hire people with all the skills required, we can hire data scientists with some of the required skills, and then create a team of complementary skillsets. This way we can create a team that, as a collective, contains all of the skills required for data science. How to successfully hire this team is a whole other blog post (keep your eyes peeled)!

Do you know where you currently sit with your skills and knowledge? Take our Data Science Radar quiz to find out!

If you’re looking at building your company’s data science capabilities, the Mango team have helped organisations across a range of industries around the world build theirs. The key is having the right team and the right guidance to ensure your analytics are in line with your objectives. That’s where we come in, so contact us today: sales@mango-solutions.com

This year at Mango we’re proudly sponsoring the Bath Cats & Dogs Home. To start our fundraising for them, we decided to run a sweepstake on the Grand National. We asked for £2 per horse, which would go to the cats and dogs home and the winner was promised a bottle of wine for their charitable efforts.

Working in a Data Science company I knew that I couldn’t simply pick names out of a hat for the sweepstake, ‘That’s not truly random!’ they would cry. So in my hour of need, I turned to our two university placement students Owen and Caroline to help me randomise the names in R.

Non-starter

To use an appropriate horse-based metaphor, I would class myself as a ‘non-starter’ in R – I’m not even near the actual race! My knowledge is practically non-existent (‘Do you just type a lot of random letters?’) and up until this blog I didn’t even have RStudio on my laptop.

The first hurdle

We began by creating a list of the people who had entered the sweepstake. Because some people had bet on more than one horse, each name was entered once for every bet that person had placed.

people_list <- c("Matt Glover", "Matt Glover", "Ed Gash",
                 "Ed Gash", "Ed Gash", "Lisa S", "Toby",
                 "Jen", "Jen", "Liz", "Liz", "Andrew M",
                 "Nikki", "Chris James", "Yvi", "Yvi",
                 "Yvi", "Beany", "Karina", "Chrissy", "Enrique",
                 "Pete", "Karis", "Laura", "Ryan", "Ryan", "Ryan",
                 "Ryan", "Ryan", "Owen", "Rich", "Rich", "Matt A",
                 "Matt A", "Matt A", "Matt A", "Matt A", "Matt A", 
                 "Matt A", "Matt A")
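
Typing each name out once per bet works for 40 entries, but it is easy to miscount. As an alternative sketch, base R's rep function can build the same kind of vector from a count of bets per person. The `bets` values and the name `people_list_sketch` below are made up for illustration.

```r
# Hypothetical entrants and their number of bets, for illustration only
bets <- c("Jen" = 2, "Toby" = 1, "Yvi" = 3)

# Repeat each name once per bet placed
people_list_sketch <- rep(names(bets), times = bets)
people_list_sketch

# The total number of entries should equal the total number of bets
length(people_list_sketch) == sum(bets)
```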

I had now associated all the names with the object called people_list. Next I created an object that contained numbers 1-40 to represent each horse.

horses_list <- 1:40

With the two sets of values ready to go, I wanted to display them in a table format to make it easier to match names and numbers.

assign_horses <- data.frame(Runners = horses_list, People = people_list)
head(assign_horses)

##   Runners      People
## 1       1 Matt Glover
## 2       2 Matt Glover
## 3       3     Ed Gash
## 4       4     Ed Gash
## 5       5     Ed Gash
## 6       6      Lisa S

Now the data appeared in a table, but had not been randomised. To do this I used the sample function to jumble up the people_list names.

# Keep the column names from before; only the order of the people changes
assign_horses <- data.frame(Runners = horses_list, People = sample(people_list))

Free Rein

Success! I had a list of numbers (1-40) representing the horses and a randomly jumbled up list of those taking part in the sweepstake.
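
Two follow-up checks are handy at this point: that shuffling has not changed how many horses each person holds, and which horses a given person ended up with. This is a minimal sketch on made-up data. The `people` and `horses` vectors are placeholders and the seed is arbitrary. The column names match the Runners and People columns used above.

```r
# Hypothetical entries: Ann has 2 bets, Bob 1, Cat 3
people <- c("Ann", "Ann", "Bob", "Cat", "Cat", "Cat")
horses <- seq_along(people)

set.seed(1)
draw <- data.frame(Runners = horses, People = sample(people))

# Shuffling changes who gets which horse, not how many each person gets
table(draw$People)

# Look up the horses allocated to one person
cat_horses <- draw$Runners[draw$People == "Cat"]
cat_horses
```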

At the time of writing (in RMarkdown!), fate had unfortunately allocated me the favourite to win. As you can imagine, this is something that will not make you popular in the office.

My First Trot

I hope you enjoyed my first attempt in R. I will definitely use it again to randomise our next sweepstake, though under intense supervision. I can still hear the cries of ‘FIX!’ around the office. It’s always an awkward moment when you win your own sweepstake…

Despite the controversy, it was fun to try out R in an accessible way and it helped me understand some of the basic functions available. Perhaps I’ll sit in on the next LondonR workshop and learn some more!

If you’d like to find out more about the Bath Cats & Dogs Home please visit here.

Join Us For Some R And Data Science Knowledge Sharing In 2018

We’re proud to be part of the Data Science and R communities.

We recognise the importance of knowledge sharing across industries, helping people with their personal and professional development, networking, and collaboration in improving and growing the community. This is why we run a number of events and participate in many others.

Each year, we host and sponsor events across the UK, Europe and the US. Each event is open to everyone, experienced or curious, and aims to help people share and gain knowledge about Data Science and to get them involved with the wider community. To get you started we’ve put together a list of our events you can attend over the next 12 months:

Free community events

LondonR

We host LondonR in central London every two months. At each meet up we have three brilliant R presentations followed by networking drinks – which are on us. Where possible we also offer free workshops about a range of R topics, including Shiny, ggplot2 and the Tidyverse.

The next event is on 27 March at UCL; you can sign up to our mailing list to hear about future events.

Manchester R

Manchester R takes place four times a year. Following the same format as LondonR, you will get three presentations followed by networking drinks on us. We also offer free workshops before the main meeting so you can stay up-to-date with the latest tools.

Our next event is on 6 February where the R-Ladies are taking over for the night. For more information visit the Manchester R website.

Bristol Data Scientists

Our Bristol Data Science events have a wider focus, but they follow the same format as our R user groups – three great presentations from the community and then drinks on us. If you’re interested in Data Science, happen to be a Data Scientist or work with data in some way then you are welcome to join us.

This year, we’re introducing free Data Science workshops before the meeting, so please tell us what you’d like to hear more about.

The Bristol meetup takes place four times a year at the Watershed in central Bristol. If you’d like to come we recommend joining the meetup group to stay in the loop.

BaselR

This meet up is a little further afield, but if you’re based in or near Basel, you’ll catch us twice a year running this R user group. Visit the BaselR website for details on upcoming events.

OxfordR

As you may have guessed, we love R, so we try to support the community where we can. We’ve partnered up with OxfordR this year to bring you pizza and wine while you network after the main presentation. OxfordR is held on the first Monday of every month; you can find details on their website.

BirminghamR

BirminghamR is under new management and we are helping them get started. Their first event for 2018 is coming up on 25 January; for more information check out their meetup page.

Data Engineering London

One of our newest meetup groups focuses on Data Engineering. We hold two events a year that give Data Engineers in London the opportunity to listen to talks on the latest technology, network with fellow engineers and have a drink or two on us. The next event will be announced in the coming months. To stay up-to-date please visit the meetup group.

Speaking opportunities

As well as attending our free events, you can let us know if you’d like to present a talk. If you have something you’d like to share just get in touch with the team by emailing us.

EARL Conferences

Our EARL Conferences were developed on the success of our R User Groups and the rapid growth of R in enterprise. R users in organisations around the country were looking for a place to share, learn and find inspiration. The enterprise focus of EARL makes it ideal for people to come and get some ideas to implement in the workplace. Every year delegates walk away feeling inspired and ready to work R magic in their organisations.

This year our EARL Conference dates are:

  • London: 11-13 September at The Tower Hotel
  • Seattle: 7 November at Loews Hotel 1000
  • Houston: 9 November at Hotel Derek
  • Boston: 13 November at The Charles Hotel

Speak at EARL

If you’re doing exciting things with R in your organisation, submit an abstract so others can learn from your wins. Accepted speakers get a free ticket for the day they are speaking.

Catch us at…

As well as hosting duties we are proud to sponsor some great community events, including PyData London in April and eRum in May.

Plus, you’ll find members of the Mango team speaking at Data Science events around the country. If you’d love to have one of them present at your event, please do get in touch.

Wherever you’re based we hope we will see you soon.

Field Guide to the R Ecosystem
Mark Sellors, Head of Data Engineering

I started working with R about five years ago. Parts of the R world have changed substantially over that time, while other parts remain largely the same. One thing that hasn’t changed, however, is that there has never been a simple, high-level text to introduce newcomers to the ecosystem. I believe this is especially important now that the ecosystem has grown so much. It’s no longer enough to just know about R itself. Those working with, or even around R, must now understand the ecosystem as a whole in order to best manage and support its use.

Hopefully the Field Guide to the R Ecosystem goes some way towards filling this gap.

The field guide aims to provide a high-level introduction to the R ecosystem. It is designed for those approaching the language for the first time, managers, ops staff, and anyone who just needs to get up to speed with the R ecosystem quickly.

This is not a programming guide and contains no information about the language itself, so it’s very definitely not aimed at those already developing with R. However, it is hoped that the guide will be useful to people around those R users. Whether that’s their managers, who’d just like to understand the ecosystem better, or ops staff tasked with supporting R in an enterprise, but who don’t know where to start.

Perhaps, you’re a hobbyist R user, who’d like to provide more information to your company in order to make a case for adopting R? Maybe you’re part of a support team who’ll be building out infrastructure to support R in your business, but don’t know the first thing about R. You might be a manager or executive keen to support the development of an advanced analytics capability within your organisation. In all of these cases, the field guide should be useful to you.

It’s relatively brief and no prior knowledge is assumed, beyond a general technical awareness. The topics covered include R, packages and CRAN, IDEs, R in databases, commercial versions of R, web apps and APIs, publishing and the community.

I really hope you, or someone around you, finds the guide useful. If you have any feedback, find me on Twitter and let me know. If you’d like to propose changes to the guide itself, you’ll find instructions in the first chapter and the bookdown source on GitHub. Remember, the guide is intentionally high-level and is intended to provide an overview of the ecosystem only, rather than any deep-dive technical discussions. There are already plenty of great guides for that stuff!

I’d also like to say a huge thanks to everyone who has taken time out of their day to proof read this for me and provide invaluable feedback, suggestions and corrections. The community is undoubtedly one of R’s greatest assets.

Originally posted on Mark’s blog, here.


Prelude

Maybe you’re looking for a change of scene. Maybe you’re looking for your first job. Maybe you’re stuck in conversation with a relative who you haven’t spoken to since last Christmas and who has astonishingly strong opinions on whether cells ought to be merged or not in Excel spreadsheets.

The fact of the matter is that you have just encountered the term “data science” for the first time, and it sounds like it might be interesting but you don’t have a clue what it is. Something to do with computers? Should you bring a lab coat, or a VR headset? Or both? What is a data and how does one science it?

Fear not. I am here to offer subjective, questionable and most importantly FREE advice from the perspective of someone who was in that very position not such a long time ago. Read on at your peril.

I. Adagio: Hear about data science

This is the hard bit. It’s surprisingly difficult to stumble upon data science unless someone tells you about it.

But the good news is that you’re reading this, so you’ve already done it. Possibly a while ago, or possibly just now; either way, put a big tick next to Step 1. Congratulations!

(By the way, you’ll remember the person who told you about data science. When you grow in confidence yourself, be someone else’s “person who told me about data science”. It’s a great thing to share. But all in good time…)

II. Andante: Find out more

But what actually is data science?

To be honest, it’s a fairly loosely-defined term. There are plenty of articles out there that try to give an overview, but most descend into extended discussions about the existence of unicorns or resort to arranging countless combinations of potentially relevant acronyms in hideous indecipherable Venn diagrams.

You’re much better off finding examples of people “doing” data science. Find some blogs (here are a few awesome ones to get you started) and read about what people are up to in the real world.

Don’t be afraid to narrow down and focus on a specific topic that interests you – there’s so much variety out there that you’re bound to find something that inspires you to keep reading and learning. But equally, explore as many new areas as you can, because the more context you can get about the sector the better your understanding will be and you’ll start to see how different subjects and different roles relate to each other.

Believe it or not, one of the best tools for keeping up to date with the latest developments in the field is Twitter. If you follow all your blog-writing heroes, not only will you be informed whenever they publish a new article but you’ll also get an invaluable glimpse into their day-to-day jobs and working habits, as well as all the cool industry-related stuff they share. Even if you never tweet anything yourself you’ll be exposed to much more than you’d be able to find on your own. If you want to get involved there’s no need to be original – you could just use it to share content you’ve found interesting yourself.

If you’re super keen, you might even want to get yourself some data science books tackling a particular topic. Keep an eye out for free online/ebook versions too!

III. Allegretto: Get hands-on

Observing is great, but it will only get you so far.

Imagine that you’ve just heard about an amazing new thing called “piano”. It sounds great. No, it sounds INCREDIBLE. It’s the sort of thing you really want to be good at.

So you get online and read more about it. Descriptions, analyses, painstaking breakdowns of manual anatomy and contrapuntal textures. You watch videos of people playing pianos, talking about pianos, setting pianos on fire and hurling them across dark fields. You download reams of free sheet music and maybe even buy a book of pieces you really want to learn.

But at some point… you need to play a piano.

The good news is that with data science, you don’t need to buy a piano, or find somewhere to keep it, or worry about bothering your family/friends/neighbours/pets with your late-night composing sessions.

Online interactive coding tutorials are a great place to start if you want to learn a new programming language. Sites like DataCamp and Codecademy offer a number of free courses to get yourself started with data science languages like R and Python. If you are feeling brave enough, take the plunge and run things on your own machine! (I’d strongly recommend using R with RStudio and using Anaconda for Python.) Language-specific “native-format” resources such as swirl for R or this set of Jupyter notebooks for Python are a great way to learn more advanced skills. Take advantage of the exercises in any books you have – don’t just skip them all!

Data science is more than just coding though – it’s all about taking a problem, understanding it, solving it and then communicating those ideas to other people. So Part 1 of my Number One Two-Part Top Tip for you today is:

  1. Pick a project and write about it

How does one “pick a project”? Well, find something that interests you. For me it was neural networks (and later, car parks…) but it could be literally anything, so long as you’re going to be able to find some data to work with. Maybe have a look at some of the competitions hosted on Kaggle or see if there’s a group in your area which publishes open data.

Then once you’ve picked something, go for it! Try out that cool package you saw someone else using. Figure out why there are so many missing values in that dataset. Take risks, explore, try new things and push yourself out of your comfort zone. And don’t be afraid to take inspiration from something that someone else has already done: regardless of whether you follow the same process or reach the same outcome, your take on it is going to be different to theirs.

By writing about that project – which is often easier than deciding on one in the first place – you’re developing your skills as a communicator by presenting your work in a coherent manner, rather than as a patchwork of dodgy scripts interspersed with the occasional hasty comment. And even if you don’t want to make your writing public, you’ll be amazed how often you go back and read something you wrote before because it’s come up again in something else you’re working on and you’ve forgotten how to do it.

I’d really encourage you to get your work out there though. Which brings us smoothly to…

IV. Allegro maestoso: Get yourself out there

If you never play the piano for anyone else, no-one’s ever going to find out how good you are! So Part 2 of my Number One Two-Part Top Tip is:

  2. Start a blog

It’s pretty easy to get going with WordPress or similar, and it takes your writing to the next level because now you’re writing for an audience. It may not be a very big audience, but if someone, somewhere finds your writing interesting or useful then surely it’s worth it. And if you know you’re potentially writing for someone other than yourself then you need to explain everything properly, which means you need to understand everything properly. I often learn more when I’m writing up a project than when I’m playing around with the code in the first place.

Also, a blog is a really good thing to have on your CV and to talk about at interviews, because it gives you some physical (well, virtual) evidence which you can point at as you say “look at this thing wot I’ve done”.

(Don’t actually say those exact words. Remember that you’re a Good Communicator.)

If you’re feeling brave you can even put that Twitter account to good use and start shouting about all the amazing things you’re doing. You’ll build up a loyal following amazingly quickly. Yes, half of them will probably be bots, but half of them will be real people who enjoy reading your work and who can give you valuable feedback.

Speaking of real people…

  3. Get involved in the community

Yes, that was indeed Part 3 of my Number One Two-Part Top Tip, but it’s so important that it needs to be in there.

The online data science community is one of the best out there. The R community in particular is super friendly and supportive (check out forums like RStudio Community, community groups like R4DS, and the #rstats tag on Twitter). Get involved in conversations, learn from people already working in the sector, share your own knowledge and make friends.

Want to go one better than online?

Get a Meetup account, sign up to some local groups and go out to some events. It might be difficult to force yourself to go for the first time, but pluck up the courage and do it. Believe me when I say there’s no substitute for meeting up and chatting to people. Many good friends are people I met for the first time at meetups. And of course, it’s the perfect opportunity to network – I met 5 or 6 of my current colleagues through BathML before I even knew about Mango! (If you’re in or near Bristol or London, Bristol Data Scientists and LondonR are both hosted by Mango and new members are always welcome!)

Postlude

Of course, everything I’ve just said is coming from my point of view and is entirely based on my own experiences.

For example, I’ve talked about coding quite a lot because I personally code quite a lot; and I code quite a lot because I enjoy it. That might not be the case for you. That’s fine. In fact it’s more than “fine”; the huge diversity in people’s backgrounds and interests is what makes data science such a fantastic field to be working in right now.

Maybe you’re interested in data visualisation. Maybe you’re into webscraping. Or stats. Or fintech, or NLP, or AI, or BI, or CI. Maybe you are the relative at Christmas dinner who won’t stop banging on about why you should NEVER, under ANY circumstances, merge cells in an Excel spreadsheet (UNLESS it is PURELY for purposes of presentation).

Oh, why not:

  4. Find the parts of data science that you enjoy and arrange them so that they work for you.