Blogs home Featured Image

Becoming a data-driven business is high on the agenda of most companies but it can be difficult to know how to get started or know what the opportunities could be.

Mango Solutions’ has a bespoke workshop, ‘Art of The Possible’ to help senior leadership teams see the potential and overcome some of the common challenges.

Find out more in our new report developed in partnership with IBM.

Blogs home

I always find it difficult to pick highlights from a conference and the eRum 2018 team did a fantastic job of making it difficult for me once again, so here goes…

Day One

The first day offered a huge choice in workshops, but teaching one of them meant we didn’t make it to any of the others. However, everyone we spoke to had great things to say about them all. In fact, we were overwhelmed by the turnout for our own workshop on the keras package (and we have to give a shout out to Mark Sellors for setting up and monitoring the server for us). By the way, if you missed out, you can sign up for the workshop at EARL London in September.

Day Two

Tuesday might have been a rainy start outside but inside we were mesmerised by the transparent roof in the Akvarium Klub. For team Mango, the morning mostly involved cat restocking, so we were really grateful for the live streaming that enabled us to keep up with everything going on in the main room.

My favourite presentations included:

Having newly been introduced to the recipes package I particularly enjoyed seeing Edwin Thoen talk about how to add your own data preparation steps and checks.

Olga Mierzwa-Sulima presented six packages to add functionality to shiny apps. These cover UI aspects like using semantic elements in shiny or easily exchanging themes for the app as well as user management aspects like authentication and controlling the level of access for different users. She also covered additional functionality like making routing possible with shiny and building multi-language apps.

Jeroen Ooms spoke about using Rust code in R packages. Rust is a new system programming language and can be an alternative to C/C++. Jeroen mentioned several advantages including Rust being memory safe and as fast as C/C++ while being far safer. It ships with a native package manager (cargo) and does not need a runtime library which means that the binary (R) package does not depend on Rust or cargo. He stressed that it’s easy to wrap Rust libraries into R packages so hopefully soon the selection of tools available from R will be even more varied.

A particular highlight for Doug Ashton came on Tuesday afternoon with three complementary talks on machine learning. With all the buzz to deal with in the ML world right now, Doug thought the practical talks from three level-headed practioners were very useful:

First Erin LeDell, Chief Machine Learning Scientist at h2o, gave an excellent talk on their automl package – a system they’ve been working hard on to run several different algorithms and select the best. Doug’s favourite part was their automated model ensembling (aka model stacking) that provides the best mix of all the algorithms.

Szilárd Pafka, Chief Scientist at Epoch followed on with a presentation on the provactively titled “Better than deep learning: Gradient Boosted Machines in R”, where he talked about why the majority of ML problems he sees are not best suited for deep learning. Szilárd also gave a nice overview of the best performing algorithms for gbm. (Doug’s note to self was to check out Microsoft’s lightgbm and xgboost with gpu.)


Going back to the theme of automation Andrie de Vries, Solutions Engineer at RStudio, took us through how to tune your tensorflow models using the tfruns package to run grid/random search over the hyperparameter space. This is very timely as we are often asked about how to select the right network topology and until now we’ve largely hand-tuned. Andrie then took us through an example where deep learning certainly is the right choice—image/pattern recognition with convolutional neural nets (CNNs)—and taking our example from the keras workshop significantly improved the accuracy with automated tuning – 👏.

After the rainy to start the day we were all relieved to see it had cleared up by the time the evening event (a river cruise) came around – we were lucky to have stunning views of Budapest from the Danube; the sunset on the Parliament building with stormy skies overhead was incredible.

Day Three

On Wednesday morning Roger Bivand did a great job of talking us all through some of the very important history of R – you all knew “_” was once an assignment operator, right?

RStudio’s Barbara Borges Ribeiro showed off a cool shiny app for drilling down into data and making use of the dynamic insertion of UI. Unfortunately I missed her talk, but managed to get a demo of it later in the day – you can take a look at the app on GitHub.

The biggest highlight for me though were the people. Without a doubt it was one of the friendliest conferences I have attended. Everyone was happy to share their experiences, answer your questions and point you in the direction of tools to look at later. Importantly, everyone was made to feel welcome, from the most-experienced to the newest R users.

The videos from all talks are being made available, check the conference homepage for the link.

Congratulations to the whole organising committee, led by Gergely Daroczi, for putting on such a great event. Mango are certainly looking forward to the next eRum conference!

Blogs home

We’re delighted to announce RStudio’s Garrett Grolemund as one of our Keynote Speakers at this year’s EARL London.

He will join Starcount’s Edwina Dunn and a whole host of brilliant speakers for the 5th EARL London Conference on 11-13 September at The Tower Hotel.

Garrett specialises in teaching people how to use R, which is why you’ll see his name on some brilliant resources, including video courses on Datacamp.com and O’Reilly media, his series of popular R cheat sheets distributed by RStudio, and as co-author of R for Data Science and Hands-On Programming with R. He also wrote the lubridate R package and works for RStudio as an advocate who trains engineers to do data science with R and the Tidyverse.

He earned his Phd in Statistics from Rice University in 2012 under the guidance of Hadley Wickham. Before that, he earned a Bachelor’s degree in Psychology from Harvard University and briefly attended law school before wising up.

Garrett is one of the foremost promoters of Shiny, R Markdown, and the Tidyverse, so we’re really looking forward to his keynote.

Don’t miss out on early bird tickets

Early bird tickets for all EARL Conferences are now available:
London: 11-13 September
Seattle: 7 November
Houston: 9 November
Boston: 13 November

Blogs home

We are excited to announce the speakers for this year’s EARL London Conference!

Every year, we receive an immense number of excellent abstracts and this year was no different – in fact, it’s getting harder to decide. We spent a lot of time deliberating and had to make some tough choices. We would like to thank everyone who submitted a talk – we appreciate the time taken to write and submit; if we could accept every talk, we would.

This year, we have a brilliant lineup, including speakers from Auto Trader, Marks and Spencer, Aviva, Hotels.com, Google, Ministry of Defence and KPMG. Take a look below at our illustrious list of speakers:

Full length talks
Abigail Lebrecht, Abigail Lebrecht Consulting
Alex Lewis, Africa’s Voices Foundation
Alexis Iglauer, PartnerRe
Amanda Lee, Merkle Aquila
Andrie de Vries, RStudio
Catherine Leigh, Auto Trader
Catherine Gamble, Marks and Spencer
Chris Chapman, Google
Chris Billingham, N Brown PLC
Christian Moroy, Edge Health
Christoph Bodner, Austrian Post
Dan Erben, Dyson
David Smith, Microsoft
Douglas Ashton, Mango Solutions
Dzidas Martinaitis, Amazon Web Services
Emil Lykke Jensen, MediaLytic
Gavin Jackson, Screwfix
Ian Jacob, HCD Economics
James Lawrence, The Behavioural Insights Team
Jeremy Horne, MC&C Media
Jobst Löffler, Bayer Business Services GmbH
Jo-fai Chow, H2O.ai
Jonathan Ng, HSBC
Kasia Kulma, Aviva
Leanne Fitzpatrick, Hello Soda
Lydon Palmer, Investec
Matt Dray, Department for Education
Michael Maguire, Tusk Therapeutics
Omayma Said, WUZZUF
Paul Swiontkowski, Microsoft
Sam Tazzyman, Ministry of Justice
Scott Finnie, Hymans Robertson
Sean Lopp, RStudio
Sima Reichenbach, KPMG
Steffen Bank, Ekstra Bladet
Taisiya Merkulova, Photobox
Tim Paulden, ATASS Sports
Tomas Westlake, Ministry Of Defence
Victory Idowu, Aviva
Willem Ligtenberg, CZ

Lightning Talks
Agnes Salanki, Hotels.com
Andreas Wittmann, MAN Truck & Bus AG
Ansgar Wenzel, Qbiz UK
George Cushen, Shop Direct
Jasmine Pengelly, DAZN
Matthias Trampisch, Boehringer Ingelheim
Mike K Smith, Pfizer
Patrik Punco, NOZ Medien
Robin Penfold, Willis Towers Watson

Some numbers

We thought we would share some stats from this year’s submission process:


This is based on a combination of titles, photos and pronouns.

Agenda

We’re still putting the agenda together, so keep an eye out for that announcement!

Tickets

Early bird tickets are available until 31 July 2018, get yours now.

EARL Seattle Keynote Speaker Announcement: Julia Silge
Blogs home Featured Image

We’re delighted to announce that Julia Silge will be joining us on 7 November in Seattle as our Keynote speaker.

Julia is a Data Scientist at Stack Overflow, has a PhD in astrophysics and an abiding love for Jane Austen (which we totally understand!). Before moving into Data Science and discovering R, Julia worked in academia and ed tech, and was a NASA Datanaut. She enjoys making beautiful charts, programming in R, text mining, and communicating about technical topics with diverse audiences. In fact, she loves R and text mining so much, she literally wrote the book on it: Text Mining with R: A Tidy Approach!

We can’t wait to see what Julia has to say in November.

Submit an abstract

Abstract submissions are open for both the US Roadshow in November and London in September. You could be on the agenda with Julia in Seattle as one of our speakers if you would like to share the R successes in your organisation.

Submit your abstract here.

Early bird tickets now available

Tickets for all EARL Conferences are now available:
London: 11-13 September
Seattle: 7 November
Houston: 9 November
Boston: 13 November

Join Us For Some R And Data Science Knowledge Sharing In 2018
Blogs home Featured Image

We’re proud to be part of the Data Science and R communities.

We recognise the importance of knowledge sharing across industries, helping people with their personal and professional development, networking, and collaboration in improving and growing the community. This is why we run a number of events and participate in many others.

Each year, we host and sponsor events across the UK, Europe and the US. Each event is open everyone —experienced or curious— and aims to help people share and gain knowledge about Data Science and to get them involved with the wider community. To get you started we’ve put together a list of our events you can attend over the next 12 months:

Free community events

LondonR

We host LondonR in central London every two months. At each meet up we have three brilliant R presentations followed by networking drinks – which are on us. Where possible we also offer free workshops about a range of R topics, including Shiny, ggplot2 and the Tidyverse.

The next event is on 27 March at UCL, you can sign up to our mailing list to hear about future events.

Manchester R

Manchester R takes place four times a year. Following the same format as LondonR, you will get three presentations followed by networking drinks on us. We also offer free workshops before the main meeting so you can stay up-to-date with the latest tools.

Our next event is on 6 February where the R-Ladies are taking over for the night. For more information visit the Manchester R website.

Bristol Data Scientists

Our Bristol Data Science events have a wider focus, but they follow the same format as our R user groups – three great presentations from the community and then drinks on us. If you’re interested in Data Science, happen to be a Data Scientist or work with data in some way then you are welcome to join us.

This year, we’re introducing free Data Science workshops before the meeting, so please tell us what you’d like to hear more about.

The Bristol meetup takes place four times a year at the Watershed in central Bristol. If you’d like to come we recommend joining the meetup group to stay in the loop.

BaselR

This meet up is a little further afield, but if you’re based in or near Basel, you’ll catch us twice a year running this R user group. Visit the BaselR websitefor details on upcoming events.

OxfordR

As you may have guessed, we love R, so we try to support the community where we can. We’ve partnered up with OxfordR this year to bring you pizza and wine while you network after the main presentation. OxfordR is held on the first Monday of every month, you can find details here on their website.

BirminghamR

BirminghamR is under new management and we are helping them get started. Their first event for 2018 is coming up on 25 January; for more information check out their meetup page.

Data Engineering London

One of our newest meetup groups focuses on Data Engineering. We hold two events a year that give Data Engineers in London the opportunity to listen to talks on the latest technology, network with fellow engineers and have a drink or two on us. The next event will be announced in the coming months. To stay up-to-date please visit the meetup group.

Speaking opportunities

As well as attending our free events, you can let us know if you’d like to present a talk. If you have something you’d like to share just get in touch with the team by emailing us.

EARL Conferences

Our EARL Conferences were developed on the success of our R User Groups and the rapid growth of R in enterprise. R users in organisations around the country were looking for a place to share, learn and find inspiration. The enterprise focus of EARL makes it ideal for people to come and get some ideas to implement in the workplace. Every year delegates walk away feeling inspired and ready to work R magic in their organisations.

This year our EARL Conference dates are: London: 11-13 September at The Tower Hotel Seattle: 7 November at Loews Hotel 1000 Houston, 9 November at Hotel Derek Boston, 13 November at The Charles Hotel

Speak at EARL

If you’re doing exciting things with R in your organisation, submit an abstract so others can learn from your wins. Accepted speakers get a free ticket for the day they are speaking.

Catch us at…

As well as hosting duties we are proud to sponsor some great community events, including PyData London in April and eRum in May.

Plus, you’ll find members of the Mango team speaking at Data Science events around the country. If you’d love to have one of them present at your event, please do get in touch.

Wherever you’re based we hope we will see you soon.

Field Guide to the R Ecosystem
Blogs home Featured Image
Mark Sellors, Head of Data Engineering

I started working with R around about 5 years ago. Parts of the R world have changed substantially over that time, while other parts remain largely the same. One thing that hasn’t changed however, is that there has never been a simple, high-level text to introduce newcomers to the ecosystem. I believe this is especially important now that the ecosystem has grown so much. It’s no longer enough to just know about R itself. Those working with, or even around R, must now understand the ecosystem as a whole in order to best manage and support its use.

Hopefully the Field Guide to the R Ecosystem goes some way towards filling this gap.

The field guide aims to provide a high level introduction to the R ecosystem. Designed for those approaching the language for the first time, managers, ops staff, and anyone that just needs to get up to speed with the R ecosystem quickly.

This is not a programming guide and contains no information about the language itself, so it’s very definitely not aimed at those already developing with R. However, it is hoped that the guide will be useful to people around those R users. Whether that’s their managers, who’d just like to understand the ecosystem better, or ops staff tasked with supporting R in an enterprise, but who don’t know where to start.

Perhaps, you’re a hobbyist R user, who’d like to provide more information to your company in order to make a case for adopting R? Maybe you’re part of a support team who’ll be building out infrastructure to support R in your business, but don’t know the first thing about R. You might be a manager or executive keen to support the development of an advanced analytics capability within your organisation. In all of these cases, the field guide should be useful to you.

It’s relatively brief and no prior knowledge is assumed, beyond a general technical awareness. The topics covered include, R, packages and CRAN, IDEs, R in databases, commercial versions of R, web apps and APIs, publishing and the community.

I really hope you, or someone around you, finds the guide useful. If you have any feedback, find me on twitter and let me know. If you you’d like to propose changes to the guide itself, you’ll find instructions in the first chapter and the bookdown source on GitHub. Remember, the guide is intentionally high-level and is intended to provide an overview of the ecosystem only, rather than any deep-dive technical discussions. There are already plenty of great guides for that stuff!

I’d also like to say a huge thanks to everyone who has taken time out of their day to proof read this for me and provide invaluable feedback, suggestions and corrections. The community is undoubtedly one of R’s greatest assets.

Originally posted on Mark’s blog, here.

Blogs home Featured Image

Prelude

Maybe you’re looking for a change of scene. Maybe you’re looking for your first job. Maybe you’re stuck in conversation with a relative who you haven’t spoken to since last Christmas and who has astonishingly strong opinions on whether cells ought to be merged or not in Excel spreadsheets.

The fact of the matter is that you have just encountered the term “data science” for the first time, and it sounds like it might be interesting but you don’t have a clue what it is. Something to do with computers? Should you bring a lab coat, or a VR headset? Or both? What is a data and how does one science it?

Fear not. I am here to offer subjective, questionable and most importantly FREE advice from the perspective of someone who was in that very position not such a long time ago. Read on at your peril.

I. Adagio: Hear about data science

This is the hard bit. It’s surprisingly difficult to stumble upon data science unless someone tells you about it.

But the good news is that you’re reading this, so you’ve already done it. Possibly a while ago, or possibly just now; either way, put a big tick next to Step 1. Congratulations!

(By the way, you’ll remember the person who told you about data science. When you grow in confidence yourself, be someone else’s “person who told me about data science”. It’s a great thing to share. But all in good time…)

II. Andante: Find out more

But what actually is data science?

To be honest, it’s a fairly loosely-defined term. There are plenty of articles out there that try to give an overview, but most descend into extended discussions about the existence of unicorns or resort to arranging countless combinations of potentially relevant acronyms in hideous indecipherable Venn diagrams.

You’re much better off finding examples of people “doing” data science. Find some blogs (here are a few awesome ones to get you started) and read about what people are up to in the real world.

Don’t be afraid to narrow down and focus on a specific topic that interests you – there’s so much variety out there that you’re bound to find something that inspires you to keep reading and learning. But equally, explore as many new areas as you can, because the more context you can get about the sector the better your understanding will be and you’ll start to see how different subjects and different roles relate to each other.

Believe it or not, one of the best tools for keeping up to date with the latest developments in the field is Twitter. If you follow all your blog-writing heroes, not only will you be informed whenever they publish a new article but you’ll also get an invaluable glimpse into their day-to-day jobs and working habits, as well as all the cool industry-related stuff they share. Even if you never tweet anything yourself you’ll be exposed to much more than you’d be able to find on your own. If you want to get involved there’s no need to be original – you could just use it to share content you’ve found interesting yourself.

If you’re super keen, you might even want to get yourself some data science books tackling a particular topic. Keep an eye out for free online/ebook versions too!

III. Allegretto: Get hands-on

Observing is great, but it will only get you so far.

Imagine that you’ve just heard about an amazing new thing called “piano”. It sounds great. No, it sounds INCREDIBLE. It’s the sort of thing you really want to be good at.

So you get online and read more about it. Descriptions, analyses, painstaking breakdowns of manual anatomy and contrapuntal textures. You watch videos of people playing pianos, talking about pianos, setting pianos on fire and hurling them across dark fields. You download reams of free sheet music and maybe even buy a book of pieces you really want to learn.

But at some point… you need to play a piano.

The good news is that with data science, you don’t need to buy a piano, or find somewhere to keep it, or worry about bothering your family/friends/neighbours/pets with your late-night composing sessions.

Online interactive coding tutorials are a great place to start if you want to learn a new programming language. Sites like DataCamp and Codecademy offer a number of free courses to get yourself started with data science languages like R and Python. If you are feeling brave enough, take the plunge and run things on your own machine! (I’d strongly recommend using R with RStudio and using Anaconda for Python.) Language-specific “native-format” resources such as [SWIRL]() for R or this set of Jupyter notebooks for Python are a great way to learn more advanced skills. Take advantage of the exercises in any books you have – don’t just skip them all!

Data science is more than just coding though – it’s all about taking a problem, understanding it, solving it and then communicating those ideas to other people. So Part 1 of my Number One Two-Part Top Tip for you today is:

  1. Pick a project and write about it

How does one “pick a project”? Well, find something that interests you. For me it was neural networks (and later, car parks…) but it could be literally anything, so long as you’re going to be able to find some data to work with. Maybe have a look at some of the competitions hosted on Kaggle or see if there’s a group in your area which publishes open data.

Then once you’ve picked something, go for it! Try out that cool package you saw someone else using. Figure out why there are so many missing values in that dataset. Take risks, explore, try new things and push yourself out of your comfort zone. And don’t be afraid to take inspiration from something that someone else has already done: regardless of whether you follow the same process or reach the same outcome, your take on it is going to be different to theirs.

By writing about that project – which is often easier than deciding on one in the first place – you’re developing your skills as a communicator by presenting your work in a coherent manner, rather than as a patchwork of dodgy scripts interspersed with the occasional hasty comment. And even if you don’t want to make your writing public, you’ll be amazed how often you go back and read something you wrote before because it’s come up again in something else you’re working on and you’ve forgotten how to do it.

I’d really encourage you to get your work out there though. Which brings us smoothly to…

IV. Allegro maestoso: Get yourself out there

If you never play the piano for anyone else, no-one’s ever going to find out how good you are! So Part 2 of my Number One Two-Part Top Tip is:

  1. Start a blog

It’s pretty easy to get going with WordPress or similar, and it takes your writing to the next level because now you’re writing for an audience. It may not be a very big audience, but if someone, somewhere finds your writing interesting or useful then surely it’s worth it. And if you know you’re potentially writing for someone other than yourself then you need to explain everything properly, which means you need to understand everything properly. I often learn more when I’m writing up a project than when I’m playing around with the code in the first place.

Also, a blog is a really good thing to have on your CV and to talk about at interviews, because it gives you some physical (well, virtual) evidence which you can point at as you say “look at this thing wot I’ve done”.

(Don’t actually say those exact words. Remember that you’re a Good Communicator.)

If you’re feeling brave you can even put that Twitter account to good use and start shouting about all the amazing things you’re doing. You’ll build up a loyal following amazingly quickly. Yes, half of them will probably be bots, but half of them will be real people who enjoy reading your work and who can give you valuable feedback.

Speaking of real people…

  1. Get involved in the community

Yes, that was indeed Part 3 of my Number One Two-Part Top Tip, but it’s so important that it needs to be in there.

The online data science community is one of the best out there. The R community in particular is super friendly and supportive (check out forums like RStudio Community, community groups like R4DS, and the #rstats tag on Twitter). Get involved in conversations, learn from people already working in the sector, share your own knowledge and make friends.

Want to go one better than online?

Get a Meetup account, sign up to some local groups and go out to some events. It might be difficult to force yourself to go for the first time, but pluck up the courage and do it. Believe me when I say there’s no substitute for meeting up and chatting to people. Many good friends are people I met for the first time at meetups. And of course, it’s the perfect opportunity to network – I met 5 or 6 of my current colleagues through BathML before I even knew about Mango! (If you’re in or near Bristol or London, Bristol Data Scientists and LondonR are both hosted by Mango and new members are always welcome!)

Postlude

Of course, everything I’ve just said is coming from my point of view and is entirely based on my own experiences.

For example, I’ve talked about coding quite a lot because I personally code quite a lot; and I code quite a lot because I enjoy it. That might not be the case for you. That’s fine. In fact it’s more than “fine”; the huge diversity in people’s backgrounds and interests is what makes data science such a fantastic field to be working in right now.

Maybe you’re interested in data visualisation. Maybe you’re into webscraping. Or stats. Or fintech, or NLP, or AI, or BI, or CI. Maybe youare the relative at Christmas dinner who won’t stop banging on about why you should NEVER, under ANY circumstances, merge cells in an Excel spreadsheet (UNLESS it is PURELY for purposes of presentation).

Oh, why not:

  1. Find the parts of data science that you enjoy and arrange them so that they work for you.