
We hope you enjoyed EARL London as much as we did! We’re just putting the finishing touches on our highlights recap, but until then you can view the presentations we have available here, and all the photos here.

We’re so proud of the speakers at this year’s EARL Conference – from tackling human trafficking to benefitting the NHS, it’s always incredible to hear about the many different ways data science can be used for good.

Let’s take a look at some of the feedback and comments we received on this year’s London EARL:

From social:

Great to have been on the same speaking bill as experts from @BMW @sainsburys the BBC and others talking about #opensource, #R and #bigdata. World class speakers on cutting edge topics to an international audience. Thanks @earlconf – see you next year!

Had a brilliant day today at my 3rd @earlconf Have come away with a million ideas, in particular on workflows and productionising R code, and it was brilliant to both see @juliasilge speak and then have her sit in on my presentation! Thanks @MangoTheCat for a great #earlconf

Dang, #earlconf is over again? I really need to bribe my boss to send me there one year…

Harold Selman also wrote a great round-up of the conference on his LinkedIn page.

From delegate feedback:

94% of delegates who replied to the feedback survey said they got what they wanted from EARL London. We were also left some lovely comments:

My whole data science team attended the conference and gained from both a wider learning experience and team building. The team got to discuss potential new projects out of the office environment, which really enhanced our brainstorming! This will add huge value to our future business benefit.

Fantastic conference. EARL never lets me down in providing an insightful, applicable and fun experience to learn from other companies on how they apply R in their enterprises.

EARL is a must for any organisation that wants to understand how R and technologies around R can enable them to achieve better results.

As an R user I’ve been lucky enough to attend EARL now for the last three years. Each time I come away with a fizz of ideas and energised for my year ahead.

And not forgetting our mention in the Telegraph! 

A thank you

To all of our speakers, sponsors, delegates and the whole Mango team for another fantastic EARL. We will get started on EARL 2020 in early January – so keep an eye out for the call for abstracts then! If you have any comments or ideas, either tweet us or drop us an email.


When technical capabilities and company culture combine, IoT-fed data lakes become a powerful brain at the heart of the business

Internet-enabled devices have led to an explosion in the growth of data. On its own, this data has some value; however, the only way to unlock its full potential is by combining it with other data that businesses already hold.

Together, pre-existing data and newly-minted IoT data can provide a full picture of specific insights around a single consumer. It is paramount, however, that companies don’t prioritise innovation at the expense of ethics. Sourcing and analytics must be done correctly – with the right context that respects consumer privacy and wishes around data usage.

The insights gained from successfully blending these two different data sources also unlock secondary benefits including new product development, possible upsells or the ability to build customer goodwill through advice-driven service delivery.

It’s a winning combination, but the challenge is how to actually merge device data with regular customer information.

No easy fit

This problem arises from the fact that IoT device data is a different “shape” to data in traditional customer records.

If you think of a customer record in a sales database as one long row of information, IoT-collected information is more like an entire column of time series information, with a supporting web of additional detail. Trying to join the two directly is nearly impossible, and valuable semantic information is likely to be lost in the process.
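To make the difference in "shape" concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen purely for illustration: the customer fields, the thermostat-style readings and the function names `summarise_readings` and `join_summary`. It shows one common workaround, collapsing the time series into a single summary row per customer before joining, and why that loses detail.

```python
from statistics import mean

# Hypothetical customer record: one "row" per customer.
customers = {
    "C001": {"name": "A. Example", "plan": "standard"},
}

# Hypothetical IoT readings: many time-stamped rows per customer.
readings = [
    {"customer_id": "C001", "ts": "2019-09-01T08:00", "temp_c": 19.5},
    {"customer_id": "C001", "ts": "2019-09-01T09:00", "temp_c": 21.0},
    {"customer_id": "C001", "ts": "2019-09-01T10:00", "temp_c": 22.5},
]


def summarise_readings(readings):
    """Collapse each customer's time series into one summary row."""
    by_customer = {}
    for r in readings:
        by_customer.setdefault(r["customer_id"], []).append(r["temp_c"])
    return {
        cid: {
            "n_readings": len(vals),
            "mean_temp_c": mean(vals),
            "max_temp_c": max(vals),
        }
        for cid, vals in by_customer.items()
    }


def join_summary(customers, summary):
    """Attach the summary columns to each customer row."""
    return {cid: {**row, **summary.get(cid, {})} for cid, row in customers.items()}


joined = join_summary(customers, summarise_readings(readings))
```

The joined row is easy to store in a conventional table, but the timestamps and the order of the readings are gone, which is the kind of semantic loss described above.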

But if IoT information fundamentally resists structure, and existing business databases are built on rigid structures, how do you find an environment that works for both? The answer is a data lake.

Pooling insight

A data lake is a more “fluid” approach to storing and connecting data. It is a central repository where data can be stored in the form it’s generated, whether that is in a relational database format or entirely unstructured. Analytics can then be applied over the top to connect different pieces of information and derive useful business insights.

However, there is more complexity involved in setting up a data lake than just combining all of an organisation’s data and hoping for the best. If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful.

This can be avoided using the expertise of dedicated data engineers. These are the masterminds who build the framework for a data lake and manage the process of extracting data from its source, before transforming it into a usable format and then loading it into the data lake environment. Done properly, this will ensure data provenance, with appropriate metadata to guide users on allowable use cases and analysis.
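As a rough illustration of the extract, transform and load steps, the following Python sketch uses an in-memory dictionary as a stand-in for the data lake. The feed format, field names, dataset name and the `allowed_uses` metadata field are all assumptions made for this example; a real data lake would sit on dedicated storage and tooling.

```python
import json
from datetime import datetime, timezone


def extract(raw_lines):
    """Parse raw JSON lines as they arrive from a (hypothetical) device feed."""
    return [json.loads(line) for line in raw_lines]


def transform(records):
    """Normalise field names and units; drop records with no reading."""
    out = []
    for rec in records:
        if rec.get("tempF") is None:
            continue
        out.append({
            "device_id": rec["id"],
            # Convert Fahrenheit to Celsius, rounded to one decimal place.
            "temp_c": round((rec["tempF"] - 32) * 5 / 9, 1),
        })
    return out


def load(lake, dataset, records, source, allowed_uses):
    """Store the records alongside provenance metadata."""
    lake[dataset] = {
        "metadata": {
            "source": source,
            "loaded_at": datetime.now(timezone.utc).isoformat(),
            "row_count": len(records),
            "allowed_uses": allowed_uses,
        },
        "records": records,
    }


lake = {}  # stand-in for real data lake storage
raw = ['{"id": "D1", "tempF": 68.0}', '{"id": "D2", "tempF": null}']
load(
    lake,
    "thermostat_readings",
    transform(extract(raw)),
    source="thermostat-feed",
    allowed_uses=["service_quality"],
)
```

Keeping the metadata next to the records means anyone who picks up the dataset later can see where it came from and what it may be used for.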


This sounds like a significant undertaking, and there’s no getting around the fact that doing data lakes right does take time and effort, but it is possible to take a staged approach. Many organisations start with a data “puddle” – a small collection of computers hosting a limited amount of data – and then slowly add to this, increasing the number of computers over time to form the full data lake.

A question of culture

Technical considerations are just one side of the coin; the other is culture. At the core of the problem is that businesses will not succeed in commercialising their IoT data if users are either unaware of, or distrustful of, the data lake and its potential.

While investment in big data continues to grow, a recent NewVantage Partners survey on Big Data and AI found that just 31 percent of organisations consider themselves data-driven – the second year in a row that the number has fallen. Data lake technology has been around for several years now, and should be more than capable of enabling these types of organisations, but without the right culture in place, its benefits are seldom felt.

How do you create a culture that centres on being data-driven? As any management team knows, culture shifts are never easy, but a data-driven culture boils down to improving collaboration, communication and understanding between data professionals and business functions.

Once a data lake has been successfully implemented, you need data professionals to advocate its benefits and liaise with business departments to understand the types of insights that would be most useful to inform strategic decisions.

This then reinforces business confidence in the data function, and allows the data teams to expand their contributions to the business and be recognised for their hard work. When supported by senior buy-in, this positive feedback loop generates a growing culture of data savviness and data-driven approaches within the organisation.

Brain of the organisation

When technical capabilities and company culture combine, data lakes can become a powerful brain at the heart of the business. With the right analytics tools layered over the top, data lakes can reduce the time it takes to find insights and surface powerful information. These insights can serve business needs better and faster and are an outright win for any organisation. In short, they are well worth the time and investment.

Author: Dean Wood, Principal Data Scientist


We were thrilled to host Hadley Wickham, who delivered, as ever, a funny and engaging talk to a packed house at LondonR in August. In fact, to give you an idea of how highly anticipated this event was, tickets to see Hadley sold out in under two hours!

It’s always fascinating for those of us elder members of the R community who remember the good old days to witness the move from academic tools through to commercial adoption and engagement. For many years, R was proposed and rejected by many organisations due to the environment and architecture that existed. We used to spend time trying to work out data sizes and whether things would help.

I remember talking to Hadley at the first EARL in the US about creating toolsets that allowed organisations who didn’t “love” R to use it and deploy it internally, comfortably. Hadley’s work, and latterly that of his team, has allowed the ecosystem around R to develop from introspection to a wide view of the analytic landscape, and his talk, I felt, reflected on some of these shifts.

Hadley’s insight into the mistakes he has made rang very true when considering the scale of the user base today compared to when he started developing packages. That moment of clarity when you realise that you need to prepare things so that people you don’t know can pick them up and use them efficiently lies at the heart of good programming practice, but is sometimes easily forgotten. This has driven Hadley on to create better and easier codebases that serve as central platforms while also sparking others’ ideas and developments.

It was great to hear someone like Hadley acknowledge that innovation isn’t a straight line and that forking and dead ends are essential parts of the process. Speaking to attendees afterwards, it was clear that this message was highly prized, and it felt as though there was an increased confidence among many attendees to go out and try things without the fear of failure.

All in all, a fantastic evening that reinforced just how great the R community is.

If you’d like to view Hadley’s LondonR presentation, you can download it here.