
Big Data LDN (London), Mango Solutions’ big event takeaways: The ideal forum for sharing data expertise  

What better glimpse of the 2020s data and analytics community than the UK’s largest data event, Big Data LDN – and how great to be back! With 130 leading technology vendors and over 180 expert speakers, the London show on 23–24 September generated thousands of data- and technology-led conversations around effective data initiatives, with real-world use cases and panel debates. Everything from leadership, data culture and communities to the ethics of AI, data integration, data science, adoption of the data cloud, hyper-automation, governance and sovereignty, and much more…

With a captive group of decision-makers, Mango Solutions conducted a Data Maturity floor survey of Big Data LDN participants. Respondents were asked to identify where their organisations are on their data journey, covering data science strategy articulation, data communities, and their biggest challenges; even the most data-advanced organisations modestly commented that they had ‘room for improvement’ in some areas. The fascinating results of this survey will be revealed shortly.

Conversations were approached with eagerness – around ‘harnessing data’, ‘getting more of it’, ‘data platforms’, ‘monetising data’ and ‘kicking off some AI’ – Big Data LDN being the first in-person analytics event since Covid-19 reshaped our worlds of work and data consumption. Attendees ranged from bright computer science graduates and startlingly young data analysts through to a surprising number of MDs and C-suite representatives at the other end of the scale. Companies from Harrods to UBS and the University of St Andrews – the best of the best of brands came out to share insight and learn – buzzwords aside – the most investable routes from potential to advantage.

A low-key but highly intelligent event, Big Data LDN gave us a feel for the new, ambitious hybrid workers. Back to face-to-face networking, sizeable audience sessions and hot-desking in the cafe, there was a positive, hungry-to-learn feel right through the event. Everyone, whoever they were, seemed on a mission to share intelligence and opinion on data and advanced analytics. No one job title or company outclassed another – just a mutually supportive community of technology experts; less ego and more mutual respect.

There were meet-ups to suit everyone, with practical, hands-on use-case examples. Mango hosted a well-attended data science meet-up with guest speakers from The Gym Group and the Bank of England, as well as our own Mango consultant sharing best practice.

Rich Pugh, Chief Data Scientist at Mango Solutions, gave a well-received talk on finding fantastic data initiatives. It focused on prioritising high-impact data initiatives with buy-in from across the business – in particular, how data domains, supported by data literacy, can create bridges between business ambition and data investment. In our experience, pitfalls too often lie in selecting the right data initiatives, or even in trying to answer the wrong question with data, so keeping a focus on achieving business goals and objectives is critical.

But as well as showcasing the latest advanced data services and technologies, this event was a great opportunity to step back from the day-to-day and be reminded of what must come first in any data strategy: achieving company goals, and measuring progress through clear KPIs that data and analytics can influence, before jumping in to make hasty data decisions. With that clear link between business goals and data, we can measure the return on data investments.

My main ‘don’t forget’ take-aways from the event:

  • It’s about aligning data initiatives with business aims. At Big Data LDN, there was consistent messaging around an ‘outcome-first mindset’: start with business outcomes and objectives before considering the data, software and platform. It’s a valuable mantra to operate by, ensuring you always deliver the right data science, BI or software.
  • Focus on dis-benefits as well as benefits. As much as we focus on what we should do to get value from data, we mustn’t lose sight of stopping the things we shouldn’t keep doing, or of the opportunity and benefits lost when we overlook or fail to implement a particular data capability.
  • A robust data strategy must show the benefits for each stakeholder. Different people need different things from a data strategy. To gain buy-in and approval from all stakeholders, you must clearly show each of them the value they will get. In return, you will get not only their sign-off, but also, more valuably, their advocacy.

A parting word

We all learned many things from Big Data LDN 2021. For me, it confirmed that our industry’s collaboration and community of data experts are unquestionable. We know remote working works, but you can’t beat the energy of an in-person event to remind you why you love doing what you do. Sharing our data science insight, and learning more about our peers’ technologies and services, was truly rewarding.

Big Data Challenge – Our survey respondents were asked their number-one post-Covid data priority, and we received a real mixture of enlightening and more predictable responses. We will share these challenges with you, and explain how we can support you, in our analytics leadership series.

As a relatively new member of the consulting team, working alongside Rich Pugh, Jon will be working with our clients and supporting them through their data journeys. He brings significant expertise to the team in data maturity, data and analytics, data visualisation, and implementing strategies that inform successful business change and improvement, gained in previous roles at Oracle, EY, HP and IBM.

If you would like to speak to a member of our consulting team about your data-driven journey, please email us.

 


Mango’s ‘Meet-Up’ at Big Data London on 22nd September features guest speaker Adam Hughes, Data Scientist at the Bank of England, whose remit involves working with incredibly rich datasets that feed into strategic decision-making on monetary policy. You can read about Adam’s fascinating data remit, and his team’s journey through Covid-19, in this short Q&A.

Can you tell us about your interest in data and your role at Bank of England?

Working at the Bank it’s hard not to be interested in data! So much of what we do as an organisation is data driven, with access to some incredibly rich datasets enabling interesting analysis. In Advanced Analytics, we leverage a variety of data science skills to support policy-making and facilitate the effective use of big, complex and granular data sets. As a data scientist, I get involved in all of this, working across the data science workflow.

What’s the inspiration for your talk – effectively data science at speed?

As with so much recently – Covid. With how fast things have been moving and changing, traditional data sources that policymakers were relying on weren’t being updated fast enough to reflect the situation.

Can you tell us about your data team’s journey through covid-19 and the impact it has had?

In a recent survey, the Bank of England sought to understand how Covid has affected the adoption and use of ML and DS across UK Banks. Half of the banks surveyed reported an increase in the importance of ML and DS as a result of the pandemic. Covid created a lot of demand for DS skills and expertise within the Bank of England too. Initially this led to some long hours, but it was motivating and generally rewarding to work on something so clearly important. Working remotely 100% of the time was a challenge at first, but generally the transition away from the office has been remarkably smooth in terms of day-to-day working (though there are still disadvantages due to the lack of face-to-face contact). As outputs have subsequently been developed and shared widely in the organisation, they have been an excellent advert for data science, showing the value it can add. In particular, it’s been great to see the business areas we worked with building up their local data science skills as a consequence.

What’s the talk about and what are the key takeaways?

The talk will cover some of the techniques we used to get, process and use new data sources under time pressure, including what we’ve learnt from the process. The key takeaways are:

  • Non-traditional datasets contain some really useful information – and can form part of the toolkit even in normal times;
  • Building partnerships is key;
  • A suite of useful building blocks, such as helper packages or code adapted from cleverer people, helps speed things up;
  • Working fast doesn’t mean worse outcomes.

We look forward to seeing you at Mango’s Big Data London Meet-Up, 22nd September, 6-8pm, Olympia ML Ops Theatre. You can sign up here.

Guest speaker, Adam Hughes is one of The Bank of England’s Data Scientists, https://www.linkedin.com/in/adam-james-hughes/

NHS-R Community

The NHS is one of the UK’s most valued institutions and serves as the healthcare infrastructure for millions of people. Mango has had the pleasure of supporting their internal NHS-R community over the last few years, supporting the initiative from its inception and sharing our knowledge and expertise at their events as they seek to promote the wider usage and adoption of R and develop best practice solutions to NHS problems.

According to a recent survey by Udemy, 62% of organisations are focusing on closing skills gaps – essential to keeping teams competitive, up to date and armed with the relevant skills to adapt to future challenges. For many institutions, an important first step is connecting their analytics teams and data professionals to encourage collaboration and knowledge sharing. With data literacy fast becoming the new computer literacy, workforces with strong data skills are realising the strength and value of those skills across the whole organisation.

As the UK’s largest employer, comprising 207 clinical commissioning groups, 135 acute non-specialist trusts and 17 acute specialist trusts in England alone, the NHS faces a particularly daunting task when it comes to connecting their data professionals, a vast group which includes clinicians as well as performance, information and health analysts.

The NHS-R community was the brainchild of Professor Mohammed Mohammed, Principal Consultant (Strategy Unit), Professor of Healthcare, Quality & Effectiveness at the University of Bradford. He argues, “I’m pretty sure there is enough brain power in NHS to tackle any analytical challenge, but what we have to do is harness that power, promoting R as the incredible tool that it is, and one that can enable the growing NHS analytics community to work collaboratively, rather than in silos”.

Three years in and the NHS-R Community has begun to address that issue, bringing together once disparate groups and individuals to create a community, sharing insights, use cases, best practices and approaches, designed to create better outputs across the NHS with a key aim of improving patient outcomes.  Having delivered workshops at previous NHS-R conferences, Mango consultants were pleased to support the most recent virtual conference with two workshops – An Introduction to the Tidyverse and Text Analysis in R. These courses proved to be a popular choice with the conference attendees, attracting feedback such as “The workshop has developed my confidence for using R in advanced analysis” and “An easy to follow and clear introduction to the topic.”

Liz Mathews, Mango’s Head of Community, has worked with Professor Mohammed from the beginning, sharing information and learnings from our own R community work and experience.  Professor Mohammed commented:

“The NHS-R community has, from its very first conference, enjoyed support from Mango who have a wealth of experience in using R for government sector work and great insight in how to develop and support R based communities. Mango hosts the annual R in Industry conference (EARL) to which NHS-R Community members are invited and from which we have learned so much. We see Mango as a friend and a champion for the NHS-R Community.”


Linux containers, of which Docker is the best known, can be a great way to improve reproducibility in your data projects and to create portable, reusable applications. But how do we manage the deployment of multiple containerised applications?

Kubernetes is an open source container management platform that automates the core operations for you. It allows you to automatically deploy and scale containerised applications and removes the manual steps that would otherwise be involved. Essentially, you cluster together groups of hosts running Linux containers, and Kubernetes helps you easily and efficiently manage those clusters. This is especially effective in cloud based environments.

Why use Kubernetes in your data stack?

Since Kubernetes orchestrates containers and since containers are a great way to bundle up your applications with their dependencies — thus improving reproducibility — Kubernetes is a natural fit if you’re aiming for high levels of automation in your stack.

Kubernetes allows you to manage containerised apps that span multiple containers as well as scale and schedule the containers as necessary across the cluster.

For instance, if you’re building stateless microservices in flask (Python) and plumber (R) it’s easy to initially treat running them in containers as though they were running in a simple virtual machine. However, once these containers are in a production environment and scale becomes much more important, you’ll likely need to run multiple instances of the containers and Kubernetes can take care of that for you.
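As a sketch of what that looks like in practice, a Kubernetes Deployment manifest declares how many replicas of a containerised service should run, and Kubernetes keeps that number running, rescheduling containers across the cluster as needed. The service name, image and port below are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scoring-api                # hypothetical microservice name
spec:
  replicas: 3                      # run three instances of the container
  selector:
    matchLabels:
      app: scoring-api
  template:
    metadata:
      labels:
        app: scoring-api
    spec:
      containers:
        - name: scoring-api
          image: registry.example.com/scoring-api:1.0   # your container image
          ports:
            - containerPort: 8000  # port the flask/plumber app listens on
```

Because the manifest is declarative, scaling up later is just a matter of changing `replicas` and re-applying it.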

Automation is a key driver

When container deployments are small, it can be tempting to manage them by hand, starting and stopping the containers required to service your application. But this approach is inflexible, and beyond the smallest deployments it is not really practical. Kubernetes is designed to manage the complexity of production-scale container deployments, which can quickly reach a size and level of complexity that does not lend itself to error-prone manual management.

Scheduling is another, often overlooked, feature of Kubernetes that is useful in data processing pipelines: you could, for example, schedule refreshes of models in order to keep them fresh. Such processes could be scheduled for times when you know the cluster will otherwise be quiet (such as overnight or at weekends), with the refreshed model being published automatically.
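A model refresh like this could be expressed as a Kubernetes CronJob, which runs a container on a cron-style schedule. The job name, image and timing below are illustrative, not a prescription:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-refresh              # hypothetical retraining job
spec:
  schedule: "0 2 * * 6"            # 02:00 every Saturday, when the cluster is quiet
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure # re-run the container if the refresh fails
          containers:
            - name: retrain
              image: registry.example.com/model-refresh:1.0   # your training image
```

The container itself would pull the latest data, retrain the model and publish the result, with no manual intervention required.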

The Case for Kubernetes in your data stack

More broadly, it helps you fully implement and rely on a container-based infrastructure in production environments. This is especially beneficial when you’re trying to reduce infrastructure costs as it allows you to keep your cluster size at the bare minimum required to run your applications, which in turn saves you money on wasted compute resource.

The features of Kubernetes are too numerous to list here, but the key things to take away are that it can run containerised apps across multiple hosts, scale applications on the fly, automatically restart applications that have fallen over, and help automate deployments.
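The auto-restart behaviour, for instance, can be driven by a liveness probe in the container spec: if the probe keeps failing, Kubernetes restarts the container for you. The `/healthz` endpoint and port in this fragment are assumptions about your app, not requirements:

```yaml
containers:
  - name: scoring-api
    image: registry.example.com/scoring-api:1.0
    livenessProbe:                 # restart the container if this check fails
      httpGet:
        path: /healthz             # a health endpoint your app would expose
        port: 8000
      initialDelaySeconds: 10      # give the app time to start up first
      periodSeconds: 30            # check every 30 seconds
```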

The wider Kubernetes ecosystem relies on many other projects to deliver these fully orchestrated services; these additional projects provide features such as registry services for your containers, networking, security and so on.

Kubernetes offers a rich toolset to manage complex application stacks and with data science, engineering and operations becoming increasingly large scale, automation is a key driver for many new projects. If you’re not containerising your apps yet, jumping into Kubernetes can seem daunting, but if you start small by building out some simple containerised applications to start with, the benefits of this approach should become clear pretty quickly.

For an in-depth technical look at running Kubernetes, this post by Mark Edmondson offers an excellent primer.