NHS-R Community

The NHS is one of the UK’s most valued institutions and provides healthcare for millions of people. Mango has had the pleasure of supporting its internal NHS-R community over the last few years, backing the initiative from its inception and sharing our knowledge and expertise at its events as the community seeks to promote the wider use and adoption of R and to develop best-practice solutions to NHS problems.

According to a recent survey by Udemy, 62% of organisations are focusing on closing skills gaps, which is essential to keeping teams competitive, up to date and equipped with the skills they need to adapt to future challenges. For many institutions, an important first step is connecting their analytics teams and data professionals to encourage collaboration and knowledge sharing. With ‘data literacy’ fast becoming the new computer literacy, organisations are quickly recognising the value of strong data skills across the whole workforce.

As the UK’s largest employer, comprising 207 clinical commissioning groups, 135 acute non-specialist trusts and 17 acute specialist trusts in England alone, the NHS faces a particularly daunting task in connecting its data professionals, a vast group that includes clinicians as well as performance, information and health analysts.

The NHS-R community was the brainchild of Professor Mohammed Mohammed, Principal Consultant (Strategy Unit), Professor of Healthcare, Quality & Effectiveness at the University of Bradford. He argues: “I’m pretty sure there is enough brain power in NHS to tackle any analytical challenge, but what we have to do is harness that power, promoting R as the incredible tool that it is, and one that can enable the growing NHS analytics community to work collaboratively, rather than in silos”.

Three years in, the NHS-R Community has begun to address that issue, bringing together once-disparate groups and individuals into a community that shares insights, use cases, best practices and approaches, with the aim of producing better outputs across the NHS and, above all, improving patient outcomes. Having delivered workshops at previous NHS-R conferences, Mango consultants were pleased to support the most recent virtual conference with two workshops: An Introduction to the Tidyverse and Text Analysis in R. These courses proved popular with conference attendees, attracting feedback such as “The workshop has developed my confidence for using R in advanced analysis” and “An easy to follow and clear introduction to the topic.”

Liz Mathews, Mango’s Head of Community, has worked with Professor Mohammed from the beginning, sharing information and learnings from our own R community work and experience. Professor Mohammed commented:

“The NHS-R community has, from its very first conference, enjoyed support from Mango who have a wealth of experience in using R for government sector work and great insight in how to develop and support R based communities. Mango hosts the annual R in Industry conference (EARL) to which NHS-R Community members are invited and from which we have learned so much. We see Mango as a friend and a champion for the NHS-R Community.”


Linux containers, of which Docker is the best known, can be a great way to improve reproducibility in your data projects (for more info see here) and to create portable, reusable applications. But how do we manage the deployment of multiple containerised applications?

Kubernetes is an open source container management platform that automates the core operations of running containers for you. It lets you deploy and scale containerised applications automatically, removing the manual steps that would otherwise be involved. Essentially, you cluster together groups of hosts running Linux containers, and Kubernetes helps you manage those clusters easily and efficiently. This is especially effective in cloud-based environments.

Why use Kubernetes in your data stack?

Since Kubernetes orchestrates containers, and containers are a great way to bundle applications with their dependencies (thus improving reproducibility), Kubernetes is a natural fit if you’re aiming for high levels of automation in your stack.

Kubernetes allows you to manage containerised apps that span multiple containers as well as scale and schedule the containers as necessary across the cluster.

For instance, if you’re building stateless microservices in Flask (Python) and plumber (R), it’s easy at first to treat running them in containers as though they were running on a simple virtual machine. However, once these containers are in a production environment and scale becomes much more important, you’ll likely need to run multiple instances of them, and Kubernetes can take care of that for you.
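To make the scaling point concrete, here is a minimal sketch of a Kubernetes Deployment for a containerised plumber API, generated as JSON from Python (kubectl accepts JSON manifests as well as YAML). The name `model-api`, the image `registry.example.com/model-api:1.0` and the assumption that the API listens on port 8000 are all hypothetical placeholders; the `replicas` field is what asks Kubernetes to keep several identical instances running.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3, port: int = 8000) -> dict:
    """Build a minimal apps/v1 Deployment that keeps `replicas` copies of one container running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Kubernetes starts or replaces pods until this many are running.
            "replicas": replicas,
            # The selector must match the pod template's labels below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image name
                            "ports": [{"containerPort": port}],  # assumed API port
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image; replace with your own.
    manifest = deployment_manifest("model-api", "registry.example.com/model-api:1.0")
    print(json.dumps(manifest, indent=2))
```

Saving the output to `deployment.json` and running `kubectl apply -f deployment.json` would create the Deployment; changing `replicas` and re-applying, or running `kubectl scale deployment model-api --replicas=5`, scales the service up or down.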

Automation is a key driver

When container deployments are small, it can be tempting to manage them by hand, starting and stopping the containers your application needs. But this approach is very inflexible and impractical beyond the smallest deployments. Kubernetes is designed to manage the complexity of production-scale container deployments, which can quickly reach a size and level of complexity that makes manual management error-prone.
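What replaces the manual start/stop work is Kubernetes’ declarative approach: you state how many replicas of each app you want, and its controllers repeatedly compare that desired state with what is actually running and correct any difference. The toy Python loop below only illustrates that idea; it is not Kubernetes code, and the app name and replica-ID scheme are hypothetical.

```python
def reconcile(desired: dict[str, int], running: dict[str, set[int]]) -> list[str]:
    """One pass of a toy control loop: nudge `running` towards `desired`.

    `desired` maps an app name to the number of replicas we want;
    `running` maps an app name to the IDs of replicas currently alive.
    Returns the actions taken, in order.
    """
    actions: list[str] = []
    for app, want in desired.items():
        have = running.setdefault(app, set())
        # Too few replicas (e.g. one crashed): start new ones.
        while len(have) < want:
            rid = max(have, default=-1) + 1
            have.add(rid)
            actions.append(f"start {app}#{rid}")
        # Too many replicas (e.g. desired count lowered): stop the highest-numbered ones.
        while len(have) > want:
            rid = max(have)
            have.remove(rid)
            actions.append(f"stop {app}#{rid}")
    return actions


if __name__ == "__main__":
    running: dict[str, set[int]] = {}
    desired = {"model-api": 3}
    print(reconcile(desired, running))  # ['start model-api#0', 'start model-api#1', 'start model-api#2']
    running["model-api"].discard(1)  # simulate a replica falling over
    print(reconcile(desired, running))  # ['start model-api#3']
    desired["model-api"] = 2  # scale down
    print(reconcile(desired, running))  # ['stop model-api#3']
```

Real Kubernetes controllers run this kind of comparison continuously and act on pods spread across many hosts, which is how they take over the chores of starting, restarting and stopping containers.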

Scheduling is another often-overlooked feature of Kubernetes in data processing pipelines. For example, you could schedule model refreshes to keep your models fresh, running them at times when you know the cluster will otherwise be quiet (such as overnight or at weekends) and publishing the refreshed model automatically.
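A scheduled refresh like this maps naturally onto a Kubernetes CronJob. The sketch below builds a minimal `batch/v1` CronJob manifest that would run a hypothetical `refresh_model.R` script at 02:00 every night; the job name, image and script are placeholder assumptions, and the schedule uses standard five-field cron syntax.

```python
import json


def cronjob_manifest(name: str, image: str, schedule: str, command: list[str]) -> dict:
    """Build a minimal batch/v1 CronJob that runs `command` in `image` on a cron schedule."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            # Cron fields: minute hour day-of-month month day-of-week.
            "schedule": schedule,
            # Skip a run if the previous refresh is still in progress.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            # Job pods must use OnFailure or Never; OnFailure restarts a failed refresh.
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job: retrain and publish a model at 02:00 every night.
    manifest = cronjob_manifest(
        name="model-refresh",
        image="registry.example.com/model-refresh:1.0",
        schedule="0 2 * * *",
        command=["Rscript", "refresh_model.R"],
    )
    print(json.dumps(manifest, indent=2))
```

For a weekends-only refresh, a schedule such as `0 2 * * 6,0` (02:00 on Saturdays and Sundays) would do the same job.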

The case for Kubernetes in your data stack

More broadly, Kubernetes helps you fully implement and rely on a container-based infrastructure in production environments. This is especially beneficial when you’re trying to reduce infrastructure costs, as it lets you keep your cluster at the bare minimum size required to run your applications, saving money on wasted compute resources.

The features of Kubernetes are too numerous to list here, but the key takeaways are that it can run containerised apps across multiple hosts, scale applications on the fly, automatically restart applications that have fallen over and help automate deployments.
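Scaling on the fly is typically handled by a HorizontalPodAutoscaler, which adjusts a Deployment’s replica count based on observed metrics. The sketch below builds an `autoscaling/v2` manifest that targets average CPU utilisation; it assumes a Deployment named `model-api` (a hypothetical name) whose containers declare CPU requests, and a cluster with the metrics server installed. The replica bounds and the 70% target are illustrative values, not recommendations.

```python
import json


def hpa_manifest(deployment: str, min_replicas: int = 1, max_replicas: int = 10,
                 cpu_target_pct: int = 70) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler for an existing Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            # The workload whose replica count the autoscaler controls.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            # A low floor keeps quiet-period usage down; the ceiling caps spend.
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Target average CPU use as a percentage of each pod's CPU request.
                        "target": {"type": "Utilization", "averageUtilization": cpu_target_pct},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(hpa_manifest("model-api", min_replicas=2, max_replicas=8), indent=2))
```

Roughly, the autoscaler computes `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, so 4 pods averaging 105% of their CPU request against a 70% target would be scaled to 6.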

The wider Kubernetes ecosystem relies on many other projects to deliver these fully orchestrated services. These projects provide additional features such as container registries, networking and security.

Kubernetes offers a rich toolset for managing complex application stacks, and with data science, engineering and operations becoming increasingly large-scale, automation is a key driver for many new projects. If you’re not containerising your apps yet, jumping into Kubernetes can seem daunting, but if you start small by building some simple containerised applications, the benefits of this approach should become clear pretty quickly.

For an in-depth technical look at running Kubernetes, this post by Mark Edmondson offers an excellent primer.