
Sara Hamilton Shortlisted for Women in Tech Excellence Awards 

We’re delighted that Sara’s efforts have been recognised with a shortlist nomination for this award, in the Transformation Leader: Tech category.

Despite a record number of entries this year, Sara, Deputy Director of Product & Managed Services, reached the shortlist on the strength of her technically accomplished team mentoring and coaching, her passion for user experience, and her drive to streamline processes, improve efficiency and develop products from conception.

As an active member of Ascent’s Women in Tech user group, Sara aims to contribute to change towards workplace diversity, believing it can bring huge benefits to tech companies in terms of creativity, improved problem solving and better products. As an active social campaigner, Sara believes employers should emphasise that there is no heroism in martyrdom or in being seen to be working, and that the real money is in working effectively and sustainably, something she actively encourages within her team.

Experienced in agile transformation, scrum implementation, product and programme ownership, internal audits and quality management, Sara has been responsible for introducing agile processes into product development at Mango. With colleagues describing Sara as a ‘fantastic team lead, who has instigated change at a team and process level – testing technology delivery and bringing a genuine culture of ownership to the team’, it’s not surprising that Sara’s talents have been recognised.

We look forward to attending the awards dinner on 24th November and congratulating Sara on her achievements at Mango, alongside colleague Layla Marshall, Ascent’s Director of Product & Marketing, who has also been shortlisted for the ‘Outstanding Returner Award’.

 

British Science Week

As a data science consultancy, we’d like to celebrate British Science Week (#BSW21) and the innovation in science, technology, maths and engineering, as well as the diversity of these roles. Many of our graduate consultants join Mango from a maths and statistics background, but equally many come from a science background, where data and analytic approaches, including R, are first introduced.

Can data science be classified as a science? We asked our consultants, and the answer was a resounding ‘yes’, as our Graduate Data Scientist, Elizabeth Brown explains. “In my opinion, Data Science is a Science. The goal of Science is to gain a better understanding of the world around us, to explain why things happen or to describe the relationship between concepts. A big part of science is taking this understanding and applying it to real world situations, whether that be making advancements in medicine as we have witnessed with the vaccine development, or a modern way of introducing scientific methods to automate processes and make more intelligent decisions – ‘innovating for the future’, the theme of British Science Week this year. Like a science, we make observations, come up with hypotheses and, through experimentation, test our hypotheses”.

Rich Pugh, Mango’s Chief Data Scientist agrees, “When done right, data science should be based on, and resemble, the scientific method. We formulate a “hypothesis” (although the structure of this can vary across application), then we use data to test that hypothesis”.

“Data is everywhere and being able to use it effectively to improve our understanding of the world is very exciting – expanding data-driven decision making, scientific discovery and automation”, explains Elizabeth.

The growing capabilities of AI and machine learning are paving the way for real world solutions such as self-driving cars, much needed fraud prevention and addressing climate change. Fundamentally, data science isn’t just solving business problems, as a career it can support initiatives to create a healthier, greener and kinder world.

If you are interested in a data science career with Mango, click here to find out more.

Blogs home Featured Image

Linux containers, of which Docker is the best known, can be a great way to improve reproducibility on your data projects (for more info see here), and to create portable, reusable applications. But how would we manage the deployment of multiple containerised applications?

Kubernetes is an open source container management platform that automates the core operations for you. It allows you to automatically deploy and scale containerised applications and removes the manual steps that would otherwise be involved. Essentially, you cluster together groups of hosts running Linux containers, and Kubernetes helps you easily and efficiently manage those clusters. This is especially effective in cloud based environments.

Why use Kubernetes in your data stack?

Since Kubernetes orchestrates containers and since containers are a great way to bundle up your applications with their dependencies — thus improving reproducibility — Kubernetes is a natural fit if you’re aiming for high levels of automation in your stack.

Kubernetes allows you to manage containerised apps that span multiple containers as well as scale and schedule the containers as necessary across the cluster.

For instance, if you’re building stateless microservices in flask (Python) and plumber (R) it’s easy to initially treat running them in containers as though they were running in a simple virtual machine. However, once these containers are in a production environment and scale becomes much more important, you’ll likely need to run multiple instances of the containers and Kubernetes can take care of that for you.
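To make that concrete, here’s a sketch of the kind of stateless plumber microservice we mean. This is an illustrative example only, not code from a real deployment; the endpoint names and logic are made up.

# plumber.R - a minimal, stateless microservice (illustrative sketch)
library(plumber)

#* Health check, handy for a cluster's liveness/readiness probes
#* @get /healthz
function() {
  list(status = "ok")
}

#* Toy scoring endpoint; a real service would call a model here
#* @param x a numeric value to score
#* @get /score
function(x) {
  list(score = as.numeric(x) * 2)
}

Inside the container you’d serve it with plumber::plumb("plumber.R")$run(host = "0.0.0.0", port = 8000), and because the service holds no state, Kubernetes is free to run as many identical replicas of it as demand requires.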

Automation is a key driver

When container deployments are small it can be tempting to manage them by hand, starting and stopping the containers required to service your application. But this approach is inflexible, and beyond the smallest of deployments it isn’t really practical. Kubernetes is designed to manage the complexity of production-scale container deployments, which can quickly reach a size and level of complexity that does not lend itself to error-prone manual management.

Scheduling is another, often overlooked, feature of Kubernetes that is useful in data processing pipelines: you could, for example, schedule regular refreshes of your models to keep them up to date. Such processes could be scheduled for times when you know the cluster will be otherwise quiet (such as overnight, or at weekends), with the refreshed model being published automatically.

The Case for Kubernetes in your data stack

More broadly, Kubernetes helps you fully implement and rely on a container-based infrastructure in production environments. This is especially beneficial when you’re trying to reduce infrastructure costs, as it allows you to keep your cluster size at the bare minimum required to run your applications, which in turn saves money on wasted compute resources.

The features of Kubernetes are too numerous to list here, but the key thing to take away is that it can run containerised apps across multiple hosts, scale applications on the fly, auto-restart applications that have fallen over and help automate deployments.

The wider Kubernetes ecosystem relies on many other projects to deliver these fully orchestrated services. These additional projects provide features such as registry services for your containers, networking, security and so on.

Kubernetes offers a rich toolset to manage complex application stacks and, with data science, engineering and operations becoming increasingly large scale, automation is a key driver for many new projects. If you’re not containerising your apps yet, jumping into Kubernetes can seem daunting, but if you start small, building out some simple containerised applications first, the benefits of this approach should become clear pretty quickly.

For an in-depth technical look at running Kubernetes, this post by Mark Edmondson offers an excellent primer.

 


(Or, how to write a Shiny app.R file that only contains a single line of code)

This post is long overdue. The information contained herein has been built up over years of deploying and hosting Shiny apps, particularly in production environments, and mainly where those Shiny apps are very large and contain a lot of code.

Last year, during some of my conference talks, I told the story of Mango’s early adoption of Shiny and how it wasn’t always an easy path to production for us. In this post I’d like to fill in some of the technical background and provide some information about Shiny app publishing and packaging that is hopefully useful to a wider audience.

I’ve figured out some of this for myself, but the most pivotal piece of information came from Shiny creator Joe Cheng. Joe told me some time ago that all you really need in an app.R file is a function that returns a Shiny application object. When he told me this, I was heavily embedded in the publication side and I didn’t immediately understand the implications.

Over time though I came to understand the power and flexibility that this model provides and, to a large extent, that’s what this post is about.

What is Shiny?

Hopefully if you’re reading this you already know, but Shiny is a web application framework for R. It allows R users to develop powerful web applications entirely in R without having to understand HTML, CSS and JavaScript. It also allows us to embed the statistical power of R directly into those web applications.

Shiny apps generally consist of either a ui.R and a server.R (containing user interface and server-side logic respectively) or a single app.R which contains both.

Why package a Shiny app anyway?

If your app is small enough to fit comfortably in a single file, then packaging your application is unlikely to be worth it. As with any R script though, when it gets too large to be comfortably worked with as a single file, it can be useful to break it up into discrete components.

Publishing a packaged app will be more difficult, but to some extent that will depend on the infrastructure you have available to you.

Pros of packaging

Packaging is one of the many great features of the R language. Packages are fairly straightforward, quick to create and you can build them with a host of useful features like built-in documentation and unit tests.

They also integrate really nicely into Continuous Integration (CI) pipelines and are supported by tools like Travis. You can also get test coverage reports using things like codecov.io.

They’re also really easy to share. Even if you don’t publish your package to CRAN, you can still share it on GitHub and have people install it with devtools, or build the package and share that around, or publish the package on a CRAN-like system within your organisation’s firewall.

Cons of packaging

Before you get all excited and start to package your Shiny applications, you should be aware that — depending on your publishing environment — packaging a Shiny application may make it difficult or even impossible to publish to a system like Shiny Server or RStudio Connect, without first unpacking it again.

* Update: since the time of writing, this information has become outdated. Check out https://thinkr-open.github.io/golem/articles/c_deploy.html for more information on deploying packaged Shiny apps to Shiny Server, shinyapps.io and RStudio Connect.

A little bit of Mango history

This is where Mango were in the early days of our Shiny use. We had a significant disconnect between our data scientists writing the Shiny apps and the IT team tasked with supporting the infrastructure they used. This was before we’d committed to having an engineering team that could sit in the middle and provide a bridge between the two.

When our data scientists would write apps that got a little large, or that they wanted robust tests and documentation for, they would stick them in packages and send them over to me to publish to our original Shiny Server. The problem was: R packages didn’t really mean anything to me at the time. I knew how to install them, but that was about as far as it went. I knew from the Shiny docs that a Shiny app needs certain files (a server.R and ui.R, or an app.R), but that wasn’t what I got, so I’d send it back to the data science team and tell them that I needed those files or I wouldn’t be able to publish it.

More than once I got back a response along the lines of, “but you just need to load it up and then do runApp()”. But, that’s just not how Shiny Server works. Over time, we’ve evolved a set of best practices around when and how to package a Shiny application.

The first step was taking the leap into understanding Shiny and R packages better. It was here that I started to work in the space between data science and IT.

How to package a Shiny application

If you’ve seen the simple app you get when you choose to create a new Shiny application in RStudio, you’ll be familiar with the basic structure of a Shiny application. You need to have a UI object and a server function.

If you have a look inside the UI object you’ll see that it contains the HTML that will be used for building your user interface. It’s not everything that will get served to the user when they access the web application — some of that is added by the Shiny framework when it runs the application — but it covers off the elements you’ve defined yourself.

The server function defines the server-side logic that will be executed for your application. This includes code to handle your inputs and produce outputs in response.

The great thing about Shiny is that you can create something awesome quite quickly, but once you’ve mastered the basics, the only limit is your imagination.

For our purposes here, we’re going to stick with the ‘geyser’ application that RStudio gives you when you click to create a new Shiny Web Application. If you open up RStudio, and create a new Shiny app — choosing the single file app.R version — you’ll be able to see what we’re talking about. The small size of the geyser app makes it ideal for further study.

If you look through the code you’ll see that there are essentially three components: the UI object, the server function, and the shinyApp() function that actually runs the app.
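For reference, the single-file version of the geyser app looks roughly like this (reproduced from the RStudio new-app template; minor details may vary between RStudio versions):

library(shiny)

# The UI object: defines the HTML for the user interface
ui <- fluidPage(
    titlePanel("Old Faithful Geyser Data"),
    sidebarLayout(
        sidebarPanel(
            sliderInput("bins", "Number of bins:",
                        min = 1, max = 50, value = 30)
        ),
        mainPanel(
            plotOutput("distPlot")
        )
    )
)

# The server function: server-side logic reacting to the inputs
server <- function(input, output) {
    output$distPlot <- renderPlot({
        x    <- faithful[, 2]
        bins <- seq(min(x), max(x), length.out = input$bins + 1)
        hist(x, breaks = bins, col = "darkgray", border = "white")
    })
}

# shinyApp() constructs and runs the application
shinyApp(ui = ui, server = server)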

Building an R package of just those three components is a case of breaking them out into the constituent parts and inserting them into a blank package structure. We have a version of this up on GitHub that you can check out if you want.

The directory layout of the demo project looks like this:

|-- DESCRIPTION
|-- LICENSE
|-- NAMESPACE
|-- R
|   |-- launchApp.R
|   |-- shinyAppServer.R
|   `-- shinyAppUI.R
|-- README.md
|-- inst
|   `-- shinyApp
|       `-- app.R
|-- man
|   |-- launchApp.Rd
|   |-- shinyAppServer.Rd
|   `-- shinyAppUI.Rd
`-- shinyAppDemo.Rproj

Once the app has been adapted to sit within the standard R package structure we’re almost done. The UI object and server function don’t really need to be exported, and we’ve just put a really thin wrapper function around shinyApp() — I’ve called it launchApp() — which we’ll actually use to launch the app. If you install the package from GitHub with devtools, you can see it in action.

library(shinyAppDemo)
launchApp()

This will start the Shiny application running locally.
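Under the hood, launchApp() really is just that thin wrapper. A minimal sketch of what it might look like, assuming the UI object and server function are named shinyAppUI and shinyAppServer as in the demo package:

#' Launch the packaged Shiny application
#'
#' A thin wrapper around shinyApp() so the app can be started
#' with a single exported function call.
#' @export
launchApp <- function() {
  shiny::shinyApp(ui = shinyAppUI, server = shinyAppServer)
}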

The approach outlined here also works fine with Shiny Modules, either in the same package, or called from a separate package.

And that’s almost it! The only thing remaining is how we might deploy this app to Shiny server (including Shiny Server Pro) or RStudio Connect.

Publishing your packaged Shiny app

We already know that Shiny Server and RStudio Connect expect either a ui.R and a server.R or an app.R file. We’re running our application out of a package with none of this, so we won’t be able to publish it until we fix this problem.

The solution we’ve arrived at is to create a directory called ‘shinyApp’ inside the inst directory of the package. For those of you who are new to R packaging, the contents of the ‘inst’ directory are left untouched by the package build process and copied verbatim into the top level of the installed package, so it’s an ideal place to put little extras like this.

The name ‘shinyApp’ was chosen for consistency with Shiny Server, which uses a ‘ShinyApps’ directory if a user is allowed to serve applications from their home directory.

Inside this directory we create a single ‘app.R’ file with the following line in it:

shinyAppDemo::launchApp()

And that really is it. This one file will allow us to publish our packaged application under some circumstances, which we’ll discuss shortly.

Here’s where having a packaged Shiny app can get tricky, so we’re going to talk you through the options and do what we can to point out the pitfalls.

Shiny Server and Shiny Server Pro

Perhaps surprisingly — given that Shiny Server is the oldest method of Shiny app publication — it’s also the easiest one to use with these sorts of packaged Shiny apps. There are basically two ways to publish on Shiny Server: from your home directory on the server (also known as self-publishing), or from a central location, usually the directory ‘/srv/shiny-server’.

The central benefit of this approach is the ability to update the application just by installing a newer version of the package. Sadly though, it’s not always an easy approach to take.

Apps served from home directory (AKA self-publishing)

The first publication method is from a user’s home directory. This is generally used in conjunction with RStudio Server. In the self-publishing model, Shiny Server (and Pro) expect apps to be found in a directory called ‘ShinyApps’ within the user’s home directory. This means that if we install a Shiny app in a package, the final location of the app directory will be inside the installed package, not in the ShinyApps directory. In order to work around this, we create a link from where the app is expected to be to where it actually is within the installed package structure.

So in the example of our package, we’d do something like this in a terminal session:

# make sure we’re in our home directory
cd
# change into the ShinyApps directory
cd ShinyApps
# create a link from our app directory inside the package
ln -s /home/sellorm/R/x86_64-pc-linux-gnu-library/3.4/shinyAppDemo/shinyApp ./testApp

Note: The path you will find your libraries in will differ from the above. Check by running .libPaths()[1] and then dir(.libPaths()[1]) to see if that’s where your packages are installed.

Once this is done, the app should be available at ‘http://<server-address>:3838/<username>/<app-name>/’ and can be updated by updating the installed version of the package. Update the package and the updates will be published via Shiny Server straight away.

Apps served from a central location (usually /srv/shiny-server)

This is essentially the same as above, but the task of publishing the application generally falls to an administrator of some sort.

Since they would have to transfer files to the server and log in anyway, it shouldn’t be too much of an additional burden to install a package while they’re there. Especially if that makes life easier from then on.

The admin would need to transfer the package to the server, install it and then create a link — just like in the example above — from the expected location, to the installed location.

The great thing with this approach is that when updates are due to be installed the admin only has to update the installed package and not any other files.

RStudio Connect

Connect is the next generation Shiny Server. In terms of features and performance, it’s far superior to its predecessor. One of the best features is the ability to push Shiny app code directly from the RStudio IDE. For the vast majority of users, this is a huge productivity boost, since you no longer have to wait for an administrator to publish your app for you.

Since publishing doesn’t require anyone to log directly into the server as part of the publishing process, there aren’t really any straightforward opportunities to install a custom package. This means that, in general, publishing a packaged Shiny application isn’t really possible.

There’s only one real workaround for this situation that I’m aware of. If you have an internal CRAN-like repository for your custom packages, you should be able to use that to update Connect, with a little work.

You’d need to have your dev environment and Connect hooked up to the same repo. The updated app package needs to be available in that repo and installed in your dev environment. Then you could publish the single-line app.R once, and republish it for each successive package version you release.

Connect uses packrat under the hood, so when you publish the app.R the packrat manifest will also be sent to the server. Connect will use the manifest to decide which packages are required to run your app. If you’re using a custom package this would get picked up and installed or updated during deployment.

shinyapps.io

It’s not currently possible to publish a packaged application to shinyapps.io. You’d need to make sure your app follows the accepted conventions for creating Shiny apps and only uses files, rather than any custom packages.

Conclusion

Packaging Shiny apps can be a real productivity boon for you and your team. In situations where you can integrate that process into other processes, such as automatically running your unit tests or automating publication, it can also help you adopt devops-style workflows.

However, in some instances, the practice can actually make things worse and really slow you down. It’s essential to understand what the publishing workflow is in your organisation before embarking on any significant Shiny packaging project as this will help steer you towards the best course of action.

If you would like to find out how we can help you with Shiny, get in touch with us: sales@mango-solutions.com

Love Machine: Automating the romantic songwriting process
Owen Jones, Placement Student

Songwriting is a very mysterious process. It feels like creating something from nothing. It’s something I don’t feel like I really control.

— Tracy Chapman

It is February. The shortest, coldest, wettest, miserablest month of the British year.

Only two things happen in Britain during February. For a single evening, the people refrain from dipping all their food in batter and deep-frying it, and instead save some time by pouring the batter straight into a frying pan and eating it by itself; and for an entire day, the exchange of modest indications of affection between consenting adults is permitted, although the government advises against significant deviation from the actions specified in the state-issued Approved Romantic Gestures Handbook.

In Section 8.4 (Guidelines for Pre-Marital Communication) the following suggestion is made:

"Written expressions of emotion should be avoided where possible. Should it become absolutely necessary to express emotion in a written format, it should be limited to a 'popular' form of romantic lyricism. Examples of such 'popular' forms include 'love poem' and 'love song'.

Thankfully, for those who have not achieved at least a master’s degree in a related field, writing a poem or song is a virtually impossible task. And following the sustained and highly successful effort to persuade the British youth that a career in the arts is a fast-track to unemployment, the number of applications to study non-STEM subjects at British universities has been falling consistently since the turn of the decade. This ensures that only the very best and most talented songwriters, producing the most creatively ingenious work, are able to achieve widespread recognition, and therefore the British public are only exposed to high-quality creative influences.

But to us scientists, the lack of method is disturbing. This “creativity” must have a rational explanation. There must be some pattern.

This is unquestionably a problem which can be solved by machine learning, so let’s take the most obvious approach we can: we’ll train a recurrent neural network to generate song lyrics character by character.

You write down a paragraph or two describing several different subjects creating a kind of story ingredients-list, I suppose, and then cut the sentences into four or five-word sections; mix ’em up and reconnect them. You can get some pretty interesting idea combinations like this. You can use them as is or, if you have a craven need to not lose control, bounce off these ideas and write whole new sections.

— David Bowie

To build our neural network I’m going to be using the Keras machine learning interface (which we’re very excited about here at Mango right now – keep an eye out for workshops in the near future!). I’ve largely followed the steps in this example from the Keras for R website, and I’m going to stick to a high-level description of what’s going on, but if you’re the sort of person who would rather dive head-first into the code, don’t feel like you have to hang around here – go ahead and have a play! And if you want to read more about RNNs, this excellent post by Andrej Karpathy is at least as entertaining and significantly more informative than the one you’re currently reading.

We start by scraping as many love song lyrics as possible from the web – these will form our training material. Here’s the sort of thing we’re talking about:

Well… that’s how they look to us. Actually, after a bit of preprocessing, the computer sees something more like this:

All line breaks are represented by the pair of characters “\n”, and so all the lyrics from all the songs are squashed down into one big long string.
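In code, that squashing step can be as simple as something like this (a sketch; the variable names lyrics and text are mine, with lyrics assumed to hold one scraped song per element):

# Flatten all the scraped songs into a single lowercase string,
# with "\n" marking the line breaks the network will have to learn
text <- tolower(paste(lyrics, collapse = "\n"))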

Then we use this string to train the network. We show the network a section of the string, and tell it what comes next.

So the network gradually learns which characters tend to follow a given fixed-length “sentence”. The more of these what-comes-next examples it sees, the better it gets at correctly guessing what should follow any sentence we feed in.
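Condensed, and closely following the Keras for R text-generation example, the data preparation and model definition look something like this (a sketch, assuming text already holds all the lyrics as one string):

library(keras)

maxlen <- 40  # length of each input "sentence"
chars  <- sort(unique(strsplit(text, "")[[1]]))  # the alphabet

# Cut the text into semi-redundant sentences of maxlen characters,
# each paired with the single character that follows it
starts    <- seq(1, nchar(text) - maxlen, by = 3)
sentences <- substring(text, starts, starts + maxlen - 1)
next_char <- substring(text, starts + maxlen, starts + maxlen)

# One-hot encode: x is (samples, maxlen, alphabet), y is (samples, alphabet)
x <- array(0L, dim = c(length(sentences), maxlen, length(chars)))
y <- array(0L, dim = c(length(sentences), length(chars)))
for (i in seq_along(sentences)) {
  x[i, , ] <- sapply(chars, function(ch) strsplit(sentences[i], "")[[1]] == ch)
  y[i, match(next_char[i], chars)] <- 1L
}

# A single LSTM layer followed by a softmax over the alphabet
model <- keras_model_sequential() %>%
  layer_lstm(units = 128, input_shape = c(maxlen, length(chars))) %>%
  layer_dense(units = length(chars)) %>%
  layer_activation("softmax")

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 0.01)  # learning_rate in newer keras
)

model %>% fit(x, y, batch_size = 128, epochs = 20)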

At this point, our network is like a loyal student of a great artist, dutifully copying every brushstroke in minuscule detail and receiving a slap on the wrist and a barked correction every time it slips up. Via this process it appears to have done two things.

Firstly, it seems to have developed an understanding of the “rules” of writing a song. These rules are complex and multi-levelled; the network first had to learn the rules of English spelling and grammar, before it could start to make decisions about when to move to a new line or which rhyming pattern to use.

(Of course, it hasn’t actually “developed an understanding” of these rules. It has no idea what a “word” is, or a “new line”. It just knows that every few characters it should guess " ", and then sometimes it should put in a "\", and whenever it puts in a "\" then it’s got to follow that up with a "n" and then immediately a capital letter. Easy peasy.)

Secondly, and in exactly the same way, the network will have picked up some of the style of the work it is copying. If we were training it on the songs of one specific artist, it would have learned to imitate the style of that particular artist – but we’ve gone one better than that and trained it on all the love songs we could find. So effectively, it’s learned how everyone else writes love songs.

But no-one gets famous by writing songs which have already been written. What we need now is some creativity, some passion, a little bit of je ne sais quoi.

Let’s stop telling our network what comes next. Let’s give it the freedom to write whatever it likes.

I don’t think you can ever do your best. Doing your best is a process of trying to do your best.

— Townes van Zandt

It’s interesting to look at the songwriting attempts of the network in the very early stages of training. At first, it is guessing more or less at random what character should come next, so we end up with semi-structured gobbledegook:

fameliawmalYaws. Boflyi, methabeethirts yt
play3mppioty2=ytrnfuunuiYs blllstis
Byyovcecrowth andtpazo's youltpuduc,s Ijd"a]bemob8b>fiume,;Co
Bliovlkfrenuyione (ju'te,'ve ru t Kis
go arLUUs,k'CaufkfR )s'xCvectdvoldes

4So
Avanrvous Ist'dyMe Dolriri

But notice that even in that example, which was taken from a very early training stage, the network has already nailed the “\n” newline combo and has even started to pick up on other consistent structural patterns like closing a “(” with a “)”. Actually, the jumbled nonsense becomes coherent English (or English-esque) ramblings quite quickly.

There is one interesting parameter to adjust when we ask the model to produce some output: the “diversity” parameter, which determines how adventurous the network should be in its choice of character. The higher we set this parameter, the more the network will favour slightly-less-probable characters over the most obvious choice at each point.
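A common way to implement that choice is temperature sampling. Here’s a minimal sketch, adapted from the sampling helper in the Keras example (the function name is mine):

# Pick the next character index from the predicted probabilities,
# reweighted by the diversity (temperature) parameter
sample_next_char <- function(preds, diversity = 1.0) {
  preds <- log(preds) / diversity
  preds <- exp(preds) / sum(exp(preds))
  which.max(rmultinom(1, size = 1, prob = preds))
}

With diversity well below 1 the rescaling sharpens the distribution towards the single most likely character; well above 1 it flattens the distribution, so unlikely characters get picked far more often.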

If we set the diversity parameter too low, we often degenerate into uncontrolled bursts of la-ing:

la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la la
(... lots more "la"s)

But set it too high and the network decides dictionary English is too limiting.

Oh, this younan every, drock on
Scridh's tty'
Is go only ealled
You could have like the one don'm I dope
Love me
And woment while you all that
Was it statiinc. I living you must?
We dirls anythor

It’s difficult to find the right balance between syllabic repetition and progressive vocabulary, and there’s a surprisingly fine line between the two – this will probably prove to be a fruitful area for further academic research.

I think that identifying the optimal diversity parameter is probably the key to good songwriting.

Songwriting is like editing. You write down all this stuff – all this bad, stupid stuff – and then you have to get rid of everything except the very best.

— Juliana Hatfield

So, that’s what I did.

Here are some particularly “beautiful” passages taken from the huge amount of (largely poor) material the model produced. I haven’t done any editing other than to isolate a few consecutive lines at a time and, in the last few examples, to start the network off with certain sentences.

Automated love

I know your eyes in the morning sun
I feel the name of love
Love is a picked the sun
All my life I can make me wanna be with you
I just give up in your head
And I can stay that you want a life
I’ve stay the more than I do

How long will I love you
As long as there is that songs
All the things that you want to find you
I could say me true
I want to fall in love with you
I want my life
And you’re so sweet
When I see you wanted to that for you
I can see you and thing, baby
I wanna be alone

Oh yeah I tell you somethin’
I think you’ll understand
When I say that somethin’
I thought the dartion hyand
I want me way to hear
All the things what you do

Wise men say
Only fools rush in
But I can hear your love
And I don’t wanna be alone

If I should stay
I would only be in your head
I wanna know that I hope I see the sun
I want a best there for me too
I just see that I can have beautiful
So hold me to you

Wishing you a Happy Valentine’s Day! (And, I don’t recommend reciting this to your loved one, they might run away.)