Centralized collaboration to future-proof your data science team

The data science industry, like much of IT and tech, has one of the largest shares of remote employees, with teams based all over the world. Yet these distributed teams still deliver projects effectively, collaboratively and on time.

With the rapidly evolving COVID-19 crisis, companies have been forced to adopt working-from-home policies, and our technology and digital infrastructure has never been more important. Newly formed remote data science teams need to maintain productivity and continue to drive effective stakeholder communication and business value, and the only way to achieve this is through appropriate infrastructure and well-defined ways of working.

Whether your workforce works remotely or otherwise, centralising platforms and enabling a cloud-based infrastructure for data science will lead to more opportunities for collaboration. It may even reduce IT spend on equipment and maintenance overhead, future-proofing your data science infrastructure for the long run.

So when it comes to implementing a long-lived platform, here are some things to keep in mind:

Collaboration through a centralised data & analytics platform

A centralized platform, such as RStudio Server Pro, means all your data scientists will have access to an appropriate platform and be working within the same environment. Working in this way means that a package written by one developer will work with a minimum of effort in all your developers’ environments, allowing simpler collaboration. There are other ways of achieving this, such as virtualenv for Python, but these require each project to set up its own environment, increasing overhead. Centralizing this effort ensures that there is a well-understood way of creating projects and that every developer is working in the same way.

Best practices when using a centralized platform

  1. Version control. If you are writing code of any kind, even just scripts, it should be versioned religiously and have clear commit messages. This ensures that, if anything breaks, users can see each change made to the scripts and can reproduce your results on their own.
  2. Packages. Whether you are working in Python or R, code should be packaged and treated like the valuable commodity it is. At Mango Solutions, a frequent challenge we address with our clients is to debug legacy code where a single ‘expert’ in a particular technology has written a piece of process which has become mission-critical and then left the business. There is then no way to support, develop, or otherwise change this process without the whole business grinding to a halt. Packaging code and workflows helps to document and enforce dependencies, which can make legacy code easier to manage. These packages can then be maintained by RStudio Package Manager or Artifactory.
  3. Reusability. By putting your code in packages and managing your environments with renv, you’re able to make your data science reusable. Creating this institutional knowledge means that you can avoid a data scientist becoming a single point of failure, and with the job market still incredibly buoyant in the data sector, when a data scientist does leave, you won’t be left with a model that nobody understands or can run. As Lou Bajuk explained in his blog post, Driving Real, Lasting Value with Serious Data Science, durable code is a significant criterion for future-proofing your data science organization.
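The dependency-documentation point above can be sketched in Python terms: declaring a package’s requirements in a `pyproject.toml` makes them explicit and installable, rather than implicit in one developer’s environment (the package name and version pins below are purely illustrative):

```toml
[project]
name = "churn-model"            # hypothetical package name
version = "0.1.0"
description = "Packaged churn-scoring workflow"
dependencies = [
    "pandas>=1.0",              # lower bounds document what the code needs
    "scikit-learn>=1.0",
]
```

In R, the analogous role is played by a package’s DESCRIPTION file together with an renv lockfile.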

Enabling a Cloud-based environment

In addition to this institutional-knowledge benefit, running your data science platform on a cloud instance allows you to scale it easily. With the ability to deploy to Kubernetes, you can scale your deployment as your data science team grows, while only paying for what you need, when you need it.

This move to the cloud also brings some tangential benefits which are often overlooked. Providing your data science team with a cloud-based environment means:

  1. The cost of hardware for your data science staff can be reduced to low-cost laptops rather than costly high-end on-premises hardware.
  2. By providing a centralized development platform, you enable remote and mobile working, a key differentiator when hiring the best talent.
  3. By enhancing flexibility, you are better positioned to remain productive in unforeseen circumstances.

This last point cannot be overstated. At the beginning of the COVID-19 lockdown, a nationwide company whose data team was tied to desktops found themselves struggling to provide enough equipment to continue working through the lockdown. As a result, their data science team could not function and were unable to provide insights that would have been invaluable through these changing times. By contrast, here at Mango, our data science platform strategy allowed us to switch seamlessly to remote working, add value to our partners and deliver insights when they were needed most.

Building agility into your basic ways of working means that you are well placed to adapt to unexpected events and adopt new platforms which are easier to update as technology moves on.

Once you have a centralized analytics platform and cloud-based infrastructure in place, how are you going to convince the business to use it? This is where the worlds of Business Intelligence and software DevOps come to the rescue.

Analytics-backed dashboards, using technologies like Shiny and RStudio Connect, or Plotly and Dash for Python, mean you can quickly and easily create front ends for business users to access results from your models. You can also easily expose APIs that allow your websites to be backed by scalable models, potentially creating new ways for customers to engage with your business.
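As an illustrative sketch of the API idea (the model and the query parameter here are invented, not any real client system), even Python’s standard library is enough to show the shape of a model-backed web service; in practice you would reach for Flask, FastAPI or R’s plumber:

```python
import json
from urllib.parse import parse_qs

def score(usage_kwh):
    """Stand-in for a real model: 'predicts' annual cost from usage."""
    return 100.0 + 0.15 * usage_kwh

def app(environ, start_response):
    """Minimal WSGI application exposing the model as a JSON endpoint."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    usage = float(params.get("usage_kwh", ["0"])[0])
    body = json.dumps({"prediction": score(usage)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Served with `wsgiref.simple_server.make_server("", 8000, app)`, a request such as `GET /?usage_kwh=1000` returns `{"prediction": 250.0}`.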

A word of caution here: doing this without considering how you are going to maintain and update what have now become software products can be dangerous. Models may go out of date, functionality can become irrelevant, and the business can become disillusioned. Fortunately, these are solved problems in the web world, and solutions such as containers and Kubernetes, alongside CI/CD tools, make this a simpler challenge. As a consultancy, we have tried and tested solutions that expose APIs from R or Python to back high-throughput websites for customers across a number of sectors.

Collaborative forms of communication

The last piece of the puzzle for your data science team to be productive has nothing to do with data science but is instead about empathy and communication. Your data science team may create insights from your data, but they are like a rudderless ship without input from the business. Understanding business problems and what has value to the wider enterprise requires good communication. This means that your data scientists have to partner with people who understand the sales and marketing strategy. And if you are to embrace the ethos of flexibility as protection against the future, then good video-conferencing and other technological communications are essential.

Written by Dean Wood, Principal Data Science Consultant at Mango Solutions; also published as a guest blog on our partner RStudio’s website.




In October, Gartner released a report on the Top 10 Strategic Technology Trends for 2020. In somewhat prophetic fashion, Gartner identified “Hyperautomation” as the #1 trend for 2020 – as we plan for the post-COVID commercial environment, with leadership looking to create more streamlined organisations, their timing couldn’t have been better.

“Hyperautomation” is found at the intersection of Robotic Process Automation (RPA) and Machine Learning (ML) – it combines RPA’s approach to automating business processes with ML’s ability to drive insight from data. It has the potential to significantly reduce costs by automating processes that may include intelligent decisioning, at a time when leaders everywhere need to create more efficient, smarter organisations.

In many ways, there is nothing new here – a key goal of data science investment for the last ~10 years has been to automate decision-making based on AI and ML. As an organisation, we are frequently asked to deliver initiatives that automate decision-making processes using a combination of data science and software engineering.

What is new here is the perspective – the “RPA-first” approach that underpins “Hyperautomation” is another tool in the arsenal when we look at automating processes, and drives increased collaboration across analytic and IT functions.

Perhaps the most important aspect of the rise of “Hyperautomation” is its impeccable timing. Not only do we need to create more streamlined organisations (due to COVID and the impending recession), but it comes at a time when, in some quarters, serious questions are being raised about the value generated from investment in data science. With talk of an impending AI winter, and anecdotal stories of data science teams struggling to deliver realisable business value, “Hyperautomation” provides a great opportunity – a chance to deliver on the potential of analytics to drive measurable cost reduction.

“Hyperautomation” is an opportunity to capture the imagination and focus of the business – to more deeply engage with them in a collaborative fashion to explore possible processes that could be automated. And when we find high-value processes that could be automated, we have more tools in our arsenal with which to build a solution.

To use a recent example, we were engaged by a client to automate their “price comparison” process, where customers would email details of a quote and ask whether our client could beat it. Using a mixture of technologies and machine learning, we were able to dynamically read and understand the given quote, generate a comparative quote, and use NLP to create a response in an appropriate tone of voice. The initial automation success rate was low, with only 8% of cases being fully automated. However, that already delivered sufficient business value to demonstrate a return on investment within a few months. Moreover, the data generated by the “manual” process is already being used to improve the model, leading to an increased success rate and more savings. All in all, this “humans pretending to be AI pretending to be humans” model really provides a platform for ongoing efficiency gains and cost reductions.

As businesses emerge post-COVID, we’re all going to be in a difficult financial position, in an ultra-competitive landscape with lots of unknowns. To get through, companies will be looking to drive cost efficiencies wherever they can, making it a great time to talk about the application of “Hyperautomation” as a way to reduce the unnecessary day-to-day burden of process-heavy tasks.

Reducing Costs With AI & Hyperautomation in a Post-COVID World Webinar

Join Rich Pugh as he provides an insight into AI and Hyperautomation – how businesses will be adopting this very technique as they strive to reduce costs.




Pure Planet Placement

Climate change and the rise of machine learning are two dominating paradigm shifts in today’s business environment. Pure Planet sits at the intersection of the two – it is a data-driven, app-based, renewable energy supplier, providing clean renewable energy in a new, technology-focused way.

Pure Planet are further developing their data science capability. With a hoard of data from their automated chat bot ‘WattBot’, among other sources, they are positioning themselves to gain real value from plumbing this data into business decisions to better support their customers. Mango have been working with Pure Planet and their data science team to build up this capability, and have developed the infrastructure (and knowledge) to get this data into the hands of those that need it – whether the marketing department, finance, or the operations teams, all have access to the insights produced.

Thanks to this great relationship, Mango and Pure Planet were able to organise a Graduate Placement and I was able to spend a month integrated into their data science team in Bath.

Consumer Energy is Very Price Sensitive

To a lot of consumers, energy is the same whoever supplies it (provided it is green…), and so price becomes one of the dominant factors in whether a customer switches to, or from, Pure Planet.

With the rise of price comparison websites, evaluating the market and subsequently switching is becoming easier than ever for consumers, and consequently the rate at which customers switch is increasing. Ofgem, the UK energy regulator, states: ‘The total number of switches in 2019 for both gas and electricity was the highest recorded since 2003.’ – https://www.ofgem.gov.uk/data-portal/retail-market-indicators

Pure Planet knows this, and regularly reviews its price position with respect to the market, but the current process is too manual, not customer-specific, and hard to integrate into the data warehouse. Ideally, competitor tariff data could be ingested and easily pushed to various parts of the business, such as to finance to assess Pure Planet’s market position for our specific customer base, or to operations as an input in a predictive churn model to assess each customer’s risk of switching.

It is clear just how valuable this data is to making good strategic decisions – it is just a matter of getting it to where it needs to be.

Can We Extract Competitor Quotes?

Market data on prices from all the energy providers in the UK is available to Pure Planet from a third-party supplier, making it possible to get data on the whole market. Currently, however, it is only possible to manually fetch discrete High/Medium/Low usage quotes – average usage profiles defined by Ofgem.

An alternative was found: accessing the underlying data itself and rebuilding quotes. This would allow us to reconstruct quotes from the whole market for any given customer usage – far more useful when looking at our real position in the market for our customers.

The data exists in two tables: tariff data and discount data. From this it should be possible to reconstruct any quote from any supplier.
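As a hedged sketch of what “rebuilding a quote” means here (the field names and prices are invented for illustration, not the supplier’s real schema), a single-rate annual quote combines a daily standing charge, a per-kWh unit rate and any applicable discount:

```python
def annual_quote(standing_p_per_day, unit_p_per_kwh, usage_kwh,
                 discount_pounds=0.0):
    """Reconstruct an annual quote in pounds from tariff components.

    Tariff tables typically quote prices in pence, so the total is
    converted to pounds before the discount is applied.
    """
    pence = standing_p_per_day * 365 + unit_p_per_kwh * usage_kwh
    return pence / 100.0 - discount_pounds

# e.g. 25p/day standing charge, 15p/kWh, 3100 kWh of usage, £30 discount
quote = annual_quote(25.0, 15.0, 3100, discount_pounds=30.0)  # 526.25
```

Multi-tier tariffs and Economy 7 night rates complicate this picture, but the principle is the same: a quote is a deterministic function of the tariff and discount rows plus the customer’s usage.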

An Introduction to the Tariff Data

The two data files consist of the tariff data and the discount data.

The tariff data gives the standing charge and per-kilowatt-hour cost of a given fuel, for a given region, for each tariff plan. This is further filtered by meter type (Standard or Economy 7), single/dual fuel (whether both gas and electricity are supplied, or just one), and payment method (monthly direct debit, on receipt of bill, etc.). Tariff data is further complicated by the inclusion of Economy 7 night rates and multi-tiered tariffs.

The discount data describes the value of a given discount, and on what tariffs the discount applies. This is typically broken down into a unique ID containing the company and tariff plan, along with the same filters as above.

Most quotes rely on discounts both to entice customers in and to offer competitive rates. As a result, they are key to generating competitor quotes. However, joining the discount and tariff data correctly, to align a discount with the correct tariff it applies to, presented a significant challenge during this project.

The way the discounts had been encoded meant that it was impossible for a machine to join them to the tariffs without some help. To solve this problem, a function had to be developed that captured all the possible scenarios and transformed the discounts into a more standard data structure.
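A sketch of the kind of transformation involved (the field names and the "all" wildcard convention are hypothetical stand-ins for the real encoding): discount records that apply to every payment method are fanned out into one explicit row per method, so a plain equality join onto the tariff table becomes possible:

```python
PAYMENT_METHODS = ["monthly_direct_debit", "on_receipt_of_bill", "prepayment"]

def normalise_discount(record):
    """Expand a raw discount record into explicit, joinable rows."""
    if record["payment_method"] == "all":
        methods = PAYMENT_METHODS        # fan out the wildcard
    else:
        methods = [record["payment_method"]]
    return [{**record, "payment_method": m} for m in methods]

raw = {"tariff_id": "ACME-GREEN-1", "value_pounds": 30.0,
       "payment_method": "all"}
rows = normalise_discount(raw)           # three explicit rows, one per method
```

After this step, each discount row carries concrete key values, so the join onto the tariff table is a straightforward equality match rather than a set of special cases.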

The Two Deliverables

After an initial investigation phase, two key deliverables were determined. The first was a Python package to help users process the discounts data into a form that could easily and accurately join onto the tariff data. The second was a robust understanding of how quotes can be generated from the data. The idea was that the package would be used in the ETL stage to process the data before storing it in the database, and that the quote logic would be mapped from Python to SQL and applied when fetching a quote in other processes.

Although most tariffs and discounts were straightforward, the few remaining cases brought several complications. As ever in life, it was these tricky ones that were the most interesting from a commercial perspective – hence the need to get this right!

The Methodology

Investigation and package development were undertaken in Jupyter notebooks, written in Python, primarily using the `pandas` package. Here, functions were developed to process the discounts data into the preferred form. During development, tests were written with the `pytest` framework to check the functions were doing the logic as intended. Each test covered a specific piece of logic as it was added. This was a true blessing, as on more than one occasion the whole function needed rewriting when new edge cases were found, proving initial assumptions wrong. The new function was simply run through all the previous tests to check it still worked, saving vast amounts of time and ensuring robustness for future development and deployment.
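A sketch of what one such pytest test might look like (the predicate and its matching rules are hypothetical, not the client’s actual logic): each test pins down a single piece of behaviour, so a rewritten function can be re-verified against the whole suite in one run:

```python
def discount_applies(discount, tariff):
    """Hypothetical predicate: does this discount apply to this tariff?

    Plan IDs must match exactly; a fuel of "all" on the discount
    matches any fuel on the tariff.
    """
    if discount["plan_id"] != tariff["plan_id"]:
        return False
    return discount["fuel"] in ("all", tariff["fuel"])

def test_wildcard_fuel_matches_any_fuel():
    discount = {"plan_id": "GREEN-1", "fuel": "all"}
    assert discount_applies(discount, {"plan_id": "GREEN-1", "fuel": "gas"})
    assert discount_applies(discount, {"plan_id": "GREEN-1",
                                       "fuel": "electricity"})

def test_plan_mismatch_never_matches():
    discount = {"plan_id": "GREEN-1", "fuel": "all"}
    assert not discount_applies(discount, {"plan_id": "GREEN-2", "fuel": "gas"})
```

Running `pytest` discovers any `test_*` functions automatically, so every rewrite of the matching logic is checked against all previously captured edge cases.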

Once developed, the functions (along with their tests) were structured into a Python package. Clear documentation was written to describe both the function logic and higher-level how-to guides enabling use of the package. All development was under version control using Git and pushed to Bitbucket for sharing with the Pure Planet data team.

Pure Planet uses Amazon Web Services for their cloud infrastructure, and as a result I became much more aware of this technology and what it can do – for example, using the AWS client to access data stored in shared S3 buckets. It was great to see how their data pipeline was set up, and just how effective it was.

To prove the understanding of how quotes were built up, a notebook was written to validate generated quotes by comparing them to the quote data fetched manually. This incorporated the newly developed package to process the discount data and join it to the tariff data, followed by implementing the quote logic in pandas to generate quotes. It was then possible to compare the generated quotes to the manual quotes to prove the success of the project.
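The comparison step can be sketched as follows (the tariff names and figures are purely illustrative): each manually fetched quote is checked against its generated counterpart to within a small tolerance, a penny here:

```python
import math

def quotes_match(generated, manual, tolerance_pounds=0.01):
    """True if every manual quote is matched by its generated
    counterpart to within the given tolerance."""
    return all(math.isclose(generated[tariff], price,
                            abs_tol=tolerance_pounds)
               for tariff, price in manual.items())

generated = {"ACME-GREEN-1": 526.25, "ACME-GREEN-2": 610.40}
manual    = {"ACME-GREEN-1": 526.25, "ACME-GREEN-2": 610.40}
ok = quotes_match(generated, manual)     # True: all quotes agree
```

A tolerance-based comparison is preferable to exact equality here, since rounding conventions can differ between the manual quoting tool and the reconstructed arithmetic.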

And Finally…

Big thanks to Doug Ashton and the Data Science team at Pure Planet for making my time there so enjoyable. I really felt part of the team from day one. I would also like to extend my thanks to those at Mango and Pure Planet who made this graduate placement opportunity possible.

Author: Duncan Leng, Graduate Data Scientist at Mango


Aligning effective stakeholder engagement as a driving force to delivering business value from data science.


Opening up the opportunities with data

If you’re trying to help your organisation become more data-driven, you’ll almost certainly have come across pockets of indifference, or active resistance, to integrating data & analytics into the way teams work.

While it can be tempting to leave those teams behind, in the hope that they will jump on the bandwagon when success stories from the more enthusiastic areas of the business start to be generated, in reality their inertia tends to slow the pace of organisational change, even if only indirectly. Instead you have to tread a middle ground – one where you don’t expend too much energy trying to enthuse a hard-to-please audience, but do enough to keep them on board in an efficient way.

To achieve this, it’s important to understand and address the reasons colleagues may not see the value in data, and the motivations of those who actively push back on it.


Effective stakeholder engagement: Making sure advanced analytics is properly understood

It can be frustrating trying to engage colleagues who don’t understand the value data can bring them. To someone passionate about the possibilities of analytics, it seems so obvious! In my experience, the best way to solve this is to focus the conversation on understanding their role and the issues they are facing, so that you can make the connection between business value and data yourself, rather than explaining analytics and asking them to make the leap between abstract concept and practical outcome. If they won’t engage, then sharing case studies from the relevant areas of other organisations is a great way to warm them up to the conversation.

Data detractors are even harder to work with – why are they resisting positive change when all we’re trying to do is help? The answer to this tends to be one of two things – either they have been burned by failed attempts at data-driven change in the past, or they feel threatened by the idea of data playing a larger role in decision making processes.


Early adopters

The only way to bring round the first group is to prove that value can be delivered successfully; words won’t matter to those with entrenched views, whereas actions just might. Start small and make sure you do something that adds value to them early in the journey; you’ll either gain their buy-in or, at worst, build up a track record of success that reduces their negative influence on your ability to progress.

The second group most often feel threatened because they perceive the consequence of increased organisational reliance on data over intuition as reducing their role and power. A senior leader will often see the value they bring as their ability to make good decisions based on the experience and knowledge built up over their career; telling them that data can make better decisions can therefore be seen as an almost existential threat to their position! The key here is to be sensitive to this – try to engage them one-to-one, where they are less exposed to being shown up in front of colleagues, and show them how data can augment, rather than replace, their experience in decision making.


Aligning analytic opportunities and business strategy

Often, confusion around the jargon of data science causes a lack of understanding and a propensity to create barriers within the business. With a deeper understanding, business leaders can see the value and potential that advanced analytics can bring. Mango works with businesses to inspire and align effective stakeholder engagement with the possibilities of becoming a data-driven organisation.

Keen to know more? Our Never Mind the Buzzwords webinar provides an insight into the benefits of close alignment between analytic opportunities and business strategy, and could help address any specific barriers to change.

Author: Dave Gardner, Deputy Director