Graduate Data Science Placement at Pure Planet


Pure Planet Placement

Climate change and the rise of machine learning are two of the dominant paradigm shifts in today’s business environment. Pure Planet sits at the intersection of the two – it is a data-driven, app-based, renewable energy supplier, providing clean renewable energy in a new, technology-focused way.

Pure Planet are further developing their data science capability. With a hoard of data from their automated chat bot ‘WattBot’, among other sources, they are positioning themselves to gain real value by plumbing this data into business decisions to better support their customers. Mango have been working with Pure Planet and their data science team to build up this capability, developing the infrastructure (and knowledge) to get this data into the hands of those that need it – whether that is the marketing department, finance, or the operations teams, they all have access to the insights produced.

Thanks to this great relationship, Mango and Pure Planet were able to organise a Graduate Placement and I was able to spend a month integrated into their data science team in Bath.

Consumer Energy is Very Price Sensitive

To a lot of consumers, energy is the same whoever supplies it (provided it is green…), and so price becomes one of the dominant factors in whether a customer switches to, or from, Pure Planet.

With the rise of price comparison websites, evaluating the market and subsequently switching is becoming easier than ever for consumers, and consequently the rate at which customers switch is increasing. Ofgem, the UK energy regulator, states: ‘The total number of switches in 2019 for both gas and electricity was the highest recorded since 2003.’ – https://www.ofgem.gov.uk/data-portal/retail-market-indicators

Pure Planet knows this, and regularly reviews its price position with respect to the market, but the current process is too manual, not customer specific, and hard to integrate into the data warehouse. Ideally, competitor tariff data could be digested and easily pushed to various parts of the business: to finance, to assess Pure Planet’s market position for our specific customer base, or to operations, as an input to a predictive churn model assessing each customer’s risk of switching.

It is clear just how valuable this data is to making good strategic decisions – it is just a matter of getting it to where it needs to be.

Can We Extract Competitor Quotes?

Market data on prices from all the energy providers in the UK is available to Pure Planet from a third-party supplier, making it possible to get data on the whole market. Currently, however, only discrete High/Medium/Low usage quotes can be fetched manually; these usage levels are averages defined by Ofgem.

An alternative was found by accessing the underlying data itself and re-building quotes. This would allow us to reconstruct quotes for the whole market for any given customer usage – far more useful when looking at our real position in the market for our customers.

The data exists in two tables: tariff data and discount data. From this it should be possible to reconstruct any quote from any supplier.

An Introduction to the Tariff Data

The two data files consist of the tariff data and the discount data.

The tariff data gives the standing charge and per-kilowatt-hour cost of a given fuel, for a given region, for each tariff plan. This is further filtered by meter type (Standard or Economy 7), single/dual fuel (whether both gas and electricity are supplied, or just one), and payment method (monthly direct debit, on receipt of bill, etc.). Tariff data is further complicated by the inclusion of Economy 7 night rates and multi-tiered tariffs.

The discount data describes the value of a given discount, and on what tariffs the discount applies. This is typically broken down into a unique ID containing the company and tariff plan, along with the same filters as above.
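
As a rough illustration of the shape of these two tables, the sketch below builds toy versions in `pandas`. The column names, suppliers and values are entirely hypothetical – they are not the third-party supplier’s actual schema – but they capture the kind of structure described above.

```python
import pandas as pd

# Hypothetical tariff table: one row per supplier/plan/region/fuel combination.
# Column names and values are illustrative only.
tariffs = pd.DataFrame({
    "supplier":        ["Supplier A", "Supplier A", "Supplier B"],
    "plan":            ["Fixed 12M", "Fixed 12M", "Variable"],
    "region":          ["South West", "South West", "South West"],
    "fuel":            ["electricity", "gas", "electricity"],
    "meter_type":      ["Standard", "Standard", "Economy 7"],
    "payment_method":  ["Monthly Direct Debit"] * 3,
    "standing_charge": [25.0, 26.5, 24.0],   # pence per day
    "unit_rate":       [16.5, 3.9, 18.2],    # pence per kWh (day rate)
    "night_rate":      [None, None, 9.8],    # pence per kWh, Economy 7 only
})

# Hypothetical discount table: the value of each discount and the tariffs
# it applies to, identified by supplier/plan plus the same filters.
discounts = pd.DataFrame({
    "discount_id":    ["SupplierA|Fixed 12M|DD"],
    "supplier":       ["Supplier A"],
    "plan":           ["Fixed 12M"],
    "payment_method": ["Monthly Direct Debit"],
    "fuel":           ["dual"],
    "value":          [30.0],                # pounds per year
})
```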

Most quotes rely on discounts to both entice customers in, and to offer competitive rates. As a result, they are key to generating competitor quotes. However, joining the discount and tariff data correctly, to align a discount with the correct tariff it applies to, presented a significant challenge during this project.

The way the discounts had been encoded meant that it was impossible for a machine to join them to the tariffs without some help. To solve this problem a function had to be developed that captured all the possible scenarios and transformed the discounts into a more standard data structure.
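
The real encoding rules are specific to the supplier’s data, but a minimal sketch of the idea – with made-up column names and a made-up rule – might look like the following. The function name, the “dual fuel” expansion and the even split of the discount value are all assumptions for illustration, not the actual logic.

```python
import pandas as pd

def normalise_discounts(discounts: pd.DataFrame) -> pd.DataFrame:
    """Sketch: reshape discounts into one row per (supplier, plan, fuel)
    so they can be joined onto the tariff table. Columns and rules are
    illustrative, not the real encoding."""
    out = discounts.copy()
    # Example rule: a 'dual' fuel discount applies to both fuels, so expand
    # it into separate electricity and gas rows (splitting the value evenly
    # is an assumption made for this sketch).
    dual = out["fuel"] == "dual"
    expanded = out[dual].assign(value=out.loc[dual, "value"] / 2)
    return pd.concat(
        [
            out[~dual],
            expanded.assign(fuel="electricity"),
            expanded.assign(fuel="gas"),
        ],
        ignore_index=True,
    )

# With the discounts in this standard form, a plain merge onto the tariff
# table becomes possible, e.g.:
# merged = tariffs.merge(
#     normalise_discounts(discounts),
#     on=["supplier", "plan", "payment_method", "fuel"],
#     how="left",
# )
```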

The Two Deliverables

After an initial investigation phase, two key deliverables were determined. The first was a python package to help users process the discounts data into a form that could easily and accurately be joined onto the tariff data. The second was a robust understanding of how quotes can be generated from the data. The idea being that the package would be used in the ETL stage to process the data before storing it in the database, and the knowledge would be mapped from python to SQL and applied when fetching a quote in other processes.

Although most tariffs and discounts were straightforward, the remaining few came with several complications. As ever in life, it was these tricky ones that were the most interesting from a commercial perspective – hence the need to get this right!

The Methodology

Investigation and package development were undertaken in Jupyter notebooks, written in python and primarily using the `pandas` package. Here, functions were developed to process the discounts data into the preferred form. During development, tests were written with the `pytest` framework to check the functions implemented the logic as intended, with each test covering a specific piece of logic as it was added. This was a true blessing, as on more than one occasion the whole function needed re-writing when new edge cases were found, proving initial assumptions wrong. The new function was simply run through all the previous tests to check it still worked, saving vast amounts of time and ensuring robustness for future development and deployment.
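
To give a flavour of that approach, a test for the (hypothetical) normalisation sketch above might look like this – the package path and the specific case are invented for illustration, but each real test pinned down one rule in the same way.

```python
import pandas as pd
from purequotes.discounts import normalise_discounts  # hypothetical package path

def test_dual_fuel_discount_is_split_across_fuels():
    # One 'dual' fuel discount should become one electricity row and one
    # gas row after normalisation, with the total value preserved.
    raw = pd.DataFrame({
        "supplier":       ["Supplier A"],
        "plan":           ["Fixed 12M"],
        "payment_method": ["Monthly Direct Debit"],
        "fuel":           ["dual"],
        "value":          [30.0],
    })
    result = normalise_discounts(raw)
    assert set(result["fuel"]) == {"electricity", "gas"}
    assert result["value"].sum() == 30.0
```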

Once developed, the functions (along with their tests) were structured into a python package. Clear documentation was written covering both the function logic and higher-level how-to guides to enable use of the package. All development was under version control using git and pushed to Bitbucket for sharing with the Pure Planet data team.

Pure Planet uses Amazon Web Services for their cloud infrastructure, and as a result I became much more aware of this technology and what it can do – for example, using the Amazon Web Services client to access data stored in shared S3 buckets. It was great to see how their data pipeline was set up, and just how effective it was.
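
As an example of how simple that access is, pulling a file from a shared S3 bucket into pandas takes only a few lines with `boto3`. The bucket and key names below are placeholders, not the real locations.

```python
import boto3
import pandas as pd

# Credentials come from the environment or an attached IAM role.
s3 = boto3.client("s3")

# Bucket and key are placeholders for illustration.
obj = s3.get_object(Bucket="shared-data-bucket", Key="tariffs/latest.csv")
tariffs = pd.read_csv(obj["Body"])
```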

To prove the understanding of how quotes were built up, a notebook was written to validate generated quotes by comparing them to the quote data fetched manually. This incorporated the newly developed package to process the discount data and join it to the tariff data, followed by implementing the quote logic in pandas to generate quotes. It was then possible to compare the generated quotes to the manual quotes to prove the success of the project.
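
In outline, the idea looked something like the sketch below. The formula is deliberately simplified (it ignores multi-tier tariffs and Economy 7 night rates), and the column names and the 2,900 kWh usage figure are assumptions for illustration rather than the actual implementation.

```python
import pandas as pd

def annual_quote(tariff: pd.Series, usage_kwh: float) -> float:
    """Sketch of an annual quote in pounds: a year of standing charge plus
    usage at the unit rate, minus any applicable discount. Ignores tiers
    and Economy 7 night rates for simplicity."""
    discount = tariff.get("discount_value", 0.0)
    if pd.isna(discount):
        discount = 0.0
    pence = tariff["standing_charge"] * 365 + tariff["unit_rate"] * usage_kwh
    return pence / 100 - discount

# Illustrative validation: generate quotes for a chosen usage and compare
# against the manually fetched quotes (column names hypothetical).
# generated = merged.apply(annual_quote, axis=1, usage_kwh=2900)
# validation = manual_quotes.assign(generated=generated)
# validation["difference"] = validation["generated"] - validation["quote"]
```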

And Finally…

Big thanks to Doug Ashton and the Data Science team at Pure Planet for making my time there so enjoyable. I really felt part of the team from day one. I would also like to extend my thanks to those at Mango and Pure Planet who made this graduate placement opportunity possible.

Author: Duncan Leng, Graduate Data Scientist at Mango