RStudio::Conf 2020

Dude: Where are my Cats?  RStudio::Conf 2020

It may not have been the start to the conference that we planned as RStudio Full Service Certified Partners – did you see the lonely guy on social media? Yes, that was me, and I’m here to tell the tale…

Eventful as it was at the time, I have to say this was the first RStudio Conference I have had the pleasure of attending since joining Mango Solutions. The things that really stood out for me were the event’s thoughtful, all-encompassing inclusivity and how fantastically well organised it was for nearly 2,400 R users from around the world. Here’s a summary of our time in San Francisco, what the conference had to offer and why we are immensely proud partners of RStudio.

Cat rehoming 10:41am San Francisco time

With our exhibition stand, materials and conference goodies – the famous Mango cats – held up in customs, the conference started without them. I remain ever thankful to the whole #rstats community who, despite this little hiccup, took pity on us and came to visit us anyway. What I quickly grasped is that this is a community that is always ready to support others, provide a forum to share ideas, and help people learn how to solve problems – and, in particular, to learn how others are benefiting from using R.

Public Benefit Corporation

A vital and impressive moment of the conference was the standing ovation for J.J. Allaire after his announcement that RStudio had become a Public Benefit Corporation. You could feel the appreciation in the room for RStudio’s innovation and how it has pushed the R community forward. He also discussed RStudio’s future plans, which provide growth opportunities for the community.

From a content perspective, RStudio::conf was a great event, filled with informative and well organised workshops and talks. As hard as it is to pick out one particular talk, the highlight was probably Jenny Bryan’s “Object of type ‘closure’ is not subsettable”. This was all about debugging in R: the best approaches, the available tools, and hints on how to write more informative error messages in your own functions. It was engaging, informative and witty, and it was relevant to pretty much every R developer on the planet, never mind just those in the room.

Amongst other things, the Mango team of Data Scientists really appreciated the following packages, which the RStudio team featured as part of their workshops:

  • The best ways to scale up your API using the plumber package
  • Custom styling of Shiny apps using the bootstraplib package
  • Effective R code parallelisation using the future and furrr packages (see the short sketch below)
  • Load testing using the loadtest package
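For anyone who hasn’t tried the future/furrr combination yet, here is a minimal sketch of the parallelisation pattern – the slow function below is purely illustrative:

library(future)
library(furrr)

# Use all available local cores as parallel workers
plan(multisession)

# A deliberately slow function, applied to each element in parallel;
# future_map() is the parallel drop-in for purrr::map()
slow_square <- function(x) {
  Sys.sleep(1)
  x^2
}

results <- future_map(1:8, slow_square)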

 

Inclusivity all round

Inclusivity was felt not only at the R-Ladies breakfast, but also in the prayer rooms, the quiet rooms for neurodiverse attendees, the gender-neutral bathrooms, the diversity scholarships and the frequent reminders of the event’s code of conduct, which revolves heavily around inclusivity and tolerance. Great organisation was evident not only in the choice of a suitable venue, but also in every effort that went into keeping the queues for food and buses short, allowing enough time to change rooms between talks, and providing great entertainment and perks throughout the event.

Endless networking opportunities

RStudio::conf 2020 was a fantastic place to meet and connect with other people in the industry and gain insight into how other companies, data science teams and individuals are using R and the underlying infrastructure that supports it. For Ben, our Data Platform Consultant, it was interesting and exciting to hear, from a platform perspective, about the needs of data science teams and how we could potentially solve the challenges they are facing. A recurring issue seemed to be how best to scale R in a production environment. Ben found the renv talk interesting and hopes to be using renv more this year in place of packrat.

For Mango, it was a real pleasure to discuss at length the wealth of opportunity presented by ValidR, our validated, production-ready version of R.

A huge thank you to everyone at RStudio for supporting my first conference with you. It was truly a pleasure to meet the team in person, and the event has given Mango and RStudio a real opportunity to take our partnership to the next level.

 

Author: Rich Adams, Solutions Specialist


ABOUT THE BOOK:

With the open source R programming language and its immense library of packages, you can perform virtually any data analysis task. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you’ll need to import, manipulate, summarize, model, and plot data with R; formalize analytical code; and build powerful R packages using current best practices.

Each short, easy lesson builds on all that’s come before: you’ll learn all of R’s essentials as you create real R solutions.

R in 24 Hours, Sams Teach Yourself covers the entire data analysis workflow from the viewpoint of professionals whose code must be efficient, reproducible and suitable for sharing with others.

 

WHAT YOU’LL LEARN:

You’ll learn all this, and much more:

  • Installing and configuring the R environment
  • Creating single-mode and multi-mode data structures
  • Working with dates, times, and factors
  • Using common R functions, and writing your own
  • Importing, exporting, manipulating, and transforming data
  • Handling data more efficiently, and writing more efficient R code
  • Plotting data with ggplot2 and Lattice graphics
  • Building the most common types of R models
  • Building high-quality packages, both simple and complex – complete with data and documentation
  • Writing R classes: S3, S4, and beyond
  • Using R to generate automated reports
  • Building web applications with Shiny

Step-by-step instructions walk you through common questions, issues, and tasks; Q&As, quizzes, and exercises build and test your knowledge; “Did You Know?” tips offer insider advice and shortcuts; and “Watch Out!” alerts help you avoid pitfalls.

By the time you’re finished, you’ll be comfortable going beyond the book to solve a wide spectrum of analytical and statistical problems with R.

If you find that you have some time on your hands and would like to enhance your skills, why not teach yourself R in 24 hours?

The data and scripts to accompany the book can be accessed on GitHub here, and the accompanying mangoTraining package can be installed from CRAN by running the following in R: install.packages("mangoTraining")
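Once installed, the package loads like any other; as a quick sanity check, the snippet below simply lists the datasets it ships with:

# Install once from CRAN, then load the package
install.packages("mangoTraining")
library(mangoTraining)

# List the datasets bundled with the package
data(package = "mangoTraining")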

 

ORDERING A COPY OF THIS BOOK:

If you’d like to order a copy use the following ISBN codes:

ISBN-13: 978-0-672-33848-9

ISBN-10: 0-672-33848-3

Authors: Andy Nicholls, Richard Pugh and Aimee Gott.

data science & star wars

 

Not many people know this, but the First Order (the bad guys from the latest Star Wars films) once created a Data Science team.  It all ended very badly, but based on some intel smuggled by a chippy R2 unit, we were able to piece together the story …

 

 

Analytics: Expectation vs Reality

Now, of course this is just (data) science fiction, but the basic plot will be familiar to many of you.

The marketing hype around AI and Data Science over the last few years has really raised the stakes in the analytics world.  It’s easy to see why – if you’re a salesperson selling AI software for £1m, then you’re going to need to be bullish about how many millions it is going to make/save the customer.

The reality though is that Data Science can add enormous value to an organisation, but:

  • It isn’t magic
  • It won’t happen overnight
  • It’s very difficult if the building blocks aren’t in place
  • It’s more about culture and change than algorithms and tech

So, how do we deal with a situation where leaders (whether they be evil Sith overlords or just impatient executives) have inflated expectations about what is possible (and have possibly over-invested on that basis)?

 

Education is key

With so much buzz and hype around analytics, it’s unsurprising that leadership are bombarded with an array of confusing terminology and unrealistic promises.  To counter that, it is important that Data Science teams look to educate the business and leadership on what these terms really mean.  In particular, we need to educate the business on the “practical” application of data science, what the possibilities are, and the potential barriers to success that exist.

 

Create a repeatable process

Once we’ve educated the business about the possibilities of analytics, we need to create a repeatable delivery process that is understood from both analytic AND business perspectives.  This moves the practice of analytics away from “moments of magic” producing anecdotal success to a process that is understandable, repeatable and produces consistent success.  Within this, we can establish shared understanding about how we will prioritise effort, measure success, and overcome the barriers to delivering initiatives (e.g. data, people, change).

 

Be consistent

Having established the above, we must engage with the business and leadership using our new consistent language and approach.  This will ensure the business understands the steps that are being carried out and the likelihood of success or failure.  After all, if there’s no signal in your data you can’t conjure accuracy from nowhere – ensuring that your stakeholders understand this (without getting into the detail of accuracy measures) is an important enabler for engaging effectively with them.

 

Summary

Being in a situation where the value and possibilities of data science have been significantly over-estimated can be very challenging.  The important thing is to educate the business, create a repeatable process for successful delivery and be consistent and clear about the realities and practicalities of applying data science.

Then again, if your executive sponsor starts wielding a Lightsaber – I’d get out quickly.

 


 

Following on from the success of our recent graduate intake, we are already looking to find three more graduates and one year-long placement student to join us in September 2020.  Our placements and interns have been an integral part of Mango for several years now, and we’re proud to say that every single intern has come back once they’ve finished university and joined us as a permanent employee.

Mango hosted our very first graduate assessment day recently.  We thought that an assessment day would give us a better chance to really get to know the applicants, and to really show them what life at Mango is like – and it certainly did just that!

As wonderful as our current graduate intake is, I have to admit that all four of them are male.  As signatories of the Tech Talent Charter and supporters of Women in Data, we were determined to change that statistic this year.  I’m pleased to say that of the eight candidates at the assessment day, four were male and four were female.  Mango is also justifiably proud of the diversity of backgrounds in our data science team, and this cohort was similarly diverse: we had representatives from five different subjects and four different universities.

Following the recent Data Science Skills Survey – created in partnership with Women in Data UK and Datatech – which highlighted a national data science skills shortage, we were delighted to receive over 60 applications for the three graduate roles; we have already whittled these down to the top six candidates, who will move forward to the next stage of the application process to become a Mango graduate.

The next part of the process is about assessing skills, and we do this by defining what we call a Minimally Viable Data Scientist – this is what we expect our graduates to achieve by the end of the graduate programme.  We put exercises in place throughout the day to assess current skills as well as potential.

The more ‘technical’ skills were assessed at interview, whilst the softer skills, which are essential for our consultancy projects, were tested in individual and group exercises. We tasked the candidates with imagining a new project with Bath Cats and Dogs Home and thinking about how it might play out.

We’re proud of some of the feedback that we received at the end of the day.  We consciously set out for this day to be two-way – we wanted the candidates to want to work for Mango just as much as we wanted to employ them. Candidates’ feedback described the day as “refreshingly open”, “actually enjoyable” and “not as daunting as I’d thought an assessment day would be”.

We’ve now got the incredibly difficult decision of which of the brilliant candidates to make offers to!

are you on a data-driven coddiwomple?

If, like me, you attend data conferences, then there’s one word you will hear time and time again: “journey”.  It’s an incredibly popular word in the data-driven transformation world, and it’s common to hear a speaker talking about the “journey” their business is on.  But, for me, I often struggle with that word.

 

Journey

The Oxford Dictionary defines a “journey” as follows:

Journey:
the act of travelling from one place to another

So to be on a journey, I feel we need to have a very clear understanding of (1) where we are travelling from and (2) what our destination is.

For example, as I’m writing this, I’m on a “journey” – I am travelling by train from my home (my starting point) to our London office (my destination).  Knowing my starting point and destination allows me to select the most appropriate route, estimate the time I need to invest to reach my destination, and allows me to understand my current progress along that route.  And, if I encounter any difficulties or delays on my journey (surely not on British trains!) then I know how to adjust and reset my expectations to ensure my route is appropriate and understood.

If we compare this to the use of the word “journey” in the context of data-driven transformation, I’m not entirely sure it fits.  When I speak with data and analytic leaders who are on a data-driven journey, it is surprising how often there is a lack of clarity over the destination, or their current position, which makes it very difficult to plan and measure progress.

But I see how the word journey has become so common – it conjures a sense of momentum and change which really fits the world of data-driven transformation.

 

Coddiwomple

However, I recently came across this incredible word, which I think may be more fitting. The origins of the word are unknown, but it is defined as follows:

Coddiwomple: 
to travel in a purposeful manner towards a vague destination

Despite being a lovely word to use, I think it is a far more appropriate description of many data-driven “journeys” I have encountered.

Know your destination

So if you’re currently on a “data-driven coddiwomple” and want to be on a “data-driven journey”, then you need only decide on a destination – in other words, what does a “data-driven” version of your current business look like?  In my experience, this can vary significantly – I’ve worked with organisations who see the destination as everything from a fully autonomous company to a place with highly disruptive business models.

Once this is decided, then you can build data-driven maturity models to measure your value and inform downstream investments – in the meantime, “Happy Coddiwompling!!”

 

data scientist or data engineer - what's the difference?

Author: Principal Consultant, Dean Wood

When it was floated that I should write this article, I approached it with trepidation. There is no better way to start an argument in the world of data than by trying to define what a Data Scientist is or isn’t – and by adding in the complication of the relatively new role of Data Engineer, there is no way this is not going to end in supposition and a lot of sentences starting with “I reckon”.

Nevertheless, understanding the value of these two vital roles is important to any organisation that seeks to unlock the value found in its data – without being able to describe these roles, it’s next to impossible to recruit appropriately. With that in mind, here is what I reckon.

‘By thinking in terms of rigidly defined boxes we are missing the point. A Data team should be thought of as covering a spectrum of the range of skills you need for effective data management and analytics. Simple boxes like Data Scientist and Data Engineer are useful, but should not be too rigidly defined.’

Reams have been written attempting to define what a Data Scientist is. The data science community has careered from expecting a Data Scientist to know everything from DevOps to statistics, to expecting a Data Scientist to have a PhD, which has led large institutions to give up and simply rebrand their BI professionals as Data Scientists. All of this misses the point.

Then arises the Data Engineer. No longer is your IT department the custodian of the data. The role has become too specialist and too critical to the business to be left to those who have worked really hard to understand traditional IT systems but think Third Normal Form is something in gymnastics and Hadoop is a noise you make after eating a kebab. Completely understandably, data has outgrown your average IT professional, but what do you need to make sure your data is corralled properly? Can’t we just throw a Data Scientist at it and get them to look after the data? Again, I think this misses the point.

Human beings are good at putting things into boxes and categories. It is how we manage the world and it is largely how we are trained to manage our businesses. Our management accountants take care of the finances and our HR department takes care of our employees. However, by putting people in these boxes with fairly rigid boundaries, there is a risk that necessary skills are missed and you end up with a team across your organisation that cannot provide what the business needs.

This is particularly true when we come to think of Data Scientists and Data Engineers. Rather than thinking of people in terms of the box to put them in, when looking at building your data team it is preferable to think of a spectrum of skills that you need to cover. These can be broadly grouped into the boxes of Data Scientist and Data Engineer, but the crossover between the two can be high.

In your Data Engineering team you will need individuals with a leaning towards the world of DevOps, and you will need team members who are close to Machine Learning Engineers. Likewise, in your Data Science team you will need members who are virtually statisticians, and team members who know something about deploying a model in a production environment. Making sure that your team as a whole, and your individual project teams, cover this skill mix can be a real challenge.

So in summary, I reckon that we need to stop thinking about the boxes we put people in quite so much, and start looking at the skills we actually need in our teams to make our projects a success. Understanding the Data Scientist/Data Engineer job roles as a spectrum of skills where you may need Data Engineer-like Data Scientists, and Data Scientist-like Data Engineers, will give you more success when it comes to building your data team and delivering value from your data.

integrating Python and R

For a conference about the R language, the EARL Conference sees a surprising number of discussions about Python. I like to think that at least some of these are down to the fact that we have run three-hour workshops outlining various strategies for integrating Python and R. In this series of posts we will:

  • outline the basic strategy for integrating Python and R;
  • run through the different steps involved in this process; and
  • give a real example of how and why you would want to do this.

This post kicks everything off by:

  • covering the reasons why you may want to include both languages in a pipeline;
  • introducing ways of running R and Python from the command line; and
  • showing how you can accept inputs as arguments and write outputs to various file formats.

Why “And” not “Or”?

From a quick internet search for articles about “R Python”, of the top 10 results, only 2 discuss the merits of using both R and Python rather than pitting them against each other. This is understandable; from their inception, both have had very distinctive strengths and weaknesses. Historically, though, the split has been one of educational background: statisticians have preferred the approach that R takes, whereas programmers have made Python their language of choice. However, with the growing breed of data scientists, this distinction blurs:

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. — twitter @josh_wills

With the wealth of distinct library resources provided by each language, there is a growing need for data scientists to be able to leverage their relative strengths. For example, Python tends to outperform R in areas such as:

  • Web scraping and crawling: though rvest has simplified web scraping and crawling within R, Python’s BeautifulSoup and Scrapy are more mature and deliver more functionality.
  • Database connections: though R has a large number of options for connecting to databases, Python’s SQLAlchemy offers this in a single package and is widely used in production environments.

Whereas R outperforms Python in such areas as:

  • Statistical analysis options: though Python’s combination of SciPy, Pandas and statsmodels offers a great set of statistical analysis tools, R is built specifically around statistical analysis applications and so provides a much larger collection of such tools.
  • Interactive graphics/dashboards: bokeh, plotly and intuitics have all recently extended the use of Python graphics onto web browsers, but getting an example up and running using shiny and shinydashboard in R is faster, and often requires less code.

Further, as data science teams now have a relatively wide range of skills, the language of choice for any application may come down to prior knowledge and experience. For some applications – especially in prototyping and development – it is faster for people to use the tool that they already know.

Flat File “Air Gap” Strategy

In this series of posts we are going to consider the simplest strategy for integrating the two languages, and step through it with some examples. Using a flat file as an air gap between the two languages requires you to do the following steps (a minimal sketch of the final steps follows the list).

  1. Refactor your R and Python scripts to be executable from the command line and accept command line arguments.
  2. Output the shared data to a common file format.
  3. Execute one language from the other, passing in arguments as required.
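As a minimal sketch of steps 2 and 3 from the R side, the snippet below writes the shared data to a CSV, calls a hypothetical Python script (process.py is a placeholder for your own script) with file paths as arguments, and reads the result back in:

# Step 2: write the shared data to a common file format
write.csv(mtcars, "input.csv", row.names = FALSE)

# Step 3: execute the Python script, passing file paths as arguments
system2("python", args = c("process.py", "input.csv", "output.csv"))

# Read the Python output back into R
results <- read.csv("output.csv")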

Pros

  • Simplest method, so commonly the quickest
  • Can view the intermediate outputs easily
  • Parsers already exist for many common file formats: CSV, JSON, YAML

Cons

  • Need to agree upfront on a common schema or file format
  • Can become cumbersome to manage intermediate outputs and paths if the pipeline grows.
  • Reading and writing to disk can become a bottleneck if data becomes large.

Command Line Scripting

Running scripts from the command line via a Windows/Linux-like terminal environment is similar in both R and Python. The command to be run is broken down into the following parts,

<command_to_run> <path_to_script> <any_additional_arguments>

where:

  • <command_to_run> is the executable to run (Rscript for R code and python for Python code);
  • <path_to_script> is the full or relative file path to the script being executed. Note that if there are any spaces in the path name, the whole file path must be enclosed in double quotes;
  • <any_additional_arguments> is a list of space-delimited arguments passed to the script itself. Note that these will be passed in as strings.

So for example, an R script is executed by opening up a terminal environment and running the following:

Rscript path/to/myscript.R arg1 arg2 arg3

A Few Gotchas

  • For the commands Rscript and python to be found, these executables must already be on your path. Otherwise the full path to their location on your file system must be supplied.
  • Path names with spaces create problems, especially on Windows, and so must be enclosed in double quotes so they are recognised as a single file path.

Accessing Command Line Arguments in R

In the above example, where arg1, arg2 and arg3 are the arguments passed to the R script being executed, these are accessible using the commandArgs function.

## myscript.R

# Fetch command line arguments
myArgs <- commandArgs(trailingOnly = TRUE)

# myArgs is a character vector of all arguments
print(myArgs)
print(class(myArgs))

By setting trailingOnly = TRUE, the vector myArgs contains only the arguments that you added on the command line. If left as FALSE (the default), the vector will also include other arguments, such as the path to the script that was just executed.
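Also remember that everything in myArgs arrives as a character string, so any numeric arguments need converting explicitly before use; for example, if the script expects numbers:

# Convert the string arguments to numeric before using them
myArgs <- commandArgs(trailingOnly = TRUE)
numericArgs <- as.numeric(myArgs)

# e.g. print the sum of the numbers supplied on the command line
print(sum(numericArgs))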

Accessing Command Line Arguments in Python

For a Python script executed by running the following on the command line

python path/to/myscript.py arg1 arg2 arg3

the arguments arg1, arg2 and arg3 can be accessed from within the Python script by first importing the sys module. This module holds parameters and functions that are system specific; however, we are only interested here in the argv attribute, which is a list of all the arguments passed to the script currently being executed. The first element in this list is always the full file path to the script being executed.

# myscript.py
import sys

# Fetch command line arguments
my_args = sys.argv

# my_args is a list where the first element is the file executed.
print(type(my_args))
print(my_args)

If you only wish to keep the arguments passed into the script, you can use list slicing to select all but the first element.

# Using a slice, selects all but the first element
my_args = sys.argv[1:]

As with the above example for R, recall that all arguments are passed in as strings, and so will need converting to the expected types as necessary.

Writing Outputs to a File

You have a few options when sharing data between R and Python via an intermediate file. In general for flat files, CSVs are a good format for tabular data, while JSON or YAML are best if you are dealing with more unstructured data (or metadata), which could contain a variable number of fields or more nested data structures. All of these are very common data serialisation formats, and parsers already exist in both languages. In R the following packages are recommended for each format:

  • CSV: readr
  • JSON: jsonlite
  • YAML: yaml

And in Python:

  • CSV: csv
  • JSON: json
  • YAML: PyYAML

The csv and json modules are part of the Python standard library, distributed with Python itself, whereas PyYAML will need installing separately. All R packages will also need installing in the usual way.
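As a small illustration of the R side (assuming the jsonlite package is installed), tabular data might be written out as CSV for Python to pick up, with any metadata alongside it as JSON:

library(jsonlite)

# Tabular data goes to CSV for the other language to read
write.csv(mtcars, "shared_data.csv", row.names = FALSE)

# Less structured metadata is easier to share as JSON
metadata <- list(source = "mtcars", created = as.character(Sys.time()))
write(toJSON(metadata, auto_unbox = TRUE), "shared_metadata.json")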

Summary

So passing data between R and Python (and vice-versa) can be done in a single pipeline by:

  • using the command line to transfer arguments, and
  • transferring data through a commonly-structured flat file.

However, in some instances, having to use a flat file as an intermediate data store can be both cumbersome and detrimental to performance.

Authors: Chris Musselle and Kate Ross-Smith


When technical capabilities and company culture combine, IoT-fed data lakes become a powerful brain at the heart of the business

Internet-enabled devices have led to an explosion in the growth of data. On its own, this data has some value, however, the only way to unlock its full potential is by combining it with other data that businesses already hold.

Together, pre-existing data and newly-minted IoT data can provide a full picture of specific insights around a single consumer. It is paramount, however, that companies don’t prioritise innovation at the expense of ethics. Sourcing and analytics must be done correctly – with the right context that respects consumer privacy and wishes around data usage.

The insights gained from successfully blending these two different data sources also unlock secondary benefits including new product development, possible upsells or the ability to build customer goodwill through advice-driven service delivery.

It’s a winning combination, but the challenge is how to actually merge device data with regular customer information.

No easy fit

This problem arises from the fact that IoT device data is a different “shape” to data in traditional customer records.

If you think of a customer record in a sales database as one long row of information, IoT collected information is more like an entire column of time series information, with a supporting web of additional detail. Trying to directly join the two is near impossible, and it is likely that some valuable semantic information could end up lost in the process.

But if IoT information fundamentally resists structure, and existing business databases are built on rigid structures, how do you find an environment that works for both? The answer is a data lake.

Pooling insight

A data lake is a more “fluid” approach to storing and connecting data. It is a central repository where data can be stored in the form it’s generated, whether that is in a relational database format or entirely unstructured. Analytics can then be applied over the top to connect different pieces of information and derive useful business insights.

However, there is more complexity involved in setting up a data lake than just combining all of an organisation’s data and hoping for the best. If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful.

This can be avoided using the expertise of dedicated data engineers. These are the masterminds who build the framework for a data lake and manage the process of extracting data from its source, before transforming it into a usable format and then loading it into the data lake environment. Done properly, this will ensure data provenance, with appropriate metadata to guide users on allowable use cases and analysis.

“If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful”

This sounds like a significant undertaking, and there’s no getting around the fact that doing data lakes right does take time and effort, but it is possible to take a staged approach. Many organisations start with a data “puddle” – a small collection of computers hosting a limited amount of data — and then slowly add to this, increasing the number of computers over time to form the full data lake.

A question of culture

In addition, technical considerations are just one side of the coin. The other side is one of culture. At the core of the problem is that businesses will not succeed with commercialising their IoT data if users are either unaware of, or distrusting of, the data lake and its potential.

While investment in big data continues to grow, a recent NewVantage Partners survey on Big Data and AI found that just 31 percent of organisations consider themselves data driven — the second year in a row that the number has fallen. Data lake technology has been around for several years now, and should be more than capable of enabling these types of organisations, but without the right culture in place, its benefits are seldom felt.

How do you create a culture that centres on being data-driven? As any management team knows, culture shifts are never easy, but a data-driven culture boils down to improving collaboration, communication and understanding between data professionals and business functions.

With a successful technical implementation of a data lake, you then need data professionals to advocate its benefits, and liaise with business departments to understand the types of insights that would be most useful to inform strategic decisions.

This then reinforces business confidence in the data function, and allows the data teams to expand their contributions to the business and be recognised for their hard work. When supported by senior buy-in, this positive feedback loop generates a growing culture of data savviness and data-driven approaches within the organisation.

Brain of the organisation

When technical capabilities and company culture combine, data lakes can become a powerful brain at the heart of the business. With the right analytics tools layered over the top, data lakes can reduce the time to finding insights and surface powerful information. These insights can serve business needs better and faster and are an outright win for any organisation. In short, they are well worth the time and investment.

Author: Dean Wood, Principal Data Scientist

50 shades of R

 

I’ve been joking about R’s “200 shades of grey” on training courses for a long time. The popularity of the book “50 Shades of Grey” has changed the meaning of this statement somewhat. As the film is due to be released on Valentine’s Day I thought this might be worth a quick blog post.

Firstly, where did I get “200 shades of grey” from? This statement originally came from the roughly 200 named colours containing either “grey” or “gray” in the vector generated by the colours function. As you will see, there are in fact 224 shades of grey in R.

greys <- grep("gr[ea]y", colours(), value = TRUE)

length(greys)

[1] 224

 

This is because there are also colours such as slategrey, darkgrey and even dimgrey! So let’s now keep only the names that start with “grey” or “gray”.

 

greys <- grep("^gr[ea]y", colours(), value = TRUE)

length(greys)

[1] 204

 

So in fact there are 204 colours whose names start with “grey” or “gray”. If we take a closer look though, it’s clear that there are not 204 unique shades of grey in R, as we are doubling up so that both the British spelling, “grey”, and the US spelling, “gray”, can be used. This is really handy, as R users don’t have to remember to change the way they usually spell grey/gray (you might also notice that I have used the function colours rather than colors), but when it comes to counting unique greys it means we have to be a little more specific in our search pattern. So, stripping back to just shades of “grey”:

 

greys <- grep("^grey", colours(), value = TRUE)

length(greys)

[1] 102

 

we find we are actually down to just 102. Interestingly, we don’t double up on all grey/gray colours: slategrey4 doesn’t exist but slategray4 does!

So really we have 102 named shades of grey in R. Of course, this is only using the named colours; if we define the colour using rgb we can make use of all 256 grey levels!
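For example, a full set of 256 greys can be built directly with rgb by giving each of the red, green and blue channels the same value:

# 256 shades of grey via rgb, from black (0) to white (255)
allGreys <- rgb(0:255, 0:255, 0:255, maxColorValue = 255)

length(allGreys)

[1] 256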

 

 

So how can we get 50 shades of grey? Well the colorRampPalette function can help us out by allowing us to generate new colour palettes based on colours we give it. So a palette that goes from grey0 (black) to grey100 (white) can easily be generated.

 

shadesOfGrey <- colorRampPalette(c("grey0", "grey100"))

shadesOfGrey(2)

[1] "#000000" "#FFFFFF"

 

And 50 shades of grey?

 


 

fiftyGreys <- shadesOfGrey(50)

mat <- matrix(rep(1:50, each = 50))

image(mat, axes = FALSE, col = fiftyGreys)

box()

 

I hear the film is not as “graphic” as the book – but I hope this fits the bill!

 

Author: Andy Nicholls, Data Scientist

 


Finding the Use Cases

So you’ve gathered the data, maybe hired some data scientists, and you’re looking to make a big impact.

The next step is to look for some business problems that can be solved with analytics – after all, without solving some real business challenges you’re not going to add much value to your organisation!

As you start to look for analytics use cases to work on, you may soon find yourself inundated with a range of possible projects. But which ones should you work on? How do you prioritise?

Mango have spent a lot of time over the last few years helping organisations to identify, evaluate and prioritise Analytic Use Cases. Picking the right projects, particularly early on in your data-driven adventure, will have a significant impact on the success of your analytic initiative. This article is based on some of the ways in which we coach companies around building Analytic Portfolios and what to look for in projects.

Evaluating Analytic Use Cases

The prioritisation of analytic use cases will be largely driven by the reason your data initiative was created and what ‘success’ for your team really looks like.

However, for this post, I’m going to assume the aim of your initiative is ultimately to add value to the organisation, where success is measured in financial terms (either saving money or adding revenue).

Generally, you’ll probably want a mixture of tactical and strategic initiatives – get some quick wins under your belt while you’re working on those bigger, longer-term challenges. However, when you’re looking at projects to work on you should consider a number of aspects:

  1. The Problem is Worth Solving

This might sound obvious, but a big factor in assessing an analytic use case is the potential value it could add. Delivering a multi-million pound project that decides what colour to paint the boardroom isn’t going to win many fans.

Ensure you understand:

  • How delivering this project would add value to your organisation
  • Exactly how that value will be measured
  2. The Building Blocks are in place

Understanding the ‘readiness’ (or otherwise) of a project to be delivered is a major factor in determining whether to prioritise it. Key aspects to consider include:

  • Data – is there enough data of sufficient quality to solve this challenge?
  • Platform – is the technical platform in place to enable insight to be derived?
  • Skills – do you have the skills required to implement the solution?
  • Delivery – is there a mechanism in place to deliver any insight to decision makers?
  3. The Analytic Use Case is Solvable

The world of analytics is awash with marketing right now, promising silver-bullet solutions based on Machine Learning, AI or Cognitive Computing. However, the simplicity or otherwise of a potential solution should be considered when prioritising a use case. You don’t want to end up with a portfolio of projects whose solutions are at the periphery of what’s currently possible.

  4. The Business is Ready to Change

This is, without doubt, the primary factor in the success (or otherwise) of an analytic project. You could have the best data, write the best code and implement the best algorithm – but if the business users don’t behave differently once the solution is implemented, the value you’re seeking won’t be realised.

Before you build, make sure the business is willing to change their behaviour.

Evaluating possible projects in this way can help you to build a portfolio of Analytic Use Cases that will add significant, measurable value to your organisation. Moreover, making the right decisions early can help you build momentum around data-driven change, leading to a more engaged business community that is ready for change.

Mango Solutions can help you navigate this process successfully. Based on insight and experience gained over 15 years working with the world’s leading companies, we have developed 3 workshops to help overcome some of the common challenges and roadblocks at different stages of your journey.

Find out which of the three workshops would be valuable to your organisation here.