minimally viable data scientist

Identifying World-Class Data Science Capabilities with Data Science Radar

Demand for workers with specialist data and analytic skills such as data scientists and data engineers has more than tripled over five years (+231%), according to a labour market analysis commissioned for Dynamics of data science skills, a Royal Society report. By contrast, the demand for all types of workers grew by 36% over the same period.

As organisations across all sectors embrace data-driven transformation, the need to identify such capabilities and upskill internal data communities has become ever more urgent.

Drawing on nearly two decades of analytical expertise honed from working with some of the world’s most complex industries, we have developed a comprehensive assessment of data science competence, Data Science Radar.

Data Science Radar assesses individuals and teams against six core data science traits. The results help define existing data science competencies, align strategic approaches to learning and development, resource projects to maximise value and retain talent.

The six core capabilities, which together cover the analytic skills required of data science teams, are:

Communicator

Data doesn’t sell itself; it needs a communicator to guide the way. Because of this, many great data scientists are master communicators.

As a Communicator you:

  • Are able to lead key business decision makers into an ongoing conversation with data, rather than carrying out ad hoc analyses.
  • Have a natural ability to communicate complex technical details to non-technical audiences.
  • Have an understanding of the wider implications of the project, and so convey the key analysis insights to influence new business directions.
  • Understand that communication is a two-fold process: explain well, listen well.
  • Listen for business challenges, define requirements and clarify how data analysis can help.
  • Have the ability to speak a language that each of your stakeholders understands – even on projects using highly technical software and mathematical methods.

Data-Wrangler

All great analysis starts with a dataset. Or rather, it starts with data in multiple locations, in different formats, languages and timezones.

As a Data-Wrangler you:

  • Understand that defining the question and the approach to creating insight stems from getting the data into a useable format.
  • Extract, manage and combine data from a variety of sources in a highly efficient manner.
  • Delight in the bottom-up approach that fully immerses you in problem solving, as it is in the detail where understanding of a system can be gained.

Programmer

When you are at the forefront of innovation, often the tools needed to solve a problem simply do not exist.

As a Programmer you:

  • May already be a master of multiple technical languages and enjoy adding languages to your skillset.
  • Combine carefully constructed analysis workflows with robust pipelines to automate as much of the process as possible, but also enjoy building applications from scratch.
  • Understand the value of planning, and know that thinking through an analysis is more efficient and less error prone than starting an analysis and seeing what sticks.
  • Always ensure that your customers are not misled by your work by providing thorough reporting, including lists of assumptions and descriptions of algorithms along with your findings.
  • Use software development approaches such as unit testing and version control to ensure that costly mistakes are not caused by your work.
  • Have a cool head and rational approach that serve as a great counterpoint to team members whose focus on the big picture can lead to key details being missed.

Technologist

Never satisfied with good enough, you find the best tool to aid with every challenge.

Since every challenge is different, it is often faster and more efficient to use technologies that have been created elsewhere, rather than reinventing the wheel.

As a Technologist you:

  • Are continually interested in exploring how evolving tools and techniques can add value to the data science workflow.
  • Recognise that modern multi-purpose programming languages provide the perfect environment to stand on the shoulders of giants and truly see further than others.
  • Know how to use many different technologies, allowing you to educate your team on possible ways to interrogate a dataset.
  • Are creative and relish using novel approaches when it comes to problem solving.

Modeller

By creating quantitative descriptions of your data, you create insight that is a key deliverable for your team.

As a Modeller you:

  • Interpret the meaningful reasons for features in a dataset.
  • Pay attention to the detail of underlying assumptions, limits and exceptions when describing a system.
  • Are familiar with a variety of mathematical methods for describing dynamic systems and are highly skilled in using software that implements these.
  • Use a variety of graphical and numeric techniques to verify that you are delivering a high quality result that can be used to predict and optimise future performance.
  • Are the ultimate investigator – when you’re on the team, if there is information that can be gleaned from a system, you’ll find it.

Visualiser

You convert information into a landscape that can be explored with the eyes to create an information map.

This skillset is absolutely indispensable for organisations that are lost in information. That said, you don’t like to be kept on a short leash.

As a Visualiser you:

  • Welcome the chance to experiment and explore all possible data opportunities and the new product avenues they can open.
  • Have a creative urge to go beneath the surface to uncover creative data solutions that can permeate the entire business.
  • Are imaginative when conveying information visually, and use a variety of graphical tools to ensure the patterns you find are presented coherently to both internal and external customers.
  • May be the most important storyteller in your team – after all, seeing is believing.

Personal Radars

Following an initial assessment, this SaaS solution provides each team member with their own personal radar, highlighting their skills and strengths and pointing to a path for potential learning and development.

A typical Radar for an experienced Consultant might look like this:

 

Whereas a Junior Data Scientist’s might look like this:

 

Doug Ashton, our Principal Data Scientist, shares his Radar and key traits as:

  • Programmer
  • Communicator
  • Modeller-Technologist

Doug explains how the Radar has helped his work at Mango Solutions:

What impact has the radar had on your recent work?

“Thinking about data science through the lens of the radar has helped me organise teams. Between us, and as you would expect in our Consultancy, we can create a pretty full radar by combining consultants’ skills. Rather than attempting to do everything myself, I look to the radar for colleagues who specialise in that area. A most recent example included working alongside a specialist visualiser, helping to build a vocabulary of graphics for a client to create a consistent look and feel across graphics. Working alongside, gave me opportunity to develop my own skills”, said Doug.

Which parts of your radar would you like to improve the most and why?

“I consider myself an experienced modeller but one thing the radar has reminded me is how big the modelling world is. It’s more than just one type of machine learning method. So, I would like to constantly improve and expand my modelling skills and understanding across a broad range of areas. This is vital to keep up with the pace of advancement in data science. Rather than try to improve my visualiser score I prefer to focus on my strengths and to make best use of some of my expert visualising colleagues. Of course, that doesn’t mean I don’t enjoy the odd ggplot from time to time!”.

 

Know your team’s skills and capabilities

A thorough understanding of the team’s capabilities and skill levels, mapped against these traits, allows managers to qualify strengths and areas for development, and enables analytic leads to make effective decisions for delivering data-driven value.

Building a centralised hub of excellence for best practice and stakeholder collaboration within a data community is key to data-driven transformation, and avoids what might otherwise be a siloed and untapped resource.

 

Want to learn more?

Data Science Radar is a software solution that can help you define and build the right data science capability to align to your strategic objectives. To request a demo click here.

Building a successful data science team

What does your ‘Minimally Viable Data Scientist’ look like?

The creation of data-driven value requires the right skills, and that means bringing together an analytic community of data professionals. But before we run off hunting unicorns, there are a number of things we need to be aware of:

1) There is no agreed definition of Data Science

Whilst “data science” as a term is typically attributed to Professor Jeff Wu (who coined the term as a suggested rebranding for “statistics”), the modern use of the word stems from the marketing hype surrounding the big data movement. At Mango, we define Data Science as “the proactive use of data and advanced analytics to drive better decision making”, but every data scientist you speak to will have their own favourite definition. Note: if you want to read more about our definition, read this.

Given that there is no single agreed definition of “data science”, there is also no agreed definition of “data scientist”. So, within reason, “data science” can mean different things across different organisations. Understanding what data science means to you, and what the business believes it will deliver (i.e. why you’re investing in it in the first place), is a great starting point to understand the skills you’ll need in your team.

2) Data Science requires a mix of skillsets

When you start looking for definitions of “data scientist”, you’ll quickly run into a variety of Venn diagrams. This is because there is a range of skills required to actually deliver data science outputs, particularly in a complex, commercial organisation. If we consider the Mango definition (above), then we need to be able to:

• Manipulate data sources of varying shapes and sizes
• Model the data using a broad range of analytic approaches
• Engage effectively with stakeholders across the business to ensure the “change” lands
• Create production-grade code to deliver the insight into the hands of decision makers.

At Mango, we believe a Data Science function needs a mixture of 6 different skillsets to succeed. These are represented on our Data Science Radar:

3) Unicorns. Don’t. Exist

When we start thinking about these combinations of skills, it is important to note that (sadly) … Data Science Unicorns do not exist. Having interviewed “data scientists” for ~20 years now, I’m yet to meet one. When I say “unicorn” here, I mean someone with the full set of data science skills on the radar.

4) Data Science is a Team Sport

Because of this mixture of data science skills (which isn’t necessarily found in a single person), together with the potential “always on” requirement of analytics in a data-driven company, Data Science is seen very much as a “Team Sport”. In other words, whilst we may not be able to find unicorns, we can build a team that (together) have the skills needed to deliver data science.

Building a great Data Science Team

Given the above, building a great data science team is about:

• Understanding the skills you’ll need in your team to deliver on your objectives
• Knowing what your ‘Minimally Viable Data Scientist’ looks like
• Hiring “spiky people” (people with key strengths that you can bring together)

So … what is a ‘Minimally Viable Data Scientist’?

Your ‘Minimally Viable Data Scientist’ (MVDS) is a theoretical data scientist who has the minimal skills required to operate as part of your team.

For example, at Mango our “MVDS” has to have strong “Programmer” and “Technologist” skills, since our data science work is built on good programming foundations in R and Python and developed using software development approaches. As a business, we’re also Consultants, so soft skills are of importance. Beyond that, we need at least a solid grounding in modelling, visualisation and data wrangling. So our “MVDS” looks like this:

If we’re screening data scientist candidates, understanding whether they meet this minimal threshold allows us to understand whether they will be able to operate within the team. However, we’re then looking for “spiky people”.

Hiring “Spiky People”

Using the Data Science Radar during the recruitment process allows us to quickly understand whether a candidate meets the “MVDS” threshold, but also allows us to understand their skills across these 6 axes. Next we’re looking for “spikes” in the chart that represent particular specialisms for an individual. For example, consider the following 3 (theoretical) candidates:

Each of these candidates passes the “MVDS” threshold and has specialisms in at least 1 area (i.e. spikes). What we’re looking for here when we’re interviewing is how these specialisms complement the rest of the team. The “MVDS” approach means we know these 3 individuals can operate well within our team, but then it’s a case of looking at skill gaps, building these capabilities and understanding how these specialisms will impact the communal skillsets we already have.
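To make the threshold-and-spikes idea concrete, here is a minimal Python sketch. It is an illustration only, not how Data Science Radar itself is implemented: the six axis names come from the radar described above, but the 0-10 scoring scale, the threshold values, the spike margin, the team target and the candidate scores are all hypothetical.

```python
# Illustrative sketch only. Axis names follow the article; every number
# below (scale, thresholds, margins, candidate scores) is a made-up assumption.
AXES = ["Communicator", "Data-Wrangler", "Programmer",
        "Technologist", "Modeller", "Visualiser"]

# Hypothetical MVDS threshold on an assumed 0-10 scale, with higher
# minimums for Programmer and Technologist as in the Mango example.
MVDS = {"Communicator": 5, "Data-Wrangler": 4, "Programmer": 6,
        "Technologist": 6, "Modeller": 4, "Visualiser": 4}


def meets_mvds(scores, threshold=MVDS):
    """Return True if the candidate reaches the minimum on every axis."""
    return all(scores[axis] >= threshold[axis] for axis in AXES)


def spikes(scores, threshold=MVDS, margin=3):
    """Return the axes where the candidate beats the minimum by `margin` or more."""
    return [axis for axis in AXES if scores[axis] - threshold[axis] >= margin]


def team_gaps(team, target=8):
    """Return the axes on which no team member reaches `target`."""
    return [axis for axis in AXES
            if max(member[axis] for member in team) < target]


# Hypothetical candidates (not the three candidates from the article's chart).
candidate_a = {"Communicator": 6, "Data-Wrangler": 5, "Programmer": 9,
               "Technologist": 7, "Modeller": 5, "Visualiser": 4}
candidate_b = {"Communicator": 5, "Data-Wrangler": 4, "Programmer": 6,
               "Technologist": 6, "Modeller": 5, "Visualiser": 9}
candidate_c = {"Communicator": 8, "Data-Wrangler": 7, "Programmer": 4,
               "Technologist": 6, "Modeller": 8, "Visualiser": 5}

for name, scores in [("A", candidate_a), ("B", candidate_b), ("C", candidate_c)]:
    print(name, "meets MVDS:", meets_mvds(scores), "| spikes:", spikes(scores))

print("Gaps if A and B form the team:", team_gaps([candidate_a, candidate_b]))
```

In this sketch a candidate can be strong on several axes and still fall below the threshold on one (candidate C), which is the screening step; the spike and gap functions then support the second question of how a specialism complements the skills the team already has.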

In Summary

When hiring for your data science team, it’s important to understand what you’re hoping to achieve and therefore what skills you’re going to need in the analytic community as a whole. By having a strong understanding of your ‘minimal’ skillset, you can clarify the threshold beyond which people could operate within your team – this allows you to focus on adding people with specialisms you need to succeed.

Why not join our webinar for a guide to Building the Ultimate Data Science Team?

RStudio::Conf 2020

Dude: Where are my Cats?  RStudio::Conf 2020

It may not have been the start to the conference that we planned as RStudio Full Service Certified partners – did you see the lonely guy on social media? Yes, that was me, and I’m here to tell the tale…

Eventful as it was at the time, I have to say this was the first RStudio Conference I have had the pleasure to attend since joining Mango Solutions. The things that really stood out for me were the event’s ubiquitous and thought-through inclusivity, and how fantastically well run and organised it was for nearly 2400 R users from around the world. Here’s a summary of our time in San Francisco, what it had to offer and why we are immensely proud partners of RStudio.

Cat rehoming 10:41am San Francisco time

With our shipment held up in customs, the conference started without our exhibition stand, materials and conference goodies, the famous Mango cats. I remain ever thankful to the whole #rstats community who, despite this little hiccup, took pity on us and came to visit us anyway. What I was able to quickly grasp was that this is a community that is quick to support others, and that provides a forum to share ideas, learn how to solve problems and, in particular, learn how others are benefiting from using R.

Public Benefit Corporation

A vital and impressive moment of the conference was the standing ovation for J.J. Allaire after his announcement that RStudio had become a Public Benefit Corporation. You could feel the appreciation in the room for RStudio’s innovation and how it had pushed the R community forward. He discussed their future plans, which provide growth opportunities for the community.

From a content perspective, RStudio::conf was a great event, filled with informative and well organised workshops and talks. As hard as it is to pick out one particular talk, the highlight was probably Jenny Bryan’s “Object of type ‘closure’ is not subsettable”, which was all about debugging in R: best approaches, available tools and hints on how to write more informative error messages in your own functions. It was engaging, informative, witty and relevant to pretty much every single R developer on this planet, let alone those present in the room.

Amongst other things, the Mango team of Data Scientists really appreciated these topics and packages, which the RStudio team featured as part of their workshops:

  • The best ways to scale up your API using the plumber package
  • Custom styling of Shiny apps using the bootstraplib package
  • Effective R code parallelisation using the future and furrr packages
  • Load testing using the loadtest package

 

Inclusivity all round

Inclusivity was felt not only with the RLadies breakfast, but also in having prayer rooms, quiet rooms for neurodiverse attendees, gender-neutral bathrooms, diversity scholarships and very frequent reminders of the event’s code of conduct, which revolves heavily around inclusivity and tolerance. Great organisation was shown not only in a suitable venue, but also in every effort that went into ensuring that queues for food and buses stayed short and that there was enough time to change rooms between talks, as well as in the great entertainment and perks throughout the event.

Endless networking opportunities

RStudio::conf 2020 was a fantastic place to meet and connect with other people in the industry and gain insight into how other companies, data science teams and individuals are using R and the underlying infrastructure that supports it. For Ben, our Data Platform Consultant, it was interesting and exciting to hear from a platform perspective about the needs of data science teams, and how we could potentially solve the challenges they are facing. A recurring issue seemed to be scaling R in a production environment and the best way to do this. Ben found the renv talk interesting and hopes to be using it more this year in place of packrat.

For Mango it was a real pleasure to discuss at length the wealth of opportunity presented by ValidR, our validated, production-ready version of R.

A huge thank you to everyone at RStudio for supporting my first conference with RStudio. It was truly a pleasure to meet the team in person, and it has really given Mango and RStudio the opportunity to take our partnership to the next level.

 

Author: Rich Adams, Solutions Specialist

The importance of communication soft skills

The Importance of Communication for Data Science and Analysis

Data Science is often viewed as a purely technical discipline that encompasses all kinds of mind-boggling mathematics, software development skills and domain knowledge. Often Data Scientists will focus their learning and development on these three areas and neglect to give the development of their soft skills the same attention.

Perhaps the most important, and most neglected, soft skill of all is communication. After all, if you can’t adequately explain your work to your peers and, more importantly, senior leaders, organisations will not act upon your information. At Mango we see these issues all the time, and this was partly the inspiration for our Trusted Consultant training course, which encompasses planning, organisation and, you guessed it, communication.

In this blog we’ll look at a few simple approaches that we teach in our Trusted Consultant course that can improve communication skills.

Keep Your Audience in Mind

The mistake we see most often is not keeping the audience in mind, both in terms of technical ability and in terms of what they’re most interested in. This is especially true of presentations. 

A good example of this is senior leadership. They’re most interested in the bottom line: how data and data science can unlock the potential in an organisation, most often through improving efficiency and increasing profitability. Senior leaders don’t want to sit through formulae, complex charts and detailed explanations. They care most about impact, so a presentation to them should be tailored toward this, and ideally the outcome should be at the very start of a presentation and not the very end.

 

Do the Simple Things Well

As with most things in life, 80% of the battle with communication is doing simple things well, and this is especially true when it comes to charts. As a base rule every chart should have:

  • Title 
  • Axis labels 
  • Legend / key 
  • Dates / Time period (if appropriate) 
  • Data Source 

A little effort with the design can also make a big difference, particularly through emphasis. The most common thing people do is place more emphasis on the more important aspects of a chart (e.g. the title) through increased font size or bold text. This can work, but in many cases it’s better to de-emphasise less important aspects of the chart instead.

A good technique here is to colour less important chart aspects in a shade of grey instead of black, particularly the axis and gridlines. Less is sometimes more. 
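As a rough illustration of the checklist and the de-emphasis technique, the sketch below builds one chart in Python with matplotlib. It is only one possible rendering of the advice above, assuming matplotlib is installed; the data, labels, source note and output file name are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical example data, invented purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
requests = [12, 15, 14, 19, 23, 27]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, requests, color="tab:blue", label="Analysis requests")

# The checklist: title (including the time period), axis labels, legend.
ax.set_title("Monthly analysis requests, Jan-Jun 2020 (illustrative)")
ax.set_xlabel("Month")
ax.set_ylabel("Number of requests")
ax.legend(frameon=False)

# De-emphasise the supporting elements: grey gridlines, axes and ticks,
# leaving the data series as the most prominent thing on the chart.
ax.grid(True, color="lightgrey", linewidth=0.5)
ax.set_axisbelow(True)
for spine in ax.spines.values():
    spine.set_color("grey")
ax.tick_params(colors="grey")

# Data source, placed as a small note in the bottom-left corner.
source_note = fig.text(0.01, 0.01, "Source: hypothetical example data",
                       fontsize=8, color="grey")

fig.tight_layout(rect=(0, 0.04, 1, 1))  # leave room for the source note
fig.savefig("example_chart.png", dpi=150)
```

Here the title is left at the default weight and only the surrounding elements are toned down, which is the "less is more" approach described above rather than making the important parts louder.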

Lastly, your presentation may also be scrutinised without you there to present it. It’s a good idea to provide a little narrative on the slide itself to make sure that it ‘travels well’ and can reach and influence a wider audience than you first presented it to. 

 

Be Memorable

Being memorable isn’t necessarily about organising a pyrotechnic display to accompany your presentation (that would probably make it memorable for all the wrong reasons) but ensuring that your message resonates with your audience so that they understand it and take it away. 

A very simple way to do this is using the Rule of Three. If you think about it, three is a very memorable number and we tend to remember things in threes more readily than in twos or fours. For example:

  • The Good, the Bad and the Ugly
  • I came, I saw, I conquered
  • Blood, Sweat and Tears

You can apply the rule of three in a variety of ways: splitting your presentation structure into three, picking the three strongest supporting arguments alongside a proposal, or summarising the contents of your presentation into three key takeaways.

Lastly, the more keen-eyed of you will notice that I’ve picked three things to write about as part of this blog. That’s not a coincidence!

 

In closing

These techniques and more are covered in detail as part of our Trusted Consultant training material. You can find our course and a summary of what it will cover on our course list page within our Training pages here.

 

Author – Tom Ewing

Tom is a Senior Data Scientist at Mango Solutions. He’s been a data scientist for 5 years and has spent the majority of his career in a variety of data, statistical and analytical roles before making the jump to data science.