are you on a data-driven coddiwomple?

If, like me, you attend data conferences, then there’s one word you will hear time and time again: “journey”.  It’s an incredibly popular word in the data-driven transformation world, and it’s common to hear a speaker talking about the “journey” their business is on.  But, for me, I often struggle with that word.

 

Journey

The Oxford Dictionary defines a “journey” as follows:

Journey:
the act of travelling from one place to another

So to be on a journey, I feel we need to have a very clear understanding of (1) where we are travelling from and (2) what our destination is.

For example, as I’m writing this, I’m on a “journey” – I am travelling by train from my home (my starting point) to our London office (my destination).  Knowing my starting point and destination allows me to select the most appropriate route, estimate the time I need to invest to reach my destination, and understand my current progress along that route.  And, if I encounter any difficulties or delays on my journey (surely not on British trains!), then I know how to adjust and reset my expectations to ensure my route remains appropriate and understood.

If we compare this to the use of the word “journey” in the context of data-driven transformation, I’m not entirely sure it fits.  When I speak with data and analytics leaders who are on a data-driven journey, it is surprising how often there is a lack of clarity over the destination, or over their current position, which makes it very difficult to plan and measure progress.

But I see how the word journey has become so common – it conjures a sense of momentum and change which really fits the world of data-driven transformation.

 

Coddiwomple

However, I recently came across this incredible word, which I think may be more fitting. The origins of the word are unknown, but it is defined as follows:

Coddiwomple: 
to travel in a purposeful manner towards a vague destination

As well as being a lovely word to use, I think it is a far more appropriate description of many data-driven “journeys” I have encountered.

Know your destination

So if you’re currently on a “data-driven coddiwomple” and want to be on a “data-driven journey”, then you need only decide on a destination – in other words, what does a “data-driven” version of your current business look like?  In my experience, this can vary significantly – I’ve worked with organisations that see the destination as everything from a fully autonomous company to a place with highly disruptive business models.

Once this is decided, then you can build data-driven maturity models to measure your value and inform downstream investments – in the meantime, “Happy Coddiwompling!!”

 

ModSpace collaboration software for data science teams

The ability to collaborate and share resources and information across distributed, and often siloed, data science teams is a common goal for today’s managers and team leads.  To maximise efficiency, a single centralised location that facilitates collaboration across projects is the ultimate aim for team success and best practice. Gone are the multiple workflows, incomplete activities and locally held models and datasets that mean hours of wasted time and unreproducible effort.

Collaboration is the key across Data Science teams

Mango’s team of consultants share the frustrations of in-house data science teams, and in answer to this common business challenge Mango developed ModSpace: a single repository that allows data science teams – modellers and analysts – to work with non-technical staff, streamlining collaboration through a virtual workspace where teams can meet, work and prioritise deadlines.

A single computational environment accessible by all

Providing a single meeting point that integrates analytic development environments with traditional desktop applications, ModSpace adds the functionality of a powerful search engine to make this valuable information quickly accessible to all.

The result is the ability to manage complex tasks across a team ensuring projects with multiple needs and requirements are taken care of.

Mango Solutions’ Product Manager, Richard Kaye, believes ModSpace is the future of project collaboration: “It has often been said that data is the new oil. A company’s analytic IP – code and models that unlock the insight from their data – is a highly valuable asset. The management of this asset is of increasing importance as organisations embark on a journey to leverage their data to make more informed, data-driven decisions. ModSpace was specifically built to support this journey – offering controlled storage of information and powerful search facilities in a single computational environment accessible by all.”

A proven platform

ModSpace offers teams a centralised environment for their data science projects:

  • A safe centralised repository for all file types, from scripts and datasets to MS Office documents
  • An industry-leading text search engine powered by strong text parsing capabilities which helps team members find valuable resources efficiently
  • The ability to categorise and describe projects and content using configurable metadata, thus enabling easy location of related and useful work
  • A newsfeed to enhance visibility of project activity, along with public and private feedback channels
  • Control over who has access to what, to keep sensitive information private
  • Access to industry-standard transparent version control, which assists with compliance and integrity
  • Compatibility with data analysis and modelling tools, such as R, Python, MATLAB, SAS and NONMEM, allowing users to interact easily with all files held in the repository

Richard continues, “Some of the world’s biggest pharmaceutical companies currently use ModSpace due to its flexibility to allow them to store everything in one place. Mango’s own Consultants use it as an internal repository – storing and collaborating on projects. It streamlines our processes and allows our teams to work on projects, no matter where they are, and without concern for having work overwritten or lost.”

For a product demonstration, or a discussion about how ModSpace can help improve the way your data science team works – powering innovation, collaboration and efficiency – call

data scientist or data engineer - what's the difference?

Author: Principal Consultant, Dean Wood

When it was floated that I should write this article, I approached it with trepidation. There is no better way to start an argument in the world of data than by trying to define what a Data Scientist is or isn’t – and by adding in the complication of the relatively new role of Data Engineer, there is no way this is not going to end in supposition and a lot of sentences starting “I reckon”.

Nevertheless, understanding the value of these two vital roles is important to any organisation that seeks to unlock the value found in its data – without being able to describe these roles, it’s next to impossible to recruit appropriately. With that in mind, here is what I reckon.

‘By thinking in terms of rigidly defined boxes we are missing the point. A Data team should be thought of as covering a spectrum of the range of skills you need for effective data management and analytics. Simple boxes like Data Scientist and Data Engineer are useful, but should not be too rigidly defined.’

Reams have been written attempting to define what a Data Scientist is. The data science community has careered from expecting a Data Scientist to know everything from DevOps to statistics, to expecting a Data Scientist to have a PhD – which leads to large institutions giving up and just rebranding their BI professionals as Data Scientists. All of this misses the point.

Then arises the Data Engineer. No longer is your IT department the custodian of the data. The role has become too specialist and too critical to the business for those who have worked really hard to understand traditional IT systems but think Third Normal Form is something in gymnastics and Hadoop is a noise you make after eating a kebab. Completely understandably, data has outgrown your average IT professional – but what do you need to make sure your data is corralled properly? Can’t we just throw a Data Scientist at it and get them to look after the data? Again, I think this misses the point.

Human beings are good at putting things into boxes and categories. It is how we manage the world and it is largely how we are trained to manage our businesses. Our management accountants take care of the finances and our HR department takes care of our employees. However, by putting people in these boxes with fairly rigid boundaries, there is a risk that necessary skills are missed and you end up with a team across your organisation that cannot provide what the business needs.

This is particularly true when we come to think of Data Scientists and Data Engineers. Rather than thinking of people in terms of the box to put them in, when building your data team it is preferable to think of a spectrum of skills that you need to cover. These can be broadly grouped into the boxes of Data Scientist and Data Engineer; however, the crossover between the two can be high.

In your Data Engineering team you will need individuals with a leaning towards the world of DevOps, and you will need team members who are close to Machine Learning engineers. Likewise, in your Data Science team you will need members who are virtually statisticians, and team members who know something about deploying a model in a production environment. Making sure your team as a whole, and your individual project teams, cover this skill mix can be a real challenge.

So in summary, I reckon that we need to stop thinking about the boxes we put people in quite so much, and start looking at the skills we actually need in our teams to make our projects a success. Understanding the Data Scientist/Data Engineer job roles as a spectrum of skills where you may need Data Engineer-like Data Scientists, and Data Scientist-like Data Engineers, will give you more success when it comes to building your data team and delivering value from your data.

Data for Good

Good triumphing over evil in the end is the stuff of every good fairy tale or Hollywood storyline. But in real life, as we all know, it’s usually tales of political doom and gloom from across the world that dominate our screens, while stories of good remain well away from the spotlight.  Good, it seems, does not make for high viewing figures.

And stories about data are no exception to this rule.

Barely a day goes by without a story alarming the general public about their privacy and how their information is being used.  Think of The Great Hack, the investigative documentary about the Cambridge Analytica scandal.  Think of banking information leaks, or of how Facebook uses your personal details and preferences.

It’s easy to forget that data science is also shaping the way we live, improving lives for the better and providing services we could only have dreamed of decades ago.

And it’s for this reason that we at Mango decided to celebrate #Data4Good week, showcasing all of the different ways data science and analytics can be used for good in the world.

When The Economist declared in 2017 that data was more valuable than oil, few people truly understood its power and how this was possible.  Fast-forward to the current day, and the picture of data usage is becoming clearer.

In September, we were fortunate enough to secure some incredible speakers at our annual EARL Conference in London, who shared stories of how data science has benefited everything from local communities to healthcare – and has even helped progress peace talks in war-stricken areas.  We have shared some of these stories via Twitter and I would urge you to take a look.

R in the NHS

When it comes to healthcare, analysis and prediction can be used to better inform decisions about healthcare provision, streamline and automate tasks, dig into complex problems, and predict changes in the healthcare the NHS provides to its patients. Because of this, many non-profit organisations want to harness the power of data science.

During EARL, Edward Watkinson – an analyst, economist and data scientist currently working for the Royal Free London Hospital Group – took to the stage to explain how the group adopted R as the core tool on the analytical workbench that helps run its hospitals, and to show how useful it has been in the cash-strapped NHS.

You can watch Edward’s ten-minute lightning talk here.

Helping local communities

Another great use case for data science is helping local communities. David Baker, research and evaluation residential volunteer worker at Toynbee Hall, took to the stage at EARL to explain how Toynbee Hall has adopted R as a tool within the charity sector.

By way of background: since its inception in 1884, evidence-based research has been central to Toynbee Hall’s mission as a charity across East London communities. It had a hand in creating some of the first data visualisations for the public good, publishing a series of poverty maps, and it regularly engages with the local community to solve problems.

R has allowed for rapid analysis of data across a diversity of projects at Toynbee Hall. Additionally, Baker explained how embracing open source software allowed the team to host a series of data hackathons, recruiting freelance data scientists to help analyse publicly available datasets and contributing to materials used in its policy advocacy campaigns.

You can catch-up on David’s talk here.

Some amazing work has been done through the power of data science and analytics, and it’s continually changing the world around us. These stories won’t ever make the news, but it’s reassuring to remind ourselves that sometimes, good really can triumph over evil in the real world as well as in the movies.  We hope that people are beginning to see that data science is more than just a buzzword – it’s a new hope for good.

integrating Python and R

For a conference centred on the R language, the EARL Conference sees a surprising number of discussions about Python. I like to think that at least some of these are to do with the fact that we have run 3-hour workshops outlining various strategies for integrating Python and R. In this series of posts, we will:

  • outline the basic strategy for integrating Python and R;
  • run through the different steps involved in this process; and
  • give a real example of how and why you would want to do this.

This post kicks everything off by:

  • covering the reasons why you may want to include both languages in a pipeline;
  • introducing ways of running R and Python from the command line; and
  • showing how you can accept inputs as arguments and write outputs to various file formats.

Why “And” not “Or”?

From a quick internet search for articles about “R Python”, of the top 10 results, only 2 discuss the merits of using both R and Python rather than pitting them against each other. This is understandable; from their inception, both have had very distinctive strengths and weaknesses. Historically, though, the split has been one of educational background: statisticians have preferred the approach that R takes, whereas programmers have made Python their language of choice. However, with the growing breed of data scientists, this distinction blurs:

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. — Josh Wills (@josh_wills) on Twitter

With the wealth of distinct library resources provided by each language, there is a growing need for data scientists to be able to leverage their relative strengths. For example, Python tends to outperform R in areas such as:

  • Web scraping and crawling: though rvest has simplified web scraping and crawling within R, Python’s BeautifulSoup and Scrapy are more mature and deliver more functionality.
  • Database connections: though R has a large number of options for connecting to databases, Python’s SQLAlchemy offers this in a single package and is widely used in production environments.

Whereas R outperforms Python in such areas as:

  • Statistical analysis options: though Python’s combination of SciPy, Pandas and statsmodels offers a great set of statistical analysis tools, R is built specifically around statistical analysis applications and so provides a much larger collection of such tools.
  • Interactive graphics/dashboards: bokeh, plotly and intuitics have all recently extended the use of Python graphics onto web browsers, but getting an example up and running using shiny and shinydashboard in R is faster, and often requires less code.

Further, as data science teams now have a relatively wide range of skills, the language of choice for any application may come down to prior knowledge and experience. For some applications – especially in prototyping and development – it is faster for people to use the tool that they already know.

Flat File “Air Gap” Strategy

In this series of posts we are going to consider the simplest strategy for integrating the two languages, and step through it with some examples. Using a flat file as an air gap between the two languages requires the following steps.

  1. Refactor your R and Python scripts to be executable from the command line and accept command line arguments.
  2. Output the shared data to a common file format.
  3. Execute one language from the other, passing in arguments as required.
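Putting these three steps together, here is a minimal sketch of what the R side of such a pipeline could look like (the script name myscript.py and the file names are hypothetical placeholders):

## pipeline.R

# Step 2: write the shared data out to a common file format
write.csv(mtcars, "input.csv", row.names = FALSE)

# Step 3: execute the Python script from R, passing the file paths as arguments
system2("python", args = c("myscript.py", "input.csv", "output.csv"))

# Read the Python script's output back into R
results <- read.csv("output.csv")

Here system2 is base R’s interface for invoking external commands; the sections below look at each of these pieces in turn.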

Pros

  • Simplest method, so commonly the quickest
  • Can view the intermediate outputs easily
  • Parsers already exist for many common file formats: CSV, JSON, YAML

Cons

  • Need to agree upfront on a common schema or file format
  • Can become cumbersome to manage intermediate outputs and paths if the pipeline grows.
  • Reading and writing to disk can become a bottleneck if data becomes large.

Command Line Scripting

Running scripts from the command line via a Windows/Linux-like terminal environment is similar in both R and Python. The command to be run is broken down into the following parts,

<command_to_run> <path_to_script> <any_additional_arguments>

where:

  • <command_to_run> is the executable to run (Rscript for R code and python for Python code),
  • <path_to_script> is the full or relative file path to the script being executed. Note that if there are any spaces in the path name, the whole file path must be enclosed in double quotes.
  • <any_additional_arguments> is a list of space-delimited arguments passed to the script itself. Note that these will be passed in as strings.

So for example, an R script is executed by opening up a terminal environment and running the following:

Rscript path/to/myscript.R arg1 arg2 arg3

A Few Gotchas

  • For the commands Rscript and python to be found, these executables must already be on your PATH. Otherwise the full path to their location on your file system must be supplied.
  • Path names with spaces create problems, especially on Windows, and so must be enclosed in double quotes so they are recognised as a single file path.

Accessing Command Line Arguments in R

In the above example, where arg1, arg2 and arg3 are the arguments passed to the R script being executed, these are accessible using the commandArgs function.

## myscript.R

# Fetch command line arguments
myArgs <- commandArgs(trailingOnly = TRUE)

# myArgs is a character vector of all arguments
print(myArgs)
print(class(myArgs))

By setting trailingOnly = TRUE, the vector myArgs only contains the arguments that you added on the command line. If left as FALSE (the default), the vector will also include other arguments, such as the path to the script that was just executed.
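
Since everything arrives as a string, any numeric inputs need explicit conversion before use. A minimal sketch, assuming a hypothetical layout where the first argument is a file path and the second a numeric threshold:

## typed_args.R

myArgs <- commandArgs(trailingOnly = TRUE)

input_path <- myArgs[1]              # file paths can stay as strings
threshold  <- as.numeric(myArgs[2])  # "0.5" arrives as a string, becomes 0.5

print(threshold * 2)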

Accessing Command Line Arguments in Python

For a Python script executed by running the following on the command line

python path/to/myscript.py arg1 arg2 arg3

the arguments arg1, arg2 and arg3 can be accessed from within the Python script by first importing the sys module. This module holds parameters and functions that are system-specific; however, we are only interested here in the argv attribute. This argv attribute is a list of all the arguments passed to the script currently being executed. The first element in this list is always the full file path to the script being executed.

# myscript.py
import sys

# Fetch command line arguments
my_args = sys.argv

# my_args is a list where the first element is the file executed.
print(type(my_args))
print(my_args)

If you only wish to keep the arguments passed into the script, you can use list slicing to select all but the first element.

# Using a slice, selects all but the first element
my_args = sys.argv[1:]

As with the above example for R, recall that all arguments are passed in as strings, and so will need converting to the expected types as necessary.

Writing Outputs to a File

You have a few options when sharing data between R and Python via an intermediate file. In general, for flat files, CSVs are a good format for tabular data, while JSON or YAML are best if you are dealing with more unstructured data (or metadata), which could contain a variable number of fields or more nested data structures. All of these are very common data serialisation formats, and parsers already exist in both languages. In R the following packages are recommended for each format:

  • CSV: readr (or base R’s read.csv/write.csv)
  • JSON: jsonlite
  • YAML: yaml

And in Python:

  • CSV: csv
  • JSON: json
  • YAML: PyYAML

The csv and json modules are part of the Python standard library, distributed with Python itself, whereas PyYAML will need installing separately. The R packages will also need installing in the usual way.
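
As a brief sketch of what this can look like from the R side (assuming the jsonlite and yaml packages are installed; the file names and metadata fields are purely illustrative):

## write_outputs.R

# Tabular results: CSV is a natural fit
results <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))
write.csv(results, "results.csv", row.names = FALSE)

# Nested metadata: JSON or YAML cope well with variable fields
metadata <- list(run_date = as.character(Sys.Date()),
                 params   = list(alpha = 0.05, iterations = 1000))
jsonlite::write_json(metadata, "metadata.json", auto_unbox = TRUE)
yaml::write_yaml(metadata, "metadata.yaml")

Any of these files can then be read straight back in on the Python side, for example with pandas.read_csv, json.load or yaml.safe_load respectively.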

Summary

So passing data between R and Python (and vice versa) can be done in a single pipeline by:

  • using the command line to transfer arguments, and
  • transferring data through a commonly-structured flat file.

However, in some instances, having to use a flat file as an intermediate data store can be both cumbersome and detrimental to performance.

Authors: Chris Musselle and Kate Ross-Smith


At Mango, we talk a lot about going on a ‘data-driven journey’ with your business. We’re passionate about data and getting the best use out of it. But for now, instead of looking at business journeys, I wanted to talk to the Mango team and find out how they started on their own ‘data journey’ – what attracted them to a career in data science and what they enjoy about their day-to-day work. (It’s not just typing in random numbers?! What?!)

We are hugely fortunate to have a wonderful team of data scientists who are always generous in sharing their skills and don’t mind teaching “R for uber-beginners” to non data scientists. So let’s see what they have to say on becoming a Mango…

Jack Talboys

Jack joined us last year as a year-long placement student.

“I actually had no idea what Data Science was until I discovered Mango about a year and a half ago. I was at the university career fair, not really impressed by the prospect of working in finance or as a statistician for a large company, when I stumbled across Liz Matthews and Owen Jones, who were there representing Mango. Drawn in by the title “Data Science”, we started talking. Data Science seemed to tick all of my boxes, letting me use my knowledge of statistics and probability while doing lots of coding in R.

I’m now 6 months in at Mango and it couldn’t be going better. I’ve greatly improved my proficiency in R, alongside learning new skills like Git, SQL and Python. I’ve been given a great deal of responsibility, with assisting in delivering training to a client and attending the EARL 2018 conference making up some of my highlights. There have also been opportunities for me to be client-facing, giving me a deeper understanding of what it takes to be a Data Science Consultant.

Working at Mango hasn’t just developed my technical skills, however; without really noticing, I’ve found that I have become a better communicator. Whether organising tasks with the other members of the ValidR team or talking to clients, I have discovered a new sense of confidence and trust in myself. Even as a relative newbie I can see that Data Science as an industry is growing massively – and I’m excited to be part of this growth and make the most of the exciting opportunities it presents with Mango.”

Beth Ashlee, Data Scientist

“I got into data science after applying for a summer internship at Mango. I didn’t really know much about the data science community previously, but spent the next few weeks learning more technical and practical skills than I had in 3 years at university.

“I’ve been working as a Data Science Consultant for nearly 3 years and, due to the wide variety of projects, I’ve never had a dull moment. I have had amazing opportunities to travel worldwide teaching training courses and interacting with customers from all industries. The variety is my favourite part of the job: you could be building a Shiny application to help a pharmaceutical company visualise their assay data one week, and teaching a training course at the head offices of a large company such as Screwfix the next.”

Owen Jones, Data Scientist

“To be honest, it rarely feels like work… since we’re a consultancy, there’s always a wide variety of projects on the go, and you can get yourself involved in the areas you find most interesting! Plus, you have the opportunity to visit new places, and you’re always meeting and working with new people – which means new conversations, new ideas and new understanding. I love it.”

Nick Howlett, Data Scientist

Nick is currently working on a client project in Italy.

“During my time creating simulations in academic contexts, I found myself more motivated to meet my supervisor’s requirements than to pursue niche research topics. Towards the end of my studies, I discovered data science and realised that the client-consultant relationship was a very similar situation.

Working at Mango has allowed me to develop personal relationships with clients across many sectors – and to get to know their motivations and individual data requirements. Mango has also given me the opportunity to travel, on both short-term training projects and longer-term projects abroad.”


grow your own data science team

How to Grow Your Own Data Teams – a practical guide for the data-driven C-Suite

Data today is the fuel driving the modern business world. It therefore stands to reason that the ability to read and speak data should be a fairly mainstream skill. Except it isn’t – yet. A 2018 report by Qlik suggests that just 24% of business decision-makers were fully confident in their abilities with data. This is despite the fact that, according to the 2018 Gartner CIO Agenda, CIOs globally ranked analytics and business intelligence as the most critical technology for achieving their organisation’s business goals, with data and analytics skills topping the list of most sought-after talent.

As more organisations embrace data-driven digital transformation, it’s clear that the need to upskill and resource data science teams has become far more pronounced. With the skills gap only seeming to widen, how can the C-suite continue to leverage data-driven digital transformation if there are insufficient resources to fill it? Given the gulf between the skills emerging from tertiary education and the demand for data literacy, it’s becoming incumbent upon businesses themselves, led by the C-suite, to champion the drive towards a more data-savvy future.

Fortunately, a positive trend taking rapid shape is the emergence of data science and analytics capabilities across a much wider range of sectors. However, if that innovation is taking place in silos, separated from the business, it likely isn’t delivering the results you need. So, what should a data-driven C-suite do?

Nurture existing resources and grow your own data science teams

Pulling together existing disparate data science resources into a single, connected community of practice creates a secure foundation for the C-suite from which to grow its data science teams. If this single entity has a common understanding of the skill sets it has within the business already, the best practice examples for approaching different business scenarios, and an awareness of new tools and solutions that could help, it provides the most solid basis for working out where the talent pool needs to be extended with new hires or training.

Similarly, it’s important to encourage knowledge-sharing and innovation within this community. Organising team hackathons to boost cross-function collaboration and new ideas can be a great example of this, while hosting internal events which showcase successes can help motivate the team to deliver creative new solutions.

Encourage collaboration and knowledge-sharing

Building relationships between the data team and the business is critical for two things: ensuring the data science team understands the business’ problems and is producing useful insights, and for helping to “demystify” the data science process. Ensuring that the business as a whole has insight into the data available, how it can and cannot be used, and encouraging a dialogue between technical and non-technical professionals will foster curiosity and trust that ensures a productive data-driven culture.

How can the C-suite go about doing this? By organising short workshops that focus on delivering real ideas. If the existing data scientists can show the business more broadly how data science techniques can lead to real business outcomes that improve success for a business area, it is likely to encourage enthusiasm and optimism about the potential of data. As a result, this is more likely to drive members of the business team to look further into the types of analytics they might be able to learn and apply, as well as seek out both at-work and external training to support this.

Understand that everyone, to some extent, needs to be data literate

Drew Conway’s famous Venn diagram on what data science is made of stressed the importance of substantive expertise as an integral part of data science – and encouraging a culture of collaboration between the business and the analytics function ensures that this is the case. However, an argument can also be made in reverse. Ensuring that the business has enough knowledge of data science to be able to accurately reflect and act on the findings of data-driven insights is just as important. Without this, insights discovered by the data science community will not have the fullest possible impact on the business or, at worst, could even end up being misunderstood and misinterpreted. Enabling training at all levels of data awareness will be critical – and this should even include training on how to use information to guide decision-making, and other non-technical topics.

This sort of training, repeatedly reinforced, will be essential for cutting through any data apathy that exists within the organisation. It’s important that even those resistant to change understand that the business is developing to be data-driven, and that a culture shift will come as part of this. Having a connected community of evangelists, both in the form of technical experts, and business enthusiasts who can continue to spread the message will be invaluable.

With these three steps, C-suite executives will find it far easier to grow data scientists throughout the organisation and invest more effectively in hiring new talent to fill any skills gaps. By encouraging a culture of data curiosity, and an awareness of the power and potential of data, as well as an interest in learning more, businesses are creating fertile ground to inspire a new generation of data scientists from any background.


As the importance of using analytics to drive decision making continues to grow at pace, so too does the need to make data science more efficient and “deployable”. This drives many changes to the way in which analytics is performed, including:

  • Allowing data scientists to collaborate on analytic projects
  • Enabling discovery and re-use of code to avoid duplication of effort
  • Enforcing rigour (versioning, audit) without adding admin overhead
  • Centralising and standardising analytic code so it can be deployed via applications

Working with customers and the RStudio team, Mango has been tasked with creating data science environments, leading to the development of the Data Science Workbench.

While the Data Science Workbench won’t be officially released for a few months, we couldn’t resist giving everyone a sneak peek at the way in which ModSpace (Mango’s collaborative analytics platform) and RStudio Server Professional can be integrated to create an effective R ecosystem.

ModSpace is a web-based collaborative platform used by groups of analysts to store, share and discover models, data and scripts. It manages an underlying version control repository allowing full provenance over model/script development and outputs, so you can always return to exactly what went into that analytic report you sent out a while back(!). Working with the fantastic RStudio developers, we have integrated ModSpace directly with RStudio Server Professional, taking advantage of the authentication features of the Pro edition to enable single sign-on across both (web) applications. That means that groups of users can collaborate on a single managed project using their favourite R IDE in an efficient manner, with the version control, audit and metadata taken care of.

This video gives a quick view of a typical workflow between ModSpace and RStudio Server Professional.

 

 

This is just one aspect of the Data Science Workbench, but something we are very proud of. Look out for the release of the Data Science Workbench over the coming months.

In the meantime, if you want to get involved in the beta phase, or want more info on the other features of the Workbench, just contact info@mango-solutions.com.

And of course, a big thank you to the guys at RStudio for their support!

Python

I have been asked this tricky question many times in my career: “Python or R?”. In my experience, the answer depends entirely on the purpose at hand, and it is a question that many aspiring data scientists, business leaders and organisations are still pondering.

It is important to have the right tools when answering the many business questions within the data science space – which isn’t as simple as it sounds. Whether you are considering data analytics, data science, strategic data planning or building a data science team, deciding where to start in terms of languages can be a major blocker.

Python has become the de facto language of choice for organisations seeking to seamlessly build or upscale their skills, and its influence is evident in the cloud computing environment. Indeed, according to the 20th annual KDnuggets Software Poll, Python is still the leader, and top tech companies like Alphabet’s Google and Facebook continue to use Python at the core of their frameworks.

Some of Python’s essential benefits are its fluency and its clear, natural readability. It is easy to learn, and it provides great flexibility in terms of scalability and productionisation, with many libraries and packages created for exactly these purposes.

Data is everywhere

Data is everywhere, big or small, and plenty of companies have it without harnessing the capabilities of this great asset. Of course, the availability of data without the “algorithms” will not add any business value. That is why it is important for companies and business leaders to move quickly and adopt the tools that help transform their data into the economic value they desire. By choosing Python, companies will be able to utilise the full potential of their data.

Deployment and Cloud Capability

Python’s capability is broad, and its impact is felt in the areas of Machine Learning, Computer Vision, Natural Language Processing and many others. Its robustness and growing ecosystem have given rise to many deployment and integration tools. Whether you use Google Cloud Platform (GCP), Amazon Web Services (AWS) or Microsoft’s Azure, you will find Python convenient to use and well integrated. As a matter of fact, cloud technologies are growing at the fastest pace, with Python driving many applications in the cloud.

Concluding Remarks

From a broad perspective, you might doubt whether there is any real question of supremacy between Python and R (or even SQL); needs and versatility vary widely. But Python has become a kingpin in terms of its user-friendliness, scalability, extensive ecosystem of libraries and interoperability. Popular libraries within Python support the development and evolution of Artificial Intelligence (AI), and many organisations are beginning to see the value of upskilling and taking advantage of Python in their AI-driven decisions.

Mango Solutions

There is a big drive within Mango to support the use of Python as an essential tool, benefiting our consultants and clients in many ways. Many projects have had Python at their core when it comes to execution. Our consultants have also delivered several Python training courses to organisations in both the public and private sectors across the globe, helping them harness the potential of Python in their data-driven decisions, assert business value and shape their data journey.

Author: Dayo Oguntoyinbo, Data Scientist