Adding Data Thinking to Software Solutions

Whilst the world of Data and AI offers significant opportunity to drive value, knowledge of their potential and mechanics is mostly confined to data practitioners.

As a result, when business users look for solutions to their challenges, they are typically unaware of this potential.  Instead, they may ask technical teams to deliver a software solution to their problem, outlining a capability via a set of features.

However, when making this leap we risk missing out on the opportunity to build more effective systems using data and analytics, creating “part solutions” to our challenge.

Let’s use a real-life example to illustrate this …

Case Study: Customer Engagement

One of our customers is a major financial services firm with a number of touch points with their B2B customers, including customer support calls, service contract renewals and even customer complaints.  These interactions are driven by their large, globally dispersed customer team.

The goal of the customer team is to increase retention of their high-value customers and, where possible, to upsell them to more expensive service offerings.  As such, they see that every interaction is an opportunity to build better relationships with customers, and to suggest compelling offers for new products.

Let’s imagine the head of this customer team looks for support to better achieve their aims … 

Software-first Projects

A classic approach would be for the customer team to turn to the world of software for support.

Knowing the possibilities that a modern software system can bring, they might ask for a system which puts all the information about a customer in front of the customer team member during interactions (akin to a “Single Customer View”).  This information could include:

  • General customer details (e.g. sector, size)
  • Purchasing history (e.g. services they currently subscribe to, volumes)
  • Usage (e.g. how often they use a particular product or service)
  • Recent interactions (e.g. what happened during the last interaction)
  • Offers (e.g. what did we last offer them and how did they react)

This could create an invaluable asset for the customer team – by having all of the relevant information at hand they can have more informed discussions.

However, the customer team still needs to fill the “gap” between being presented with information and achieving their goal of customer retention and product upsell.  They do this using standard scripts, or by interpreting the information presented to consider appropriate discussion points.

So while the software system supports their aims, the human brain is left to do most of the work.  

Data-first Projects

In our real-life example, the head of the customer team didn’t request a software system – instead, she turned to an internal data professional for advice.  After some conversations, the data professional identified the potential for analytics to support the customer team.

They engaged us with the concept of building a “next best action” engine that could support more intelligent customer conversations.  Working with the customer team and the internal data professional, we developed a system that presented the relevant information (as above), but crucially added:

  • Enriched data outputs (e.g. expected customer lifetime value)
  • Predicted outcomes (e.g. likelihood that the customer will churn in next 3 months)
  • Suggested “next best actions” (e.g. best offer to present to the customer which maximised the chance of conversion, best action to reduce churn risk)

These capabilities spoke more directly to the customer team’s aims, and demonstrated a significant uplift in retention and upsell.  The system has since been rolled out to the global teams and is considered one of a few “core applications” for the organisation – a real success story.
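To make the “predicted outcomes” and “next best action” ideas above concrete, here is a minimal, purely illustrative sketch in R of the kind of churn scoring that can sit behind such an engine; the data, column names and model below are hypothetical, not our customer’s actual system.

# Purely illustrative: a toy churn model on simulated account features
set.seed(42)
accounts <- data.frame(
  tenure_years  = runif(200, 0, 10),
  monthly_spend = runif(200, 100, 5000),
  support_calls = rpois(200, 2)
)
accounts$churned <- rbinom(200, 1, plogis(-1 - 0.3 * accounts$tenure_years +
                                            0.4 * accounts$support_calls))

# Fit a simple logistic regression and score each account's churn probability
churn_model <- glm(churned ~ tenure_years + monthly_spend + support_calls,
                   data = accounts, family = binomial)
accounts$churn_prob <- predict(churn_model, type = "response")

# A crude "next best action": flag the highest-risk accounts for a retention offer
head(accounts[order(-accounts$churn_prob), ])

In practice such scores would be built on far richer data and combined with offer-conversion models to suggest the next best action for each interaction.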

Software vs Data Projects

It is important to note the similarities in delivery between these two approaches: fundamentally, the majority of the work involved in both would be considered software development.  After all, developing clever algorithms only gets you so far – to realise value we need to implement software systems that deliver wisdom to end users, and that support the resulting actions by integrating with internal systems.

However, the difference in mindset that leads to these two approaches is driven by two characteristics:

  • Knowledge of the Data Opportunity – a key factor in the above example was the presence of a data professional who could empathise with the head of the customer team, and identify the potential for analytics. Having this viewpoint available ensured that the broader capabilities of software AND data were available when considering a possible solution to the challenge presented.  Without access to this knowledge, this would likely have turned into a “single customer view” software project.
  • An Openness to Design Thinking – in the world of software design best practices, there are 2 (often conflated) concepts: “design thinking” (empathise and ideate to develop effective solutions) and “user-centred design” (put the user first when designing user experiences). In software-first projects, the focus is often on the delivery of a solution that has been pre-determined, leading to a user-centred design process.  When we consider the world of data, the lack of understanding of the potential solutions in this space can lead more naturally to a “design thinking” process, where we focus more on “how can we solve this challenge” as opposed to “how do I build this software system really well”. 

Adding Data Thinking to “Software-First” Projects

So how do we ensure we consider the broader opportunity and potential that data and analytics provide when presented with a software development project?  We can accomplish this in three steps:

  1. Enable a Design Thinking Approach

Design thinking allows us to empathise with a challenge and ideate to find solutions, as opposed to focusing on the delivery of a pre-determined solution.  Within this context, we can focus on the broader aspirations, constraints and consequences so that a solution can be considered which connects more closely to the business outcomes.

  2. Include Data Knowledge

During this design thinking activity, it is essential that we have representatives who understand the potential that data and analytics represents.  In this way, the team is able to consider the broader set of capabilities when designing possible solutions.

  3. Design the Data Flow

Data is always a consideration in software design.   However, the potential of analytics requires us to think differently around the flow of data through a system with a view to delivering value-add capabilities.  This takes us beyond thinking about how we store and manage data, and towards a situation where we consider new data sources, data access, and the lifecycle of model-driven data outputs (such as predictions or actions).  This is particularly important where the “data” opportunity may be added to a system at a later date, once core “nuts and bolts” functionality has been delivered.

Data + Software + Design Thinking

The approach described here enables us to leverage the opportunity that sits at the boundary of data and software, and to deliver more value to users through richer capabilities that are more closely aligned to business outcomes.

Moreover, we’ve seen that the effective application of design thinking, combined with deep knowledge of data, analytics and software, has enabled us to deliver significant value for customers that goes way beyond the solutions that may have originally been imagined.

Author: Rich Pugh, Chief Data Scientist

 

Going Pro in Analytics

Becoming a professional athlete isn’t just about pure talent and hoping that will be enough to excel. Going pro means setting out a clear plan and following through with sustained training, the right nutrition, coaching and support. Not to mention an incredible amount of discipline and determination to get the most from your talent!

In a similar way for businesses, becoming data-driven can’t depend solely on investing in a data project and hoping it will succeed.  Typically, organisations face similar challenges to amateur athletes: they are trying aspects of analytics with notable successes, but those efforts are not repeatable, scalable or consistent enough to stand the test of time. Or they simply don’t know where to start with analytics to achieve maximum deliverable insight. This tends to have a knock-on effect, causing concerns over stakeholder buy-in, with the result that the analytics team continues as a siloed entity running sporadic projects, with no guarantee of consistency of approach across the organisation. They fail to make the transition from talented amateur to pro athlete, and so great talent is wasted as funding and enthusiasm run dry.

As the role of analytics becomes more strategically important to the business, it becomes necessary to follow a standardised process for delivery. As part of this, business leaders need to ensure that initiatives meet business objectives and that there is consistency in delivery and prioritisation, as well as in the platforms and technologies used.  To move forward, you have to evaluate where you are, work out what needs to be put in place to succeed, and enable the transition to implementation and data-driven value. In other words, you need to go pro in analytics.

It sounds easy enough. But as most pro athletes know very well, taking the leap from amateur to pro warrants a whole new game plan, and then sticking to it – a rather daunting prospect for most of us. The good news is that Mango can help! As experts in data science and analytics, we’ve homed in on the key pillars of a data-driven transformation and drawn up a 5-step game plan aimed at helping you scan and audit what your business has in place, identify what’s needed, and decide where to focus next. Here’s a snapshot of how it works.

Join our webinar

If you’d like to find out more, why not join our webinar, Going Pro in Analytics: Lessons in Data Science Operational Excellence, where Mango Deputy Director Dave Gardner and Mango Account Director Ian Cassley discuss what organisations need to do to ‘go pro’ with their analytical platforms, capabilities and processes once the limitations of sticking-plaster solutions and ‘quick and dirty’ approaches start to bite:

Register Now


ABOUT THE BOOK:

With the open source R programming language and its immense library of packages, you can perform virtually any data analysis task. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you’ll need to import, manipulate, summarize, model, and plot data with R; formalize analytical code; and build powerful R packages using current best practices.

Each short, easy lesson builds on all that’s come before: you’ll learn all of R’s essentials as you create real R solutions.

R in 24 Hours, Sams Teach Yourself covers the entire data analysis workflow from the viewpoint of professionals whose code must be efficient, reproducible and suitable for sharing with others.

 

WHAT YOU’LL LEARN:

You’ll learn all this, and much more:

  • Installing and configuring the R environment
  • Creating single-mode and multi-mode data structures
  • Working with dates, times, and factors
  • Using common R functions, and writing your own
  • Importing, exporting, manipulating, and transforming data
  • Handling data more efficiently, and writing more efficient R code
  • Plotting data with ggplot2 and Lattice graphics
  • Building the most common types of R models
  • Building high-quality packages, both simple and complex – complete with data and documentation
  • Writing R classes: S3, S4, and beyond
  • Using R to generate automated reports
  • Building web applications with Shiny

Step-by-step instructions walk you through common questions, issues, and tasks; Q&As, quizzes, and exercises build and test your knowledge; “Did You Know?” tips offer insider advice and shortcuts; and “Watch Out!” alerts help you avoid pitfalls.

By the time you’re finished, you’ll be comfortable going beyond the book to solve a wide spectrum of analytical and statistical problems with R.

If you are finding that you have some time on your hands and would like to enhance your skills, why not Teach yourself R in 24 hours?

The data and scripts to accompany the book can be accessed on GitHub here, and the accompanying mangoTraining package can be installed from CRAN by running the following in R: install.packages("mangoTraining")
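If you’d like to follow along in an R session, installing the package and listing the datasets it bundles looks like this (assuming a standard CRAN setup):

# Install the accompanying package from CRAN and load it
install.packages("mangoTraining")
library(mangoTraining)

# List the datasets bundled with the package
data(package = "mangoTraining")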

 

ORDERING A COPY OF THIS BOOK:

If you’d like to order a copy use the following ISBN codes:

ISBN-13: 978-0-672-33848-9

ISBN-10: 0-672-33848-3

Authors: Andy Nicholls, Richard Pugh and Aimee Gott.


When technical capabilities and company culture combine, IoT-fed data lakes become a powerful brain at the heart of the business

Internet-enabled devices have led to an explosion in the growth of data. On its own, this data has some value; however, the only way to unlock its full potential is by combining it with other data that businesses already hold.

Together, pre-existing data and newly-minted IoT data can provide a full picture of specific insights around a single consumer. It is paramount, however, that companies don’t prioritise innovation at the expense of ethics. Sourcing and analytics must be done correctly – with the right context that respects consumer privacy and wishes around data usage.

The insights gained from successfully blending these two different data sources also unlock secondary benefits including new product development, possible upsells or the ability to build customer goodwill through advice-driven service delivery.

It’s a winning combination, but the challenge is how to actually merge device data with regular customer information.

No easy fit

This problem arises from the fact that IoT device data is a different “shape” to data in traditional customer records.

If you think of a customer record in a sales database as one long row of information, IoT collected information is more like an entire column of time series information, with a supporting web of additional detail. Trying to directly join the two is near impossible, and it is likely that some valuable semantic information could end up lost in the process.
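As a rough sketch of the usual workaround (using entirely hypothetical tables and column names, not any specific customer’s data), the time series is typically collapsed into per-customer features before any join is attempted:

library(dplyr)

# Hypothetical data: one row per customer versus many timestamped rows per device
customers <- data.frame(customer_id = 1:3, segment = c("A", "B", "A"))
iot_readings <- data.frame(
  customer_id = sample(1:3, 100, replace = TRUE),
  timestamp   = Sys.time() + runif(100, 0, 86400),
  temperature = rnorm(100, 21, 2)
)

# Summarise the time series into per-customer features, then join
iot_features <- iot_readings %>%
  group_by(customer_id) %>%
  summarise(n_readings = n(), avg_temp = mean(temperature))

enriched <- left_join(customers, iot_features, by = "customer_id")

Anything that does not survive this kind of aggregation – the semantic detail mentioned above – is exactly what a less rigid storage approach needs to retain.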

But if IoT information fundamentally resists structure, and existing business databases are built on rigid structures, how do you find an environment that works for both? The answer is a data lake.

Pooling insight

A data lake is a more “fluid” approach to storing and connecting data. It is a central repository where data can be stored in the form it’s generated, whether that is in a relational database format or entirely unstructured. Analytics can then be applied over the top to connect different pieces of information and derive useful business insights.

However, there is more complexity involved in setting up a data lake than just combining all of an organisation’s data and hoping for the best. If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful.

This can be avoided using the expertise of dedicated data engineers. These are the masterminds who build the framework for a data lake and manage the process of extracting data from its source, before transforming it into a usable format and then loading it into the data lake environment. Done properly, this will ensure data provenance, with appropriate metadata to guide users on allowable use cases and analysis.

“If you do that, you’ll likely end up with a data swamp – a disorganised, underperforming mess of data that lacks the necessary context to make it useful”

This sounds like a significant undertaking, and there’s no getting around the fact that doing data lakes right does take time and effort, but it is possible to take a staged approach. Many organisations start with a data “puddle” – a small collection of computers hosting a limited amount of data – and then slowly add to this, increasing the number of computers over time to form the full data lake.

A question of culture

In addition, technical considerations are just one side of the coin. The other side is culture. At the core of the problem is that businesses will not succeed in commercialising their IoT data if users are either unaware of, or distrustful of, the data lake and its potential.

While investment in big data continues to grow, a recent NewVantage Partners survey on Big Data and AI found that just 31 percent of organisations consider themselves data driven — the second year in a row that the number has fallen. Data lake technology has been around for several years now, and should be more than capable of enabling these types of organisations, but without the right culture in place, its benefits are seldom felt.

How do you create a culture that centres on being data-driven? As any management team knows, culture shifts are never easy, but a data-driven culture boils down to improving collaboration, communication and understanding between data professionals and business functions.

With a successful technical implementation of a data lake, you then need data professionals to advocate its benefits, and liaise with business departments to understand the types of insights that would be most useful to inform strategic decisions.

This then reinforces business confidence in the data function, and allows the data teams to expand their contributions to the business and be recognised for their hard work. When supported by senior buy-in, this positive feedback loop generates a growing culture of data savviness and data-driven approaches within the organisation.

Brain of the organisation

When technical capabilities and company culture combine, data lakes can become a powerful brain at the heart of the business. With the right analytics tools layered over the top, data lakes can reduce the time to finding insights and surface powerful information. These insights can serve business needs better and faster and are an outright win for any organisation. In short, they are well worth the time and investment.

Author: Dean Wood, Principal Data Scientist


Adnan Fiaz

With two out of three EARL conferences now part of R history, we’re really excited about the next EARL conference in Boston (only 1 week away!). This calls for an(other) EARL conference analysis, this time with Twitter data. Twitter is an amazingly rich data source and a great starting point for any data analysis (I feel there should be an awesome-twitter-blogposts list somewhere).

I was planning on using the wonderful rtweet package by Michael Kearney (as advertised by Bob Rudis) but unfortunately the Twitter API doesn’t provide a full history of tweets. Instead I had to revert to a Python package (gasp) called GetOldTweets. I strongly recommend using the official Twitter API first before going down this path.
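For reference, a search against the official API with rtweet looks roughly like the snippet below (assuming you have Twitter API credentials configured); the standard search endpoint only reaches back around a week, which is why it couldn’t cover the full conference history.

library(rtweet)

# Standard search only returns tweets from roughly the last week
earl_tweets <- search_tweets("#EARLConf2017 OR #EARLConf OR #EARL2017",
                             n = 5000, include_rts = FALSE)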

The Data

library(dplyr)

# I have used the Exporter script with the hashtags #EARLConf2017, #EARLConf and #EARL2017
tweets_df <- purrr::map_df(list.files('data/tweets', full.names = TRUE), 
                           ~ readr::read_delim(.x, delim = ";", quote = "")) %>% 
  # filter out company accounts
  filter(username != "earlconf", username != "MangoTheCat") %>% 
  mutate(shorttext = stringr::str_sub(text, end = 50))

tweets_df %>% 
  select(username, date, shorttext) %>% 
  head() %>% 
  knitr::kable()
username date shorttext
AlanHoKT 2017-10-02 02:15:00 “. @TIBCO ’s @LouBajuk spoke at #EARL2017 London o
johnon2 2017-09-23 16:02:00 “. @TIBCO ’s @LouBajuk spoke at #EARL2017 London o
AndySugs 2017-09-21 22:19:00 “RT: LearnRinaDay: EARL London 2017 – That’s a wra
LearnRinaDay 2017-09-21 22:17:00 “EARL London 2017 – That’s a wrap! https://www. r-
LouBajuk 2017-09-20 23:15:00 “. @TIBCO ’s @LouBajuk spoke at #EARL2017 London o
pjevrard 2017-09-20 13:02:00 “. @TIBCO ’s @LouBajuk spoke at #EARL2017 London o

First things first, let’s get a timeline up:
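The plotting code isn’t reproduced here, but a rough reconstruction of this kind of timeline (simply counting tweets per day; not the original code) would look something like this:

library(ggplot2)

tweets_df %>% 
  mutate(day = lubridate::date(date)) %>% 
  count(day) %>% 
  ggplot(aes(x = day, y = n)) +
  geom_line() +
  labs(x = 'Date', y = 'Number of tweets', title = 'EARL-related tweets over time')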

 

The hashtags I used to search tweets were generic so the results include tweets from last year’s conferences. Let’s zoom in on this year’s conferences: EARL San Francisco (5-7 June) and EARL London (12-14 September). They clearly explain the large peaks in the above graph.

 

I’ve tried to highlight the period when the conferences were on but I don’t quite like the result. Let’s see if it works better with a bar chart.

library(lubridate)   # needed for %within%
library(scales)      # needed for date_format()

earlconf_sf_dates <- lubridate::interval("2017-06-05", "2017-06-08")
earlconf_lon_dates <- lubridate::interval("2017-09-12", "2017-09-15")
tweets_df %>% 
  filter(date > "2017-05-01") %>% 
  mutate(day = lubridate::date(date)) %>% 
  count(day) %>% 
  mutate(conference = case_when(day %within% earlconf_sf_dates ~ "SF",
                                day %within% earlconf_lon_dates ~ "LON",
                                TRUE ~ "NONE")) %>% 
  ggplot(aes(x = day, y = n)) + 
  geom_bar(stat = "identity", aes(fill = conference)) +
  scale_fill_manual(guide = FALSE, values = c("#F8766D", "black", "#619CFF")) +
  labs(x = 'Date', y = 'Number of tweets', title = 'Number of EARL-related tweets by day') +
  scale_x_date(date_breaks = "1 months", labels = date_format('%b-%y')) +
  theme_classic()

 

Now that’s a lot better. The tweet counts in black surrounding the conferences look like small buildings which make the conference tweet counts look like giant skyscrapers (I was a failed art critic in a previous life).

Activity during conferences

I’ve been to my fair share of conferences/presentations and I’ve always wondered how people tweet so fast during a talk. It could be just my ancient phone or I may lack the necessary skills. Either way, it would be interesting to analyse the tweets at the talk level. First I will need to link the tweets to specific talks. I’ve translated the published agenda into a nicer format by hand and read it in below.

earl_agenda <- map_df(c("EARL_SF", "EARL_LON"), 
                      ~ readxl::read_xlsx('data/earl_agenda.xlsx', sheet = .x))
earl_agenda %>% 
  select(StartTime, EndTime, Title, Presenter) %>% 
  head() %>% 
  knitr::kable()
StartTime EndTime Title Presenter
2017-06-06 11:00:00 2017-06-06 11:30:00 R’s role in Data Science Joe Cheng
2017-06-06 11:30:00 2017-06-06 12:00:00 ‘Full Stack’ Data Science with R: production data science and engineering with open source tools Gabriela de Queiroz
2017-06-06 12:00:00 2017-06-06 12:30:00 R Operating Model Mark Sellors
2017-06-06 11:00:00 2017-06-06 11:30:00 Large-scale reproducible simulation pipelines in R using Docker Mike Gahan
2017-06-06 11:30:00 2017-06-06 12:00:00 Using data to identify risky prescribing habits in physicians Aaron Hamming
2017-06-06 12:00:00 2017-06-06 12:30:00 How we built a Shiny App for 700 users Filip Stachura

Before I merge the tweets with the agenda it’s a good idea to zoom in on the conference tweets (who doesn’t like a facetted plot).

conference_tweets <- tweets_df %>% 
  mutate(conference = case_when(date %within% earlconf_sf_dates ~ "SF",
                                date %within% earlconf_lon_dates ~ "LON",
                                TRUE ~ "NONE")) %>% 
  filter(conference != "NONE")

ggplot(conference_tweets, aes(x = date)) +
  geom_histogram() +
  facet_wrap(~ conference, scales = 'free_x')

 

Nothing odd in the pattern of tweets: there are no talks on the first day so barely any tweets; the amount of tweets spikes at the beginning of the other two days and then declines as the day progresses. There is something odd about the timing of the tweets though. I didn’t notice it before but when I compared the position of the bars on the x-axis the San Francisco tweets look shifted. And then my lack of travel experience hit me: time zones! The tweets were recorded in UTC time but the talks obviously weren’t in the evening in San Francisco.

After correcting for time zones I can finally merge the tweets with the agenda.

selection <- conference_tweets$conference == 'SF'
conference_tweets[selection, 'date'] <- conference_tweets[selection, 'date'] - 8*60*60
# I intended to use a fuzzy join here and check if the tweet timestamp falls within the [start, end) of a talk
# unfortunately I couldn't get it to work with datetime objects
# so I resort to determining the cartesian product and simply filtering the relevant records
tweets_and_talks <- conference_tweets %>% 
  mutate(dummy = 1) %>% 
  left_join(earl_agenda %>% mutate(dummy = 1)) %>% 
  filter(date >= StartTime, date < EndTime)

tweets_and_talks %>% 
  select(username, date, shorttext, Title, Presenter) %>% 
  tail() %>% 
  knitr::kable()
username date shorttext Title Presenter
hspter 2017-06-06 11:17:00 “Nice shout out to @rOpenSci as prodigious package R’s role in Data Science Joe Cheng
hspter 2017-06-06 11:17:00 “Nice shout out to @rOpenSci as prodigious package Large-scale reproducible simulation pipelines in R using Docker Mike Gahan
RLadiesGlobal 2017-06-06 11:14:00 “#RLadies @b23kellytalking about #rstats at #EARL R’s role in Data Science Joe Cheng
RLadiesGlobal 2017-06-06 11:14:00 “#RLadies @b23kellytalking about #rstats at #EARL Large-scale reproducible simulation pipelines in R using Docker Mike Gahan
hspter 2017-06-06 11:14:00 “I’m digging the postmodern data scientist from @R R’s role in Data Science Joe Cheng
hspter 2017-06-06 11:14:00 “I’m digging the postmodern data scientist from @R Large-scale reproducible simulation pipelines in R using Docker Mike Gahan

You ever have that feeling that you’re forgetting something and then you’re at the airport without your passport? From the above table it’s obvious I’ve forgotten that talks are organised in parallel. So matching on time only will create duplicates. However, you may notice that some tweets also mention the presenter (that is considered good tweetiquette). We can use that information to further improve the matching.

library(stringdist)   # provides stringsim()

talks_and_tweets <- tweets_and_talks %>% 
  # calculate various scores based on what is said in the tweet text
  mutate(presenter_score = ifelse(!is.na(mentions) & !is.na(TwitterHandle),
                                  stringr::str_detect(mentions, TwitterHandle), 0),
         # check if the presenter's name is mentioned
         presenter_score2 = stringr::str_detect(text, Presenter),
         # check if the company name is mentioned
         company_score = stringr::str_detect(text, Company),
         # check if what is mentioned has any overlap with the title (description would've been better)
         overall_score = stringsim(text, Title),
         # sum all the scores
         score = overall_score + presenter_score + presenter_score2 + company_score) %>% 
  select(-presenter_score, -presenter_score2, -company_score, -overall_score) %>% 
  # now select the highest scoring match
  group_by(username, date) %>% 
  top_n(1, score) %>% 
  ungroup()

talks_and_tweets %>% 
  select(username, date, shorttext, Title, Presenter) %>% 
  tail() %>% 
  knitr::kable()
username date shorttext Title Presenter
Madhuraraju 2017-06-06 11:39:00 @aj2z @gdequeiroz from @SelfScore talking about u ‘Full Stack’ Data Science with R: production data science and engineering with open source tools Gabriela de Queiroz
hspter 2017-06-06 11:22:00 “#rstats is great for achieving “flow” while doing R’s role in Data Science Joe Cheng
RLadiesGlobal 2017-06-06 11:20:00 @RStudioJoe showing the #RLadies logo and a big m R’s role in Data Science Joe Cheng
hspter 2017-06-06 11:17:00 “Nice shout out to @rOpenSci as prodigious package Large-scale reproducible simulation pipelines in R using Docker Mike Gahan
RLadiesGlobal 2017-06-06 11:14:00 “#RLadies @b23kellytalking about #rstats at #EARL Large-scale reproducible simulation pipelines in R using Docker Mike Gahan
hspter 2017-06-06 11:14:00 “I’m digging the postmodern data scientist from @R R’s role in Data Science Joe Cheng

That looks better, but I am disappointed at the number of tweets (263) during talks. Maybe attendees are too busy listening to the talk instead of tweeting, which is a good thing I suppose. Nevertheless, I can still try to create some interesting visualisations with this data.

tweets_by_presenter <- talks_and_tweets %>% 
  count(conference, Title, Presenter) %>% 
  ungroup() %>% 
  arrange(conference, n)

tweets_by_presenter$Presenter <- factor(tweets_by_presenter$Presenter,
                                        levels = tweets_by_presenter$Presenter)
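The plot itself isn’t included here, but a rough reconstruction of the kind of chart described below (not the original code) would be:

ggplot(tweets_by_presenter, aes(x = Presenter, y = n)) +
  geom_bar(stat = "identity", aes(fill = conference)) +
  coord_flip() +
  facet_wrap(~ conference, scales = "free") +
  labs(x = NULL, y = 'Number of tweets', title = 'Number of tweets per talk')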

 

The visualisation doesn’t really work for the large number of presenters although I don’t really see another way to add the information about a talk. I also tried to sort the levels of the factor so they appear sorted in the plot but for some reason the SF facet doesn’t want to cooperate. There are a number of talks vying for the top spot in San Francisco but the differences aren’t that large. I’m of course assuming my matching heuristic worked perfectly but one or two mismatches and the results could look completely different. The same applies to EARL London but here Joe Cheng clearly takes the crown.

Follow the leader…

Let’s go down a somewhat more creepy road and see what talks people go to.

tweeters <- talks_and_tweets %>% 
  group_by(username) %>% 
  mutate(num_tweets = n()) %>% 
  ungroup() %>% 
  filter(num_tweets > 4) %>% 
  mutate(day = ifelse(conference == "SF", (Session > 6) + 1, (Session > 9) + 1),
         day = ifelse(day == 1, "Day 1", "Day 2")) %>% 
  select(username, conference, StartTime, Stream, day)
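Again, the plot isn’t shown; a sketch of the kind of chart intended (one line per user across talk start times; a reconstruction, not the original code) might be:

ggplot(tweeters, aes(x = StartTime, y = Stream, group = username, colour = username)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ conference + day, scales = "free_x") +
  theme(legend.position = "none")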

Each line is a twitter user (twitterer? tweeter? tweep?) and each observation represents a tweet during a presentation. My expectation was that by drawing a line between the observations you could see how people switch (or don’t switch) between talks. That has clearly failed as the tweeting behaviour isn’t consistent or numerous enough to actually see that. I’m quite glad it’s not possible since tracking people isn’t what Twitter is for.

The code and data for this blogpost are available on GitHub so feel free to play around with it yourself. Do let us know if you create any awesome visualisations or if we can improve on any of the above. If you also want to tweet at conferences, EARL Boston is happening on 1-3 November and tickets are still available. I promise we won’t track you!


We have been working within the Pharmaceutical sector for over a decade. Our expertise, knowledge of the industry, and presence in the R community mean we are used to providing services within a GxP environment and providing best practice.

We are excited to be at three great events in the coming months. Find us at:

• 15th Annual Pharmaceutical IT Congress, 27-28 September – London, England
• Pharmaceutical Users Software Exchange (PhUSE), 8-11 October 2017 – Edinburgh, Scotland
• American Conference on Pharmacometrics (ACoP8), 15-18 October – Fort Lauderdale, USA

Our dedicated Pharma team will be at the events to address any of your questions or concerns around data science and using R within your organisation.

How Mango can help with your Data Science needs

A validated version of R

The use of R is growing in the pharmaceutical sector, and it’s one of the enquiries we receive most often at Mango, so we’d love to talk to you about how you can use it in your organisation.

We know that a major concern for using R within the Pharma sector is its open source nature, especially when using R for regulatory submissions.

R contains many capabilities specifically aimed at helping users perform their day-to-day activities, but given the concerns over meeting compliance, some companies are understandably hesitant to make the move.

To eliminate risk, we’ve created ValidR – a series of scripts and services that deliver a validated build of R to an organisation.

For each validated package, we apply our ISO9001 accredited software quality process of identifying requirements, performing code review, testing that the requirements have been met and installing the application in a controlled and reproducible manner. We have helped major organisations adopt R in compliance with FDA 21 CFR Part 11 guidelines on open source software.

Consultancy

We have helped our clients adopt or migrate to R by providing a range of consultancy services from our unique mix of Mango Data Scientists who have both extensive technical and real-world experience.

Our team of consultants have been deployed globally on projects including SAS to R migration, Shiny application development, script validation and much more. Our team also provide premier R training, with courses designed specifically for the Pharmaceutical sector.

Products

Organisations today are not only looking at how they can validate their R code, but also at how that information is retained, shared and stored across teams globally. Our dedicated validation team have a specialised mix of software developers who build rich analytic web and desktop applications using technologies such as Java, .NET and JavaScript.

Our applications, ModSpace and Navigator, have been deployed within Pharma organisations globally. Both help organisations maintain best practice and achieve a ‘validated working environment’.

Why Mango?

All of our work, including the support of open source software such as R, is governed by our Quality Management System, which is regularly audited by Pharmaceutical companies in order to ensure compliance with industry best practices and regulatory guidelines.

Make sure you stop by our stand and talk to us about how we can help you make the most of your data!