managed service

In a recent webinar, we provided an overview of our Managed RStudio platform and demonstrated how modern technology platforms like RStudio give you the ability to collect, store and analyse data at the right time and in the right format to make informed business decisions.

The Public Health Evidence & Intelligence team at Herts County Council demonstrated how they have benefitted significantly from Managed RStudio, which enabled collaborative development, empowerment and productivity at a time when they needed it most. In turn, they have been able to scale their department.

Many of the questions from the webinar focused on the governance and security aspects of Managed RStudio. In this blog, we’ve collected all your questions and, for further clarity, attached a document that can help with any further questions regarding architecture, data management and maintenance.

Many of the questions concerned how data is managed in the platform, from working with data on local drives and through user interfaces to handling large datasets.

There are several methods of getting data in and out of Managed RStudio. Which method to use will largely depend on the type and size of the data involved.

For data science teams to work productively and deliver effective results for the business, the starting point is the data itself. Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality needs to be measured. Good data quality needs disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This is over and above data quality control programmes for data delivered from outside and within the organisation.

Can you please elaborate on getting data into and out of the Managed RStudio platform?

Working with small data sets (<100 MB)

For smaller data sets, we recommend using RStudio Workbench’s upload feature directly from the IDE. To do this, simply click ‘Upload’ in the ‘Files’ panel. From here you can select any type of file from either your local hard disk or a mapped network drive. The file will be uploaded to the current directory. You can also upload compressed files (zip), which are automatically decompressed on completion. This means that you can effectively upload much more than 100 MB of data.

Working with large data sets (>100 MB)

For larger data sets or real-time data, we recommend using an external service such as Cloud SQL or BigQuery (GCP), Azure SQL Database or Amazon RDS. These can be accessed directly from R using packages such as bigrquery, RMariaDB or RMySQL.
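As a minimal sketch of the database route, the snippet below uses the DBI interface that packages such as RMariaDB, RMySQL and bigrquery implement. It rests on a few assumptions: an in-memory SQLite database (via the RSQLite package, which is not mentioned above) stands in for the hosted service so that the example runs without credentials, and the table and column names are invented for illustration. The point it tries to show is that the aggregation runs in the database, so only the small summary is pulled into the R session.

```r
library(DBI)

# Assumption: RSQLite's in-memory database stands in for a hosted service.
# For a real MariaDB/MySQL instance the connection would instead be made
# with RMariaDB::MariaDB() plus your own host, user, password and dbname.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table standing in for a large remote data set.
admissions <- data.frame(
  district = c("North", "North", "East", "East", "East"),
  n_cases  = c(12L, 8L, 5L, 7L, 3L)
)
dbWriteTable(con, "admissions", admissions)

# Push the aggregation to the database so only the summary reaches R.
totals <- dbGetQuery(
  con,
  "SELECT district, SUM(n_cases) AS total
     FROM admissions
    GROUP BY district
    ORDER BY district"
)
print(totals)

dbDisconnect(con)
```

With a real service, only the dbConnect() call should need to change; the query code stays the same because it goes through DBI.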

For consuming real-time data, we recommend using either Cloud Pub/Sub or Azure Service Bus to create a message queue from which R or Python can read.

Sharing data between RStudio Pro/Workbench, Connect and other users

Data can easily be shared via ‘Pins’, which let you publish data to Connect and share it with other users, Shiny apps and RStudio sessions.
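The round trip can be sketched with the pins package as follows. This is an illustration under stated assumptions rather than the platform's exact setup: a temporary local board stands in for the Connect server so the code runs anywhere, and the data frame and pin name are hypothetical.

```r
library(pins)

# Assumption: a temporary local board stands in for the Connect server.
# On the managed platform the board would typically be created with
# board_connect() (board_rsconnect() in older pins releases).
board <- board_temp()

# Hypothetical summary table to share with colleagues.
weekly_summary <- data.frame(
  week  = c("2020-W01", "2020-W02"),
  cases = c(42L, 37L)
)

# Publish the data frame as a pin ...
pin_write(board, weekly_summary, name = "weekly_summary", type = "rds")

# ... and read it back, as another user, Shiny app or report would.
shared <- pin_read(board, "weekly_summary")
print(shared)
```

Because readers only need the board and the pin name, the publishing script and the consuming app do not have to share a file system.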

Getting data out of Managed RStudio

As with upload, there are several methods to export data from Managed RStudio. RStudio Connect allows you to publish Shiny apps, Flask and Dash applications and R Markdown documents. It also allows you to schedule e-mail reports. For one-off analytics jobs, RStudio also allows you to download files directly from the IDE.

The Managed Service also allows uploading to any cloud service, such as cloud storage buckets.

Package Management

R packages are managed and maintained by RStudio Package Manager, giving the user complete control of which versions are installed.

RStudio Package Manager also allows the user to ‘snapshot’ a particular set of packages on a specific day to ensure consistency.
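As a rough sketch of how a session might be pinned to such a snapshot, the helper below builds a dated repository URL and sets it as the CRAN repository. It assumes that your Package Manager server exposes snapshots at URLs of the form server/repository/date; the helper name snapshot_repo, the host packages.example.com and the repository name "cran" are all hypothetical and should be replaced with your own server's values.

```r
# Build the URL of a dated snapshot. The host and the repository name
# ("cran") are hypothetical, and the <server>/<repo>/<YYYY-MM-DD> layout
# is an assumption; check the URL your own Package Manager server shows.
snapshot_repo <- function(date,
                          base_url = "https://packages.example.com",
                          repo = "cran") {
  date <- as.Date(date)  # fails early on an unparseable date
  sprintf("%s/%s/%s", base_url, repo, format(date, "%Y-%m-%d"))
}

repo_url <- snapshot_repo("2020-03-02")

# Point this R session at the snapshot; later install.packages() calls
# would then resolve package versions as they were on that date.
options(repos = c(CRAN = repo_url))
print(getOption("repos")[["CRAN"]])
```

Putting the options() call in a project-level .Rprofile is one way to give everyone on a project the same package versions.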

The solution to disciplined data governance

Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality needs to be measured. Good data quality needs disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This all leads to better decisions based on data analysis and also ensures compliance with regulation.

As a Product Manager at Mango, Matt is passionate about data and delivering products where data is key to driving insights and decisions. With over 20 years’ experience in data consulting and product delivery, Matt has worked across a variety of industries including Retail, Financial Services and Gaming to help companies use data and analytical platforms to drive growth and increase value.

Matt is a strong believer that the combined value of data and analytics is the key to the success of data solutions.

RStudio::Conf 2020

Dude: Where are my Cats? RStudio::Conf 2020

It may not have been the start to the conference that we planned as RStudio Full Service Certified partners. Did you see the lonely guy on social media? Yes, that was me, and I’m here to tell the tale…

Eventful as it was at the time, this was the first RStudio Conference I have had the pleasure of attending since joining Mango Solutions. The things that really stood out for me were the event’s pervasive, well-thought-through inclusivity and how well organised it was for nearly 2,400 R users from around the world. Here’s a summary of our time in San Francisco, what the conference had to offer and why we are immensely proud partners of RStudio.

Cat rehoming 10:41am San Francisco time

With our shipment held up in customs, the conference started without our exhibition stand, materials and conference goodies, the famous Mango cats. I remain ever thankful to the whole #rstats community who, despite this little hiccup, took pity on us and came to visit us anyway. What I quickly grasped was that this is a community that is always ready to support others and that offers a forum to share ideas, learn how to solve problems and, in particular, learn how others are benefiting from using R.

Public Benefit Corporation

A vital and impressive moment of the conference was the standing ovation for J.J. Allaire after his announcement that RStudio had become a Public Benefit Corporation. You could feel the appreciation in the room for RStudio’s innovation and how it had pushed the R community forward. He discussed RStudio’s future plans, which provide growth opportunities for the community.

From a content perspective, RStudio::conf was a great event, filled with informative and well organised workshops and talks. Hard as it is to pick out one particular talk, the standout was probably Jenny Bryan’s “Object of type ‘closure’ is not subsettable”, which was all about debugging in R: best approaches, available tools and hints on how to write more informative error messages in your own functions. It was engaging, informative and witty, and it was relevant to pretty much every single R developer on this planet, let alone those present in the room.

Amongst other things, the Mango team of Data Scientists really appreciated these packages which the RStudio team featured as part of their workshops:

  • The best ways to scale up your API using the plumber package
  • Custom styling of Shiny apps using the bootstraplib package
  • Effective R code parallelization using the future and furrr packages
  • Load testing using the loadtest package

 

Inclusivity all round

Inclusivity was felt not only at the R-Ladies breakfast, but also in the prayer rooms, quiet rooms for neurodiverse attendees, gender-neutral bathrooms, diversity scholarships and very frequent reminders of the event’s code of conduct, which revolves heavily around inclusivity and tolerance. Great organisation showed not only in the suitable venue, but also in every effort that went into ensuring that queues for food and buses stayed short, that there was enough time to change rooms between talks and that there was great entertainment and perks throughout the event.

Endless networking opportunities

RStudio::conf 2020 was a fantastic place to meet and connect with other people in the industry and gain insight into how other companies, data science teams and individuals are using R and the underlying infrastructure that supports it. For Ben, our Data Platform Consultant, it was interesting and exciting to hear from a platform perspective about the needs of data science teams, and how we could potentially solve the challenges they are facing. A recurring issue seemed to be how best to scale R in a production environment. Ben found the renv talk interesting and hopes to be using renv more this year in place of packrat.

For Mango it was a real pleasure to discuss at length the wealth of opportunity presented by ValidR, our validated, production-ready version of R.

A huge thank you to everyone at RStudio for supporting my first conference with RStudio. It was truly a pleasure to meet the team in person, and the event has given Mango and RStudio the opportunity to take our partnership to the next level.

 

Author: Rich Adams, Solutions Specialist


ABOUT THE BOOK:

With the open source R programming language and its immense library of packages, you can perform virtually any data analysis task. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you’ll need to import, manipulate, summarize, model, and plot data with R; formalize analytical code; and build powerful R packages using current best practices.

Each short, easy lesson builds on all that’s come before: you’ll learn all of R’s essentials as you create real R solutions.

R in 24 hours, Sams Teach Yourself covers the entire data analysis workflow from the viewpoint of professionals whose code must be efficient, reproducible and suitable for sharing with others.

 

WHAT YOU’LL LEARN:

You’ll learn all this, and much more:

  • Installing and configuring the R environment
  • Creating single-mode and multi-mode data structures
  • Working with dates, times, and factors
  • Using common R functions, and writing your own
  • Importing, exporting, manipulating, and transforming data
  • Handling data more efficiently, and writing more efficient R code
  • Plotting data with ggplot2 and Lattice graphics
  • Building the most common types of R models
  • Building high-quality packages, both simple and complex – complete with data and documentation
  • Writing R classes: S3, S4, and beyond
  • Using R to generate automated reports
  • Building web applications with Shiny

Step-by-step instructions walk you through common questions, issues, and tasks; Q & As, Quizzes, and Exercises build and test your knowledge; “Did You Know?” tips offer insider advice and shortcuts; and “Watch Out!” alerts help you avoid pitfalls.

By the time you’re finished, you’ll be comfortable going beyond the book to solve a wide spectrum of analytical and statistical problems with R.

If you are finding that you have some time on your hands and would like to enhance your skills, why not Teach yourself R in 24 hours?

The data and scripts to accompany the book can be accessed on GitHub here, and the accompanying mangoTraining package can be installed from CRAN using the following in R: install.packages("mangoTraining")

 

ORDERING A COPY OF THIS BOOK:

If you’d like to order a copy use the following ISBN codes:

ISBN-13: 978-0-672-33848-9

ISBN-10: 0-672-33848-3

Authors: Andy Nicholls, Richard Pugh and Aimee Gott.