
In a recent webinar, we provided an overview of our Managed RStudio platform and demonstrated how modern technology platforms like RStudio give you the ability to collect, store and analyse data at the right time and in the right format to make informed business decisions.

The Public Health Evidence & Intelligence team at Hertfordshire County Council demonstrated how they have benefitted significantly from Managed RStudio, which enabled collaborative development, empowerment and productivity at a time when they needed it most. In turn, they have been able to scale their department.

Many of the questions from the webinar focused on the governance and security aspects of Managed RStudio. In this blog, we answer those questions, and for further clarity we have attached a document covering architecture, data management and maintenance.

Many of the questions concerned managing data on the platform, from working with data on local drives and through the user interface to handling large datasets.

There are several methods of getting data in and out of the Managed RStudio. These methods will largely depend on the type and size of the data involved.

For data science teams to work productively and deliver effective results for the business, the starting point is the data itself. Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality should be measured. Good data quality depends on disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This is in addition to quality-control programmes for data delivered from both outside and inside the organisation.

Can you please elaborate on getting data into and out of the Managed RStudio platform?

Working with small data sets (< 100 MB)

For smaller data sets, we recommend using the upload feature built into the RStudio Workbench IDE. Simply click ‘Upload’ in the ‘Files’ pane and select any type of file from your local hard disk or a mapped network drive. The file is uploaded to the current directory. You can also upload compressed (zip) files, which are decompressed automatically once the upload completes. This means the data you bring in this way can be considerably larger than 100 MB once unpacked.
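If a file is only slightly over the limit, compressing it before upload is often enough. The Python sketch below zips a single file with the standard library and checks the archive against the 100 MB guideline. The helper names, and reading the guideline as 100 MiB, are our own assumptions rather than anything defined by RStudio Workbench.

```python
import zipfile
from pathlib import Path
from typing import Optional

# Assumption: the article's 100 MB guideline, read here as 100 MiB.
SMALL_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def zip_for_upload(src: str, dest: Optional[str] = None) -> Path:
    """Compress one file into a .zip archive suitable for the Upload dialog."""
    src_path = Path(src)
    # Default archive name: data.csv -> data.csv.zip, next to the original file.
    dest_path = Path(dest) if dest else src_path.with_name(src_path.name + ".zip")
    # ZIP_DEFLATED applies zlib compression; text formats such as CSV usually shrink a lot.
    with zipfile.ZipFile(dest_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores just the file name, so the archive unpacks without extra folders.
        zf.write(src_path, arcname=src_path.name)
    return dest_path


def fits_small_upload(path: Path, limit: int = SMALL_UPLOAD_LIMIT_BYTES) -> bool:
    """True if the file on disk is under the small-upload threshold."""
    return path.stat().st_size < limit
```

For example, `fits_small_upload(zip_for_upload("survey_2021.csv"))` (a hypothetical file name) tells you whether the compressed file can go through the Upload dialog or should take one of the routes for larger data.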

Working with large data sets (> 100 MB)

For larger data sets or real-time data, we recommend using an external service such as Cloud SQL or BigQuery (GCP), Azure SQL Database or Amazon RDS. These can be queried directly from R using packages such as bigrquery, RMariaDB or RMySQL.
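Whichever service you choose, it helps to let the database do the heavy lifting and to pull rows back in bounded chunks rather than all at once. The Python sketch below illustrates that pattern. Here sqlite3 is only a local stand-in for Cloud SQL, Azure SQL Database or Amazon RDS, and `query_in_chunks` is a hypothetical helper. Drivers that follow the Python DB-API (PEP 249) expose the same `execute()` and `fetchmany()` calls, and DBI-based R packages such as RMariaDB offer the equivalent `dbSendQuery()` and `dbFetch(res, n = ...)`.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence, Tuple


def query_in_chunks(
    conn: Any,
    sql: str,
    params: Sequence = (),
    chunk_size: int = 10_000,
) -> Iterator[List[Tuple]]:
    """Yield the rows of a query in lists of at most chunk_size rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        while True:
            # fetchmany is part of the DB-API, so other compliant drivers work the same way.
            rows = cur.fetchmany(chunk_size)
            if not rows:
                break
            yield rows
    finally:
        cur.close()


# Local stand-in for a remote database.
example_conn = sqlite3.connect(":memory:")
example_conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
```

Where only a summary is needed, pushing the aggregation into SQL (for example `SELECT SUM(value) FROM readings`) means a single row travels to the R or Python session instead of the whole table.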

For consuming real-time data, we recommend using either Cloud Pub/Sub or Azure Service Bus to create a message queue from which R or Python processes can read.
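The consuming side of a message queue follows a simple loop: wait for a message, decode it, process it, then acknowledge it. As a minimal sketch, the Python code below models that loop with the standard library's `queue.Queue` standing in for Pub/Sub or Service Bus. The JSON payload format and the `STOP` sentinel are assumptions made for the example. With the Google Cloud client library, the equivalent is `pubsub_v1.SubscriberClient().subscribe(subscription_path, callback=...)`, where the callback calls `message.ack()` once a message has been handled.

```python
import json
import queue
from typing import Callable

STOP = b"STOP"  # assumption: a sentinel message that tells the consumer to exit


def consume(q: queue.Queue, handle: Callable[[dict], None]) -> None:
    """Read JSON messages from q and pass each decoded payload to handle()."""
    while True:
        raw = q.get()  # blocks until a message is available
        try:
            if raw == STOP:
                return
            handle(json.loads(raw))
        finally:
            # Marks the item as processed so q.join() can complete. A real
            # subscriber would ack only after handle() succeeded, so that
            # failed messages are redelivered.
            q.task_done()
```

Running `consume` in a background thread keeps the R or Python session responsive while messages arrive.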

Sharing data between RStudio Pro/Workbench, Connect and other users

Data can easily be shared using ‘pins’, which lets you publish data to RStudio Connect and share it with other users, across Shiny apps and RStudio.
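Conceptually, a pin is a named, versioned piece of data stored on a shared ‘board’ that anyone with access can read. The Python sketch below illustrates that idea with a hypothetical `FolderBoard` class that writes JSON to a shared folder; it is not the pins API. The real pins package, available for both R and Python, provides `pin_write()` and `pin_read()` with boards that include RStudio Connect.

```python
import json
from pathlib import Path
from typing import Any, List, Optional


class FolderBoard:
    """Hypothetical stand-in for a pins board: named, versioned JSON in a shared folder."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name: str) -> List[str]:
        """List stored versions of a pin, oldest first."""
        pin_dir = self.root / name
        if not pin_dir.is_dir():
            return []
        return sorted(p.stem for p in pin_dir.glob("*.json"))

    def write(self, name: str, data: Any) -> str:
        """Store data as a new version of the named pin and return the version id."""
        pin_dir = self.root / name
        pin_dir.mkdir(exist_ok=True)
        # Zero-padded sequence numbers sort correctly as strings.
        version = f"{len(self.versions(name)) + 1:06d}"
        (pin_dir / f"{version}.json").write_text(json.dumps(data))
        return version

    def read(self, name: str, version: Optional[str] = None) -> Any:
        """Return the requested version of a pin, or the latest if none is given."""
        available = self.versions(name)
        if not available:
            raise KeyError(f"no pin named {name!r}")
        chosen = version if version is not None else available[-1]
        return json.loads((self.root / name / f"{chosen}.json").read_text())
```

Keeping every version means a Shiny app can read the latest data while an analyst reproduces last week's figures from an earlier one. The sketch does not guard against two users writing the same pin at the same moment.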

Getting data out of Managed RStudio

As with upload, there are several ways to export data from Managed RStudio. RStudio Connect lets you publish Shiny apps, Flask and Dash applications and R Markdown documents, and it can also schedule e-mail reports. For one-off analytics jobs, you can also download files directly from the IDE.

The Managed Service also supports uploading to cloud services such as Cloud Storage buckets.
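As an illustration of pushing results to a bucket, the Python sketch below uploads a single file to Google Cloud Storage. It assumes the google-cloud-storage client library, whose `Client.bucket()`, `Bucket.blob()` and `Blob.upload_from_filename()` calls it uses; the function name, bucket name and prefix are hypothetical. The client is passed in rather than created inside the function, so the caller controls credentials and the function can be tested with a stand-in object.

```python
from pathlib import Path, PurePosixPath
from typing import Any, Optional


def upload_to_bucket(
    client: Any, bucket_name: str, local_path: str, prefix: Optional[str] = None
) -> str:
    """Upload one local file to a bucket and return its gs:// URI.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    file_name = Path(local_path).name
    # Object names use forward slashes regardless of the local operating system.
    blob_name = str(PurePosixPath(prefix) / file_name) if prefix else file_name
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"
```

In practice you might run `from google.cloud import storage` and then call `upload_to_bucket(storage.Client(), "my-team-bucket", "summary.csv", prefix="reports")`.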

Package Management

R packages are managed and maintained with RStudio Package Manager, giving the user complete control over which versions are installed.

RStudio Package Manager also allows the user to ‘snapshot’ a particular set of packages on a specific day to ensure consistency.
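A snapshot is useful because everyone installs exactly the same versions. As a rough illustration of checking an environment against one, the Python sketch below compares installed package versions with a pinned set using the standard library's `importlib.metadata`. The `snapshot` dictionary stands in for whatever record of versions your team keeps, and `snapshot_drift` is a hypothetical helper; this is not a feature of RStudio Package Manager.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def snapshot_drift(
    snapshot: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs from the snapshot to (pinned, actual)."""
    drift = {}
    for name, pinned in snapshot.items():
        actual = get_version(name)
        if actual != pinned:
            drift[name] = (pinned, actual)
    return drift
```

Passing `get_version` explicitly makes the check easy to test, or to run against another environment's list of installed packages.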

The solution to disciplined data governance

Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality should be measured. Good data quality depends on disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. Together, these lead to better decisions based on data analysis and help ensure compliance with regulation.
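Some of these criteria can be measured directly. As a minimal sketch, the Python function below scores a batch of records for completeness (required fields present and non-empty) and timeliness (updated within an allowed age). The field names, the thresholds and the choice to score an empty batch as 0.0 are assumptions for illustration; accuracy and relevance usually need domain-specific checks.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def quality_report(
    records: List[dict],
    required: Iterable[str],
    timestamp_field: str,
    max_age: timedelta,
    now: datetime,
) -> Dict[str, float]:
    """Score a batch of records for completeness and timeliness (each 0.0 to 1.0)."""
    required = list(required)
    n = len(records)
    if n == 0:
        # Assumption: an empty batch is treated as a failure rather than a pass.
        return {"completeness": 0.0, "timeliness": 0.0}
    # Complete: every required field is present and neither None nor an empty string.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in records
    )
    # Timely: the record has a timestamp no older than max_age.
    timely = sum(
        r.get(timestamp_field) is not None and now - r[timestamp_field] <= max_age
        for r in records
    )
    return {"completeness": complete / n, "timeliness": timely / n}
```

Running a check like this on every incoming batch, and failing the pipeline when a score drops below an agreed threshold, is one way to turn the criteria into regression tests.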

As a Product Manager at Mango, Matt is passionate about data and about delivering products where data is key to driving insights and decisions. With over 20 years' experience in data consulting and product delivery, Matt has worked across a variety of industries, including Retail, Financial Services and Gaming, helping companies use data and analytical platforms to drive growth and increase value.

Matt strongly believes that combining data and analytics is the key to successful data solutions.

RStudio Managed Service

Author: Rich Adams, RStudio Partner Manager

Free webinar: How to successfully manage your R environment – the RStudio managed service platform (22nd July @ 4PM BST)

In a free session on Thursday 22nd July, we’ll be discussing how data science teams can confidently and securely collaborate with large data sets in R, supported by the right expertise where capacity or skills may otherwise be lacking internally.

With guest speakers Lou Bajuk, Director of Product Marketing, RStudio and Will Yuill, Principal Public Health Analyst, Hertfordshire County Council, we’ll explore how data science teams can develop a best practice managed service production environment and achieve maximum return on investment from their data science cloud platform. Register here

What’s the webinar about?

As a language, R can be restrictive to roll out: installing, configuring and supporting a centralised platform that achieves maximum adoption requires implementation effort and technical know-how.

Many teams lack the required support from IT, or the knowledge needed to build an environment that can scale in future. This can limit a team's ability to manage large data sets and collaborate easily, and often leads to duplicated effort.

This webinar focuses on how to develop a best practice production environment, ensuring technical excellence and maximum return on investment from your data science platform.

We will also discuss:

  • How to effectively reduce barriers to scaling your R environment through an RStudio Managed Service
  • How Hertfordshire County Council overcame these barriers, under the extra pressure of Covid-19, with a managed services platform

Why is it important?

As this year has shown, scaling data science teams and investing in data-driven strategies are more crucial than ever.

If, like Hertfordshire County Council, your team has grown rapidly but you lack the internal expertise and resources to support an RStudio environment, a managed services platform may provide a secure, compliant and effective cloud environment that can be up and running almost immediately. This expert Managed Service removes the need for specialist in-house IT expertise and includes a guaranteed service level agreement tailored to your requirements for configuration, maintenance and system updates.

Can you join us on 22nd July at 4pm to learn more?

The Public Health Evidence & Intelligence Team at Hertfordshire County Council will discuss why a managed service is already proving an effective solution for them.

Register for the webinar here