Covid infection rate

Two years ago, the Public Health Evidence & Intelligence team at Hertfordshire County Council numbered five. Fast forward to today, and the team have built their capability to 32 competent R users.

For manager Will Yuill, it’s been an extremely busy few years as the urgency of the COVID-19 crisis took hold. The team’s workload doubled overnight as they analysed daily infection rates and shared that analysis extensively with a range of partner organisations. Within a week of infections reaching the UK, they were being asked to reprioritise workloads, model the pandemic and make local recommendations on how to limit the spread of infection.

With a team largely using Excel or a desktop version of R, Will knew changes had to be made to keep up with the volume and speed of data and to maintain efficiency across the team. Faced with these challenges, the team knew they had to consider an outsourcing partnership to meet their immediate and long-term objectives.

In this blog, Will Yuill, Manager of the Public Health Evidence & Intelligence team at Hertfordshire County Council, describes his challenges and how his team dramatically expanded their remit and developed their internal capability, while strengthening stakeholder collaboration across vital health partnerships to actively reduce the spread of the virus.

The team’s blockers:

  • Software – R was not supported internally 
  • Hardware – only able to use R on low-power VMs
  • Skills – predominantly an academic team; R skills were not being applied
  • Culture – limited opportunity to try new things with IT 
  • Capacity – limited time to try new things   

 How I built my case for R  

“Some of the team members were more familiar with R, having exhausted the capability of Excel, so in our search for an immediate solution, an R environment seemed the logical choice. Our IT teams were Windows-focused and didn’t have the capacity or skills required to support an internal Linux environment. Their priority was supporting the council’s migration to home working.

Before committing to an outsourced partnership, we had tried high-specification laptops to resolve the immediate challenges of managing data sets, but were hindered by corporate IT policy and firewalls. In any case, the laptops simply could not meet our needs for sharing analysis or for complying with data governance. With Mango’s Managed RStudio, core stakeholders could have a reliable, secure enterprise environment for collaborating on data within days. There was no need for in-house infrastructure, and it meant we could have 24x7x365 outsourced application monitoring, performance alerts and support, removing any internal dependency on an already stretched resource.

With the Managed RStudio, the team have successfully developed their own Shiny applications, both public and private. The public application currently receives around 20,000 hits a month for detailed analysis of public health services. Internal applications that handle more sensitive data allow effective dissemination of data trends by location and by service, such as environmental health and the NHS.”

Through the use of RStudio Teams, the team is benefiting from a go-to tool that empowers their statisticians to manage and develop their code. The ability to provide these tools on a centralised server, accessible from anywhere and without the computational constraints of a laptop, has been highly conducive to team productivity, success and stakeholder engagement. The data is significantly more secure, data governance has improved, and there is far less work for the team, allowing them to focus on providing analytic value.

The lessons learnt  

“Over the last two years, working in an R environment, we have transformed our procedures, implemented best practice and significantly enhanced our stakeholder communications. Here are some lessons I learnt along the way.

Take your chance and run with it – if your team is working ineffectively, lacking processes and not delivering value, then I strongly recommend investing in a modern enterprise data analytics platform. This means striving to do more with fewer resources, which involves pushing productivity to the max to gain the best value.

Show ROI early – Our team were able to show the impact of our investment. Our data is shared effectively with partner organisations daily, and it is relevant, complete, timely and consistent. Gone are the days when organisations operated in silos; Managed RStudio has been vital for critical communications with key stakeholders.

Know what you are looking for – Sometimes an independent viewpoint prevents you from wasting significant amounts of time, and thinking about your data in terms of your objectives is a good place to start.

Start small and scale – With RStudio Teams, my team is benefiting from a go-to tool which is empowering statisticians to manage and develop their code. The ability to provide these tools on a centralised server, accessible from anywhere and without computational constraints of a laptop, has been highly conducive to team productivity, success, and stakeholder engagement.    

Deployment is hard – RStudio Connect and Pins have been invaluable for production and deployment. When we got R locally, we thought we were set, and then realised we could only share analysis via R Markdown and email. RStudio Connect has allowed us to share and publish interactive analysis across partner organisations.

Use Git – Git enables extensive team collaboration and helps manage version control. Using Git provides the security, collaboration and certainty required to create and reproduce code and analysis across the team”.

Will Yuill will be joining the NHS R Community as a guest speaker on xxx, where he will expand on his case for R and the impact it has had on the local authority. He will be joined by Matt Sawkins, Product Manager at Mango.

managed service

In a recent webinar, we provided an overview of our Managed RStudio platform and demonstrated how modern technology platforms like RStudio give you the ability to collect, store and analyse data at the right time and in the right format to make informed business decisions.

The Public Health Evidence & Intelligence team at Hertfordshire County Council demonstrated how they have benefited significantly from the Managed RStudio, which enabled collaborative development, empowerment and productivity at a time when they needed it most. In turn, they have been able to scale their department.

Many of the questions from the webinar focused on the governance and security aspects of Managed RStudio. In this blog, we answer those questions and, for further clarity, have attached a document covering architecture, data management and maintenance.

Many of the questions concerned managing data on the platform, from working with data on local drives and through user interfaces to managing large datasets.

There are several methods of getting data in and out of the Managed RStudio. These methods will largely depend on the type and size of the data involved.

For data science teams to work productively and deliver effective results for the business, the starting point is the data itself. Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality needs to be measured. Good data quality needs disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This is over and above data quality control programmes for data delivered from outside and within the organisation.

Can you please elaborate on getting data into and out of the Managed RStudio platform?

Working with small data sets (< 100 MB)

For smaller data sets, we recommend using RStudio Workbench’s upload feature directly from the IDE. To do this, simply click ‘Upload’ in the ‘Files’ pane. From here you can select any type of file from either your local hard disk or a mapped network drive. The file will be uploaded to the current directory. You can also upload compressed (zip) files, which are automatically decompressed on completion. This means you can upload data whose uncompressed size exceeds the 100 MB limit.
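If a file is just over the upload limit, compressing it first is often enough, because text formats such as CSV usually compress well. The Python sketch below builds such an archive using only the standard-library `zipfile` module. The file names are hypothetical, and the compression ratio you get will depend on your data.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_for_upload(source: Path, archive: Path) -> int:
    """Compress one data file into a .zip archive and return the archive size in bytes."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores just the file name, not the local folder structure
        zf.write(source, arcname=source.name)
    return archive.stat().st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name and contents, for illustration only
        csv_path = Path(tmp) / "daily_rates.csv"
        csv_path.write_text("date,area,rate\n" + "2021-01-01,area_a,1.0\n" * 10_000)
        zip_path = Path(tmp) / "daily_rates.zip"
        size = zip_for_upload(csv_path, zip_path)
        print(f"{csv_path.stat().st_size} bytes -> {size} bytes")
```

The resulting .zip is what you would select in the upload dialog.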

Working with large data sets (> 100 MB)

For larger data sets or real-time data, we recommend using an external service such as Cloud SQL or BigQuery (GCP), Azure SQL Database or Amazon RDS. These can be accessed directly from R using packages such as bigrquery, RMariaDB or RMySQL.
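The benefit of a database over file uploads is that the heavy lifting happens where the data lives, and only the result comes back to your session. The sketch below illustrates that pattern in Python. The standard-library `sqlite3` module stands in for a managed service such as Cloud SQL or Amazon RDS. The table, columns and rows are hypothetical. From R, the equivalent query would go through DBI with one of the packages above.

```python
from __future__ import annotations

import sqlite3


def weekly_cases_by_area(conn: sqlite3.Connection, week: str) -> list[tuple[str, int]]:
    """Aggregate inside the database and fetch only the summary rows."""
    query = """
        SELECT area, SUM(cases) AS total
        FROM daily_cases
        WHERE week = ?
        GROUP BY area
        ORDER BY area
    """
    # A parameterised query avoids building SQL strings by hand
    return conn.execute(query, (week,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a managed database, with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_cases (week TEXT, area TEXT, cases INTEGER)")
    conn.executemany(
        "INSERT INTO daily_cases VALUES (?, ?, ?)",
        [
            ("2021-W01", "area_a", 120),
            ("2021-W01", "area_a", 80),
            ("2021-W01", "area_b", 50),
            ("2021-W02", "area_a", 10),
        ],
    )
    return conn


if __name__ == "__main__":
    print(weekly_cases_by_area(make_demo_db(), "2021-W01"))
```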

For consuming real-time data, we recommend using either Cloud Pub/Sub or Azure Service Bus to create a message queue from which R or Python can read messages.
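Whichever broker you use, the consuming side follows the same shape. A long-running worker takes messages off the queue, processes each one and acknowledges it. The Python sketch below shows that shape, with the standard-library `queue` module standing in for Cloud Pub/Sub or Azure Service Bus. The real services have their own client libraries with acknowledgement, retry and subscription semantics that this sketch does not model. The message format is hypothetical.

```python
from __future__ import annotations

import json
import queue
import threading

STOP = None  # sentinel telling the consumer to shut down


def consume(messages: queue.Queue, totals: dict[str, int]) -> None:
    """Read JSON messages until the sentinel arrives, keeping running totals per area."""
    while True:
        raw = messages.get()
        try:
            if raw is STOP:
                return
            event = json.loads(raw)
            totals[event["area"]] = totals.get(event["area"], 0) + event["cases"]
        finally:
            # Mark the message as handled (loosely analogous to an ack)
            messages.task_done()


def run_demo() -> dict[str, int]:
    messages: queue.Queue = queue.Queue()
    totals: dict[str, int] = {}
    worker = threading.Thread(target=consume, args=(messages, totals))
    worker.start()
    # Producer side: publish a few hypothetical events
    for area, cases in [("area_a", 3), ("area_b", 5), ("area_a", 2)]:
        messages.put(json.dumps({"area": area, "cases": cases}))
    messages.put(STOP)
    worker.join()
    return totals


if __name__ == "__main__":
    print(run_demo())
```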

Sharing data between RStudio Pro/Workbench, Connect and other users

Data can easily be shared via ‘Pins’, which allows data to be published to RStudio Connect and shared with other users, across Shiny apps and RStudio sessions.

Getting data out of Managed RStudio

As with uploading, there are several methods for exporting data from Managed RStudio. RStudio Connect allows the publishing of Shiny apps, Flask and Dash apps, and R Markdown documents. It also allows the scheduling of email reports. For one-off analytics jobs, RStudio also allows you to download files directly from the IDE.

The Managed Service also allows uploading to any cloud service such as Cloud storage buckets.

Package Management

R packages are managed and maintained by RStudio Package Manager, giving the user complete control over which versions are installed.

RStudio Package Manager also allows the user to ‘snapshot’ a particular set of packages on a specific day to ensure consistency.
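Package Manager handles this on the server. Conceptually, a date-based snapshot freezes the repository as it stood on that day, so every install resolves to the newest version released on or before it. The Python sketch below illustrates only that resolution rule, using a made-up release history. It does not query Package Manager or reflect its internals.

```python
from __future__ import annotations

from datetime import date

# Hypothetical release history: package -> [(version, release date), ...]
RELEASES: dict[str, list[tuple[str, date]]] = {
    "pkg_a": [("1.0.0", date(2020, 3, 1)), ("1.1.0", date(2020, 9, 1)), ("1.2.0", date(2021, 2, 1))],
    "pkg_b": [("0.9.0", date(2020, 11, 15))],
    "pkg_c": [("2.0.0", date(2021, 3, 10))],
}


def resolve_snapshot(releases: dict[str, list[tuple[str, date]]], snapshot_day: date) -> dict[str, str]:
    """For each package, pick the newest version released on or before snapshot_day.

    Packages with no release by that day are left out, as they would be
    absent from a repository frozen on that date.
    """
    resolved: dict[str, str] = {}
    for name, history in releases.items():
        available = [(version, released) for version, released in history if released <= snapshot_day]
        if available:
            # The most recent release date wins
            resolved[name] = max(available, key=lambda item: item[1])[0]
    return resolved


if __name__ == "__main__":
    print(resolve_snapshot(RELEASES, date(2020, 12, 31)))
```

Anyone who resolves against the same day gets the same versions, whenever they install. That is where the consistency comes from.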

The solution to disciplined data governance

Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality needs to be measured. Good data quality needs disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This all leads to better decisions based on data analysis, and also ensures compliance with regulation.

As a Product Manager at Mango, Matt is passionate about data and delivering products where data is key to driving insights and decisions. With over 20 years’ experience in data consulting and product delivery, Matt has worked across a variety of industries including Retail, Financial Services and Gaming to help companies use data and analytical platforms to drive growth and increase value.

Matt is a strong believer that the combined value of the data and analytics is the key to success of data solutions.

RStudio Managed Service

Author: Rich Adams, RStudio Partner Manager

Free webinar: How to successfully manage your R environment – the RStudio managed service platform (22nd July @ 4PM BST)

In a free session on Thursday 22nd July, we’ll be discussing how data science teams can confidently and securely collaborate with large data sets in R, supported by the right expertise where capacity or skills may otherwise be lacking internally.

With guest speakers Lou Bajuk, Director of Product Marketing, RStudio and Will Yuill, Principal Public Health Analyst, Hertfordshire County Council, we’ll explore how data science teams can develop a best practice managed service production environment and achieve maximum return on investment from their data science cloud platform. Register here

What’s the webinar about?

As a language, R can present challenges when it comes to the technical know-how needed to install, configure and support a centralised platform for maximum adoption.

Many teams lack the required support from IT, or the knowledge needed to make an environment suitable for future scalability. This can affect a team’s ability to manage large data sets and collaborate with ease, and often leads to duplicated effort.

This webinar focuses on how to develop a best practice production environment, ensuring technical excellence and maximum return on investment from your data science platform.

Also under discussion is:

  • How to effectively reduce barriers to scaling your R environment through an ‘RStudio Managed Service’
  • How Hertfordshire County Council overcame their barriers, under the extra pressure of Covid-19, with a managed services platform

Why is it important?

As we have seen this year, scaling data science teams and investing in data-driven strategies is more crucial than ever.

If, like Hertfordshire County Council, your team has grown rapidly but lacks the internal expertise and resources to support an RStudio environment, a managed services platform may be the secure, compliant and effective cloud environment you need, and it can be up and running almost immediately. This expert Managed Service removes the need for specialist in-house IT expertise and guarantees a service level agreement that meets your requirements for configuration, maintenance and system updates.

Can you join us on 22nd July at 4pm to learn more?

The Public Health Evidence & Intelligence Team at Hertfordshire County Council will discuss why this is already proving an effective solution for them.

Register for the webinar here 

rstudioconf

We’re delighted to be sponsoring the rstudio::global conference this year. It’s a credit to the community that such events (including our own flagship EARL conference) have been so readily able to respond to the pandemic lockdown and transform to a fully virtual presence, providing inspirational talks on all things R. 

We are excited to see John Burn-Murdoch, senior data visualisation journalist at the Financial Times, among the keynote speakers. John led the FT’s data-driven coverage of the pandemic, amassing an enormous following on social media for his incisive reporting. It will be fascinating to hear the lessons he has learned from his reporting and visualisations, and how he addressed the challenges of communicating often complex findings to the population at large. It was our pleasure to have John speak at the EARL conference in both 2014 and 2016, so we know that the rstudio::global audience can expect a riveting presentation.

With a packed itinerary and 24-hour streaming for accessibility all over the world, there will be some extremely useful presentations and stimulating conversations for the 10,000 data professionals expected to attend.

As a sponsor of this event and as RStudio’s longest-serving Full Service Certified Partner, we would like to take this opportunity to invite attendees to meet us at our virtual booth. Whether you are scaling the use of R in your organisation and require technical advice on setup or configuration, lack internal IT to support the required maintenance of RStudio products, or have reservations about the validation of open-source packages from a security or malware perspective, we can help.

Some of the services that we offer include: 

  • A Managed Service providing a scalable RStudio environment that can be built up, run in the cloud and fully maintained by Mango, to minimise the responsibility and burden on your in-house IT teams.
  • An On-Premise solution designed to address current in-house service gaps; following an Installation, Accelerate and Healthcheck review, this service offers full installation, configuration and maintenance of RStudio products.
  • A new validation service, ValidR®, which provides a validated collection of the 150 most popular industry-leading R packages, such as those within the tidyverse. It can be deployed with RStudio Package Manager (RSPM) to mitigate uncertainty about using open-source software, with guaranteed reproducibility for any data science team.

We’re very much looking forward to seeing you at the event on 21st January – don’t forget to sign up for this event now if you haven’t already. 

#Rstudioconf2021 #rstats #RStudiofullservicepartner 

 

      

happy data team

A word familiar to most of us, productivity is measured in terms of the rate of output per unit of input. For the modern enterprise, this means striving to do more with fewer resources, a goal you cannot achieve without effective and efficient decision making. Here at Mango, one of our main aims is to enable companies to make proactive use of their data to drive better decision making, allowing them to create value from insight. This alone is a pathway to improved productivity.

Why then are many companies disappointed with the return on their data science investments? To put it in perspective, around 87% of data science projects never make it into production.

The thing is, you may have the best data scientists who can program, model, visualise and wrangle data, but that is not enough. For a data science project to be successful, there needs to be more than some data science ‘unicorns’ doing their data science ‘stuff’ independently of each other and the business to support a project or use case here and there. Data science is a team sport that thrives in a company with data at its core, where there are understood methods for collaborating across both technical and business departments enabling a data science product to be maintained and utilised across its lifespan.

Here are five key areas to consider which will ensure success of your data science outputs:

1. The data itself

Data drives decision-making. If the quality of your data is poor, the outputs and resulting decisions will be poor. For data science teams to work productively and deliver effective results for the business, the starting point is the data itself. Accuracy, relevance, completeness, timeliness and consistency are the key criteria against which data quality needs to be measured.

Good data quality needs disciplined data governance, thorough management of incoming data, accurate requirements gathering, strict regression testing for change management and careful design of data pipelines. This is over and above data quality control programmes for data delivered from outside and within the organisation.
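Some of these criteria can be checked mechanically as data arrives. The Python sketch below flags rows that fail simple completeness, timeliness and consistency rules. The field names, the two-day freshness window and the "no negative counts, no duplicate keys" rules are hypothetical examples. Accuracy and relevance generally need domain knowledge rather than a generic check.

```python
from __future__ import annotations

from datetime import date, timedelta

REQUIRED_FIELDS = ("area", "date", "cases")  # hypothetical schema


def quality_report(rows: list[dict], today: date, max_age_days: int = 2) -> dict[str, list[int]]:
    """Return the indices of rows that fail each check."""
    report: dict[str, list[int]] = {"incomplete": [], "stale": [], "inconsistent": []}
    seen: set[tuple] = set()
    for i, row in enumerate(rows):
        # Completeness: every required field present and not None
        if any(row.get(field) is None for field in REQUIRED_FIELDS):
            report["incomplete"].append(i)
            continue
        # Timeliness: the record falls inside the agreed freshness window
        if today - row["date"] > timedelta(days=max_age_days):
            report["stale"].append(i)
        # Consistency: counts are non-negative and (area, date) keys are unique
        key = (row["area"], row["date"])
        if row["cases"] < 0 or key in seen:
            report["inconsistent"].append(i)
        seen.add(key)
    return report


if __name__ == "__main__":
    sample = [
        {"area": "area_a", "date": date(2021, 1, 10), "cases": 5},
        {"area": "area_b", "date": date(2021, 1, 10), "cases": None},
    ]
    print(quality_report(sample, today=date(2021, 1, 10)))
```

In practice, checks like these would run as a gate in the data pipeline, so failing rows are reported before they reach an analysis.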

2. Collaboration tools

Having the best-quality data in the world will be useless if you do not have the tools to allow people to work together on development projects. Tools for version control and collaborative development are key to extracting value from your data. Git, RStudio and Jupyter are becoming go-to tools that enable your data scientists to manage and develop their code. The ability to provide these tools on a centralised server, accessible from anywhere and without the computational constraints of a laptop, means that you have the best chance of being successful.

In addition to these collaboration tools, you also need to cooperate on the wider project: shared platforms such as Trello, Planner or JIRA offer a great way to share to-do lists and help everyone understand how projects are progressing.

3. Communication tools

Gone are the days when organisations could afford to operate in silos. Maximising productivity requires bringing teams together to collaborate across the business as a community that shares best practice. The adoption of effective communication tools, particularly during this period of remote working, is the only way to enable this community to thrive.

Mango relies heavily on instant messaging tools such as Microsoft Teams, which offers a great way for our team to communicate and share their own tips and tricks. We also conduct a weekly analytics club for showcasing ideas and progress of projects.

4. Stakeholder engagement

Once there is quality data, and communication and collaboration tools to support teams, it’s vital to secure buy-in and understanding from key stakeholders across the business. Data science is often accompanied by its own language, so fostering collaboration and a mutual understanding of what’s possible with data for stakeholders is vital. In the same way, by sharing with data science teams the direction in which the business wants to or needs to move, stakeholders are empowering teams with the necessary information to make sure analytical outputs support these goals.

5. Best practices that lead to long-lived business results

In order to make sure that project outputs are of an appropriate quality, and that level of quality is achievable again, processes and ways of working must follow best practice. You can aid your teams to follow best practice by developing a framework for them to work within. Standardising these approaches – take a look at Mango’s 4-step grid in the image below – ensures that everyone in your team knows their role and can generate a quality output time and time again.

The productivity of a data science team itself, and the business as a whole, relies on more than just tools, or training, or the right resources. Boosting productivity and achieving the most value relies on being a team. Data science teams will thrive in a company that has a data-driven culture, with a central platform where they can work together to efficiently produce repeatable results in harmony with the business objectives.

What’s holding you back?

If you are keen to adopt open-source data science software at scale and you need a production-ready environment that’s configured to your business but require help on where to start, Mango can help.

We can advise, install, support and train your teams on your production-ready RStudio environment so you can share, develop, publish and manage data at scale – in a controlled, reproducible way. Contact us now and we’ll get you started.

 

Related content:

Podcast: Data Engineering – the key to extracting value from your data

Blog: Future Proofing Your Data Science Team 

 

Centralized collaboration to future-proof your data science team

Like the wider IT and tech sector, the data science industry has one of the largest shares of remote employees, with teams based all over the world. Yet, as in other teams, projects are delivered effectively, collaboratively and on time.

With the rapidly evolving COVID-19 crisis, companies have been forced to adopt working-from-home policies, and technology and digital infrastructure have never been more important. The remote data science teams formed as a result need to maintain productivity and continue to drive effective stakeholder communication and business value, and the only way to achieve this is through appropriate infrastructure and well-defined ways of working.

Whether your workforce works remotely or otherwise, centralising platforms and enabling a cloud-based infrastructure for data science will lead to more opportunities for collaboration. It may even reduce IT spend on equipment and maintenance overhead, future-proofing your data science infrastructure for the long run.

So when it comes to implementing a long-lived platform, here are some things to keep in mind:

Collaboration through a centralised data & analytics platform

A centralized platform, such as RStudio Server Pro, means all your data scientists will have access to an appropriate platform and be working within the same environment. Working in this way means that a package written by one developer can work with a minimum of effort in all your developers’ environments, allowing simpler collaboration. There are other ways of achieving this with technologies such as virtualenv for Python, but these require each project to set up its own environment, increasing overhead. Centralizing this effort ensures that there is a well-understood way of creating projects, and that every developer is working in the same way.
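For comparison, here is roughly what the per-project approach looks like when each project manages its own Python environment. It uses the standard-library `venv` module, which offers similar isolation to virtualenv. Every project, and every developer, repeats this step, and that repetition is the overhead a centralized platform avoids. The `.venv` location is a common convention rather than a requirement. `with_pip=False` just keeps the sketch fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment at <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps the sketch fast and offline; a real project would
    # usually enable pip and then install its pinned requirements
    venv.create(env_dir, with_pip=False, clear=True)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Path to the environment's own interpreter (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_project_env(Path(tmp) / "my_project")  # hypothetical project
        print("interpreter:", env_python(env))
```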

Best practices when using a centralized platform

  1. Version control. If you are writing code of any kind, even just scripts, it should be versioned religiously and have clear commit messages. This ensures that users can see each change made in scripts if anything breaks and can reproduce your results on their own.
  2. Packages. Whether you are working in Python or R, code should be packaged and treated like the valuable commodity it is. At Mango Solutions, a frequent challenge we address with our clients is to debug legacy code where a single ‘expert’ in a particular technology has written some piece of process which has become mission critical and then left the business. There is then no way to support, develop, or otherwise change this process without the whole business grinding to a halt. Packaging code and workflows helps to document and enforce dependencies, which can make legacy code easier to manage. These packages can then be maintained by RStudio Package Manager or Artifactory.
  3. Reusability. By putting your code in packages and managing your environments with renv, you’re able to make your data science reusable. Creating this institutional knowledge means that you can avoid a data scientist becoming a single point of failure, and, with the job market still incredibly buoyant in the data sector, when a data scientist does leave, you won’t be left with a model that nobody understands or can run. As Lou Bajuk explained in his blog post, Driving Real, Lasting Value with Serious Data Science, durable code is a significant criterion for future-proofing your data science organization.
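In R, renv records the exact package versions a project uses in a lockfile (`renv.lock`) via `renv::snapshot()` and reinstalls them with `renv::restore()`. The Python sketch below is an analogue of that idea, not renv itself. It records installed distribution versions as JSON and reports any drift against a saved snapshot. The JSON layout is hypothetical and much simpler than `renv.lock`.

```python
from __future__ import annotations

import json
from importlib import metadata


def snapshot() -> dict[str, str]:
    """Record the name and version of every installed distribution."""
    versions: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs with broken metadata
            versions[name.lower()] = dist.version
    return dict(sorted(versions.items()))


def to_lockfile_text(versions: dict[str, str]) -> str:
    """Serialise a snapshot as JSON so it can be committed alongside the code."""
    return json.dumps(versions, indent=2, sort_keys=True)


def drift(locked: dict[str, str], current: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Packages whose version differs between the lockfile and the live environment."""
    return {
        name: (locked.get(name), current.get(name))
        for name in sorted(set(locked) | set(current))
        if locked.get(name) != current.get(name)
    }


if __name__ == "__main__":
    locked = json.loads(to_lockfile_text(snapshot()))
    print(drift(locked, snapshot()) or "environment matches the lockfile")
```

Committing the lockfile alongside the code is what lets a colleague reproduce the environment after the original author has moved on.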

Enabling a Cloud-based environment

In addition to this institutional knowledge benefit, running this data science platform on a cloud instance allows us to scale up the platform easily. With the ability to deploy to Kubernetes, you can scale your deployment as your data science team grows, which is a huge benefit, while paying only for what you need, when you need it.

This move to the cloud also brings some tangential benefits that are often overlooked:

  1. The cost of hardware for your data science staff can be reduced to low-cost laptops rather than costly high-end on-premise hardware.
  2. By providing a centralized development platform, you allow remote and mobile work, which is a key differentiator when hiring the best talent.
  3. By enhancing flexibility, you are better positioned to remain productive in unforeseen circumstances.

This last point cannot be overstated. At the beginning of the Covid-19 lockdown, a nationwide company whose data team was tied to desktops found itself struggling to provide enough equipment to continue working through the lockdown. As a result, its data science team could not function and was unable to provide insights that would have been invaluable through these changing times. By contrast, here at Mango, our data science platform strategy allowed us to switch seamlessly to remote working, add value to our partners and deliver insights when they were needed most.

Building agility into your basic ways of working means that you are well placed to adapt to unexpected events and adopt new platforms which are easier to update as technology moves on.

Once you have a centralized analytics platform and cloud-based infrastructure in place, how are you going to convince the business to use it? This is where the worlds of Business Intelligence and software dev-ops come to the rescue.

Analytics-backed dashboards built with technologies like Shiny and RStudio Connect, or Plotly and Dash for Python, mean you can quickly and easily create front ends for business users to access results from your models. You can also easily expose APIs that allow your websites to be backed by scalable models, potentially creating new ways for customers to engage with your business.
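To make the API idea concrete, the sketch below is a minimal JSON prediction endpoint. It is written against Python's standard WSGI interface, the same interface Flask applications implement. The linear "model" and its coefficients are made up, and the `/predict` route and port are hypothetical. In R, the equivalent would typically be a plumber API.

```python
from __future__ import annotations

import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical "model": made-up coefficients standing in for a trained model
INTERCEPT, SLOPE = 1.5, 0.2


def predict(x: float) -> float:
    return INTERCEPT + SLOPE * x


def _json(start_response, status: str, payload: dict) -> list[bytes]:
    """Encode a payload as a JSON response with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app: GET /predict?x=<number> returns {"x": ..., "prediction": ...}."""
    if environ.get("PATH_INFO") != "/predict":
        return _json(start_response, "404 Not Found", {"error": "not found"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        x = float(params["x"][0])
    except (KeyError, ValueError):
        return _json(start_response, "400 Bad Request", {"error": "x must be a number"})
    return _json(start_response, "200 OK", {"x": x, "prediction": predict(x)})


if __name__ == "__main__":
    # Local test server on a hypothetical port; production would use a
    # proper WSGI server or hosting platform rather than wsgiref
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

A website would then call `http://127.0.0.1:8000/predict?x=10` and receive `{"x": 10.0, "prediction": 3.5}`.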

A word of caution here: doing this without considering how you are going to maintain and update what have now become software products can be dangerous. Models may go out of date, functionality can become irrelevant, and the business can become disillusioned. Fortunately, these are solved problems in the web world, and solutions such as containers and Kubernetes, alongside CI/CD tools, make this a simpler challenge. As a consultancy, we have tried and tested solutions that expose APIs from R or Python to back high-throughput websites across a number of sectors for our customers.

Collaborative forms of communications

The last piece of the puzzle for your data science team to be productive has nothing to do with data science but is instead about empathy and communication. Your data science team may create insights from your data, but they are like a rudderless ship without input from the business. Understanding business problems and what has value to the wider enterprise requires good communication. This means that your data scientists have to partner with people who understand the sales and marketing strategy. And if you are to embrace the ethos of flexibility as protection against the future, then good video-conferencing and other technological communications are essential.

Written by Dean Wood, Principal Data Science Consultant at Mango Solutions, also published as guest blog on our partner RStudio’s website.

 

 


We were thrilled to host Hadley Wickham, who delivered, as ever, a funny and engaging talk to a packed house at LondonR in August. In fact, to give you an idea of how eagerly anticipated this event was, tickets to see Hadley sold out in under two hours!

It’s always fascinating for us elder members of the R community who remember the good old days, to witness the move from academic tools through to commercial adoption and engagement. For many years, R was proposed and rejected by many organisations due to the environment and architecture that existed. We used to spend time trying to work out data sizes and whether things would help.

I remember talking to Hadley at the first EARL in the US about creating toolsets that allowed organisations who didn’t “love” R to use it and deploy it internally, comfortably. Hadley’s work, and latterly his team’s, has allowed the ecosystem around R to develop from introspection to a wide view of the analytic landscape, and I felt his talk reflected some of these shifts.

Hadley’s insight into the mistakes he has made rang very true, given the scale of the user base today compared with when he started developing packages. That moment of clarity, when you realise that you need to prepare things so that people you don’t know can pick them up and use them efficiently, lies at the heart of good programming practice but is sometimes easily forgotten. This has driven Hadley to create better and easier codebases that serve as central platforms while also sparking others’ thoughts and developments.

It was great to hear someone like Hadley acknowledge that innovation isn’t a straight line and that forking and dead ends are essential parts of the process. Speaking to attendees afterwards, this message was highly prized, and there seemed to be increased confidence among many attendees to go out and try things without the fear of failure.

All in all a fantastic evening that reinforced just how great the R community is.

If you’d like to view Hadley’s LondonR presentation, you can download it here.


RStudio have recently announced ‘RStudio Connect QuickStart’, a VM containing a full suite of RStudio’s pro tools, available to trial for 45 days. RStudio Connect QuickStart gives R users, and people exploring the idea of using R in production, a quick and easy way to set up a full, production-like environment that contains all of RStudio’s enterprise-grade products.

In essence, RStudio Connect QuickStart is a virtual machine appliance that, when used with your favourite virtualisation software (VirtualBox, VMware, etc.), will create a ready-to-use environment containing the following tools:

  • RStudio Server Pro
  • RStudio Connect
  • RStudio Package Manager
  • Webmail (for checking emails from RStudio Connect)

As a result, tools like RStudio Connect, professional software designed to run on Linux servers behind a company firewall, can now be set up locally on a user’s machine within minutes. Users can get a feel for the products and experiment with their functionality, including hosting Shiny applications, scheduling and distributing R Markdown documents, and exposing R functions as APIs.

Using RStudio Connect QuickStart

Using RStudio Connect QuickStart is relatively straightforward. Once you have downloaded the QuickStart virtual appliance from here, import it into VirtualBox or a similar virtualisation product. No further configuration is needed, and the virtual machine should start straight up. (We recommend running the virtual machine in headless mode, which starts the environment without displaying an ugly terminal screen on your desktop.)
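If you prefer the command line to the VirtualBox GUI, the import and headless start can be scripted. The following is a minimal sketch. It assumes VirtualBox’s `VBoxManage` tool is on your PATH. The appliance file name `RStudio-QuickStart.ova` and the VM name `RStudio QuickStart` are hypothetical placeholders, so check the real names after downloading and importing. The script uses `VBoxManage import` and `VBoxManage startvm ... --type headless`, which corresponds to the headless mode recommended above.

```python
import subprocess
from typing import List

# Hypothetical names: substitute the file you downloaded and the VM name
# that VirtualBox shows after the import.
OVA_PATH = "RStudio-QuickStart.ova"
VM_NAME = "RStudio QuickStart"


def quickstart_commands(ova_path: str, vm_name: str) -> List[List[str]]:
    """Build the VBoxManage commands that import the appliance and boot it headless."""
    return [
        ["VBoxManage", "import", ova_path],
        # --type headless starts the VM without opening a console window.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_commands(quickstart_commands(OVA_PATH, VM_NAME))` performs both steps. The import only needs to happen once, so on later occasions you would run just the `startvm` command.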

Once the virtual machine has fully booted, navigating to http://localhost:5000 in your desktop’s web browser presents a welcome page with links to RStudio Connect, RStudio Server Pro, Package Manager and the webmail client.
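A headless VM gives no visual cue when it has finished booting. A small polling loop can report when the welcome page responds instead. This is a hedged sketch that uses only Python’s standard library. The retry count and delay are arbitrary assumptions. The `fetch` and `sleep` functions are injectable, so the loop can be exercised without a running VM.

```python
import time
import urllib.error
import urllib.request

QUICKSTART_URL = "http://localhost:5000"  # welcome page address from the post


def default_fetch(url: str) -> int:
    """Return the HTTP status of a GET request (4xx/5xx raise HTTPError)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_for_welcome_page(url=QUICKSTART_URL, attempts=30, delay=10.0,
                          fetch=default_fetch, sleep=time.sleep) -> bool:
    """Poll url until it answers 200 OK; return False if it never does."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts are all
            # OSError subclasses: the VM is probably still booting.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```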

Creating and Deploying a Shiny App

Creating and deploying a Shiny app in QuickStart is quick and easy. In this case, I have created some basic charts that I would like to deploy to RStudio Connect. To do this, I just press the Publish button and specify the URL of RStudio Connect, which in this case is http://localhost:5000/rsconnect (make sure you specify http and not https!).
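The http-versus-https pitfall is easy to trip over when the URL is typed by hand or pasted from another script. The following is a minimal sketch of a pre-flight check, assuming you keep the Connect URL in a variable somewhere. The helper name `quickstart_connect_url` is hypothetical and is not part of any RStudio tool.

```python
from urllib.parse import urlsplit, urlunsplit


def quickstart_connect_url(url: str) -> str:
    """Return a tidied Connect URL, rejecting anything that is not plain http."""
    parts = urlsplit(url.strip())
    if parts.scheme == "https":
        # The post warns that the QuickStart Connect URL must use http, not https.
        raise ValueError("use http, not https, for QuickStart: " + url)
    if parts.scheme != "http" or not parts.netloc:
        raise ValueError(
            "expected a URL such as http://localhost:5000/rsconnect, got: " + url
        )
    # Drop any trailing slash, query string or fragment.
    return urlunsplit(("http", parts.netloc, parts.path.rstrip("/"), "", ""))
```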

The Shiny app is now deployed to RStudio Connect and is automatically opened for me to view in a separate browser window.

Old Faithful Geyser Data

Here you can see that I have deployed the default ‘Old Faithful Geyser Data’ app to RStudio Connect directly from the RStudio Server IDE. Now that my Shiny app has been deployed, I can control who can and can’t view it, as well as customise performance-related settings in the ‘Runtime’ menu on the right.

Publishing an R Markdown Document

In the same way that we deploy Shiny apps, we can also publish R Markdown documents. The great thing about RStudio Connect is that we can set a schedule on which the knitted R Markdown document is emailed out, as well as access it directly from the Connect UI.

In my case, I would like mine to be knitted and emailed to my colleagues every Friday evening. To do this, I just go into the ‘Schedule’ menu, pick the date and time I would like it to go out, and save the changes. My knitted R Markdown document will now be sent to the recipients I specified every Friday evening.
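The following sketch computes when the next Friday-evening run would fall, as a concrete example of the weekly recurrence. It only illustrates the schedule and does not show how RStudio Connect implements scheduling. The 18:00 run time is an assumed value standing in for “Friday evening”, and time zones are ignored.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday == 0 ... Sunday == 6


def next_weekly_run(now: datetime, weekday: int = FRIDAY,
                    hour: int = 18, minute: int = 0) -> datetime:
    """Return the first run strictly after `now` for a weekly schedule."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If this week's slot has already passed (or is right now), use next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, `next_weekly_run(datetime(2019, 1, 1, 9, 0))`, a Tuesday morning, returns `datetime(2019, 1, 4, 18, 0)`, the Friday of that week.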

Final Notes

The wide range of uses and functions that RStudio Connect QuickStart offers makes it perfect for testing, trialling, or just getting to know RStudio’s superb range of products. I would encourage everyone to give it a go and have a play around with some of its features, in particular RStudio Connect and the newest member of the RStudio product family, RStudio Package Manager. RStudio Package Manager helps you, your team, department or company centralise and organise your R packages, which is ideal for environments where connectivity to resources outside your company network is restricted or blocked.

For more details on using RStudio Connect QuickStart see https://www.rstudio.com/products/quickstart/.


One of the few remaining hurdles when working with R in the enterprise is consistent access to CRAN. Desktop-class systems often have unrestricted access, while server systems might have no access at all.

This inconsistency often stems from security concerns about allowing servers access to the internet. There have been many different approaches to solving this problem, with some organisations reluctantly allowing outbound access to CRAN, some rolling their own internal CRAN-like repositories, and others installing a fixed set of packages and leaving it at that.

Fig 1. Access to public CRAN from multiple sources can be a security and compliance headache

Fortunately, this problem may now be a thing of the past. Yesterday RStudio announced a new software tool called “Package Manager”, which provides a single, on-premises, CRAN-like interface to CRAN, your organisation’s own internal packages, or a combination of the two, all in a unified system.

RStudio Package Manager (RSPM) removes the need for IT teams to whitelist external access to CRAN from all of their R servers. Now, just a single system requires external access to a carefully managed CRAN mirror built specifically for this purpose. Internal systems can now connect to this single, internal package repository instead of to ad-hoc mirrors. Desktop and laptop users can connect to it too, providing a unified package management experience.
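In practice, RSPM enables a simple policy: every R system fetches packages from one internal repository. The following is a hedged sketch of how an IT team might audit that policy. It assumes the team already has an inventory of each server’s configured repository URLs, for example collected from R’s `repos` option. The host name `rspm.internal.example` and the server names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical host name of the internal RStudio Package Manager server
# (lower-case, because urlsplit(...).hostname is lower-cased).
APPROVED_HOST = "rspm.internal.example"


def non_compliant_repos(
    server_repos: Dict[str, List[str]], approved_host: str = APPROVED_HOST
) -> Dict[str, List[str]]:
    """Map each server to the repository URLs that bypass the approved internal host."""
    flagged: Dict[str, List[str]] = {}
    for server, repos in server_repos.items():
        # hostname excludes any port number and user info.
        external = [url for url in repos if urlsplit(url).hostname != approved_host]
        if external:
            flagged[server] = external
    return flagged
```

For example, a server still pointing at a public CRAN mirror alongside the internal repository would be reported with just the public URL. A server configured only with the internal repository would not appear in the result.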

Fig 2. RStudio Package Manager simplifies CRAN access and reduces risk

Further, RSPM can be used to publish internal packages as well, and even supports hosting multiple repositories, which can be useful for different groups within the business.

Mango have been using RSPM and providing feedback on it since the earliest private beta stage, and have already provided support around it to a small number of other beta customers. That, combined with our long R heritage and deep roots in the R and enterprise ecosystems, means we’re well placed to help others on their enterprise R journey.

To schedule a call to discuss the options, contact sales@mango-solutions.com.