
Background

What’s your background?

I’m a software engineer through and through. I majored in Management Science, but I’ve been working professionally as a programmer since my freshman year in college (1996, the start of the dot-com boom). I started out as a web developer and then, once I graduated and realized I wanted to spend my career writing software, focused on becoming a generalist software engineer. As a self-taught programmer, it took until about 2008 for my imposter syndrome to completely disappear.

I’ve spent most of my career at Boston-area startups, and I’ve enjoyed it tremendously. I’ve had the privilege of working with many wonderful people, and released a lot of software that I’m proud of (or at least, that I was proud of at the time!). In 2006, the startup I worked for was acquired by Microsoft, which meant a relocation to the Seattle area. In late 2009 I left Microsoft to join RStudio, but chose to remain in Seattle.

I’ve worked on a variety of software in my career: web, desktop, front end, back end. I take special pleasure in writing parsers and multithreaded code (generally not at the same time though!).

Tell us about your first experience with R

My very first experience with R was my first day at RStudio. JJ had been working on RStudio for a few months already, and the first feature he assigned to me was syntax highlighting of R code for the source editor. So before I ever wrote a line of R code, I was reading The R Language Definition and got intimately familiar with the grammar.

How did you come to work at RStudio?

JJ Allaire (RStudio’s founder) and I go back a ways. I was a web development intern at his first company, Allaire, which jump-started my software development career; and I also worked at his second startup, Onfolio. So I’ve basically been working for JJ on and off since 1997.

At the time that JJ was thinking about R, I was working at Microsoft, itching to get back into a startup. As soon as JJ convinced himself that a web-based IDE for R was technically feasible, he brought me in to help him build it, and I started in September 2009. Unlike JJ, I wasn’t interested in statistics at all. But the technical challenge of building a web-based IDE was alluring, and I was not going to turn down the chance to work with JJ directly.

In those days, it was not at all clear to us whether we could build a sustainable business writing tools for R users. Luckily, JJ didn’t care—he wanted to make the IDE a reality whether we ever saw a dollar or not, and he was willing to invest both his own time and my salary to make that happen. Obviously, the best case scenario was to build a robust business, so we could hire more people to write more good software.

What does your role as CTO at RStudio involve?

Actually, the majority of my job is working on Shiny and leading the Shiny team. Mostly what I do is try to move Shiny forward. I still code, but not nearly as much as I would like. On any given day, I might be writing docs, investigating trickier bugs, explaining parts of the code to other team members, reviewing PRs, prioritizing feature lists, planning releases, checking on people's status, and reporting our team's status to other people. I also speak at conferences a few times a year, and those presentations usually take me a pathologically long time to prepare.

The title of CTO doesn’t define my responsibilities, but instead is more of an acknowledgement that as the longest tenured RStudio employee, and having been intimately involved in the creation of many of our products (RStudio IDE, Shiny, Shiny Server, and Connect), I should have a seat at the table when we make major company decisions. I do take part in a lot of technical discussions and decisions outside of my team, but the same could be said for a lot of other experienced technical folks around RStudio: JJ, Hadley, Jonathan McPherson, Aron Atkins, Tareef Kawaf, and on and on.

Going forward, I’m hoping to find a way to clear my schedule so I can get down to writing a book about Shiny. I’m astonished that people like Hadley, Garrett, and Yihui can write whole books while still doing their jobs; it takes all my concentration to write well and I find it extremely taxing, though ultimately satisfying.

Shiny Origins

What led you to create the Shiny framework?

From pretty early on, JJ and I received feedback from potential users that they wanted the ability to create interactive applets and reports using R. The first person who asked us for this was Danny Kaplan at Macalester College (who was also the first beta tester of RStudio). At the time, he was having grad students learn Java so they could build in-browser applets to help students explore statistical concepts. He implored us to make it possible to build those applets in R instead.

JJ and I both thought the idea was really appealing, but we were 100% focused on RStudio IDE at the time. I told JJ I thought we should do it someday, but only if we could come up with a really great API for doing so. Having spent years specializing in UI programming for both desktop and web, I really did not want to subject R users to that highly specialized black art.

The hard part of web UI programming, to me, was not HTML and JavaScript. Learning those just required time. Rather, it was the explosion of state management spaghetti code that inevitably occurred when creating even moderately complicated UIs, regardless of language. In my experience, it was possible to create complicated UIs without exponentially increasing code complexity, but it required experience, discipline, and a bit of luck. Only the very best teams could pull it off.

In April 2012, the Meteor JavaScript framework was announced on Hacker News. The Meteor screencast evoked for me the old Arthur C. Clarke quote, “Any sufficiently advanced technology is indistinguishable from magic.” Despite having just built a state of the art app in RStudio IDE, I could not conceive of how their framework’s UI layer could be so interactive with so little state management code. I couldn’t stop thinking about that mystery, so a couple of weeks later, I took advantage of a long plane flight to delve into the Meteor source code. I eventually ended up at this tiny JavaScript file, and the light bulb went on. It was an incredibly elegant little hack that enabled a whole new style of UI programming.

It took a few more months before I made the connection that Meteor-style reactivity could be used on the server side to create a high-level app framework for R.

How long did it take to come up with?

It took a couple of years between Danny asking us for a web framework, and the conception of Shiny, during which I spent zero time consciously thinking about it. But the Meteor reactivity implementation must have worked its way into my subconscious. On the last morning of useR! 2012, I woke up and literally the first thought in my mind was the architecture of Shiny: a simple, semantic HTML vocabulary for specifying inputs and outputs; a reactive programming library on the server side for specifying those outputs using pure R code; and some JavaScript and WebSocket plumbing to tie everything together automatically. JJ added the final piece, which was specifying the HTML itself using R.

Were there any unexpected challenges in that first version?

All of the ideas turned out to be really surprisingly easy to implement. Those first few months, JJ and I made progress at an almost absurd pace. During that period, I almost couldn’t type fast enough to get the ideas out of my head and into code. Reactive programming was this fantastically powerful and general technique, but once you knew about the little hack from deps.js, actually implementing it was dead simple.

Work on Shiny officially started on June 20, 2012. The first prototype of Shiny was actually written in Ruby (as I barely knew R at the time), just to prove the architecture. It took a day and a half to go from zero to a working little Shiny.rb app. (You can see the state of the repo on that day here. Looking at server.rb and www/index.html, you can clearly see that the core ideas in Shiny were present back then.)

The biggest challenge in those early days was the lack of a truly robust web server package for R. We needed not only a traditional HTTP server, but also support for WebSockets, which was not even an IETF-approved standard at that time. We started out building Shiny on top of the websockets package by Bryan Lewis, an early friend of the company. I’m not sure what had compelled Bryan to write the package in the first place, but by the time we adopted it, he had moved on and was looking to transfer the maintainership to someone else. I gratefully accepted the responsibility. But soon after we shipped the first versions of Shiny, it became clear to me that we couldn’t keep going with the websockets package, as it was trivially vulnerable to denial-of-service attacks and I couldn’t fix it without starting over from scratch. Shiny was already on CRAN and interest in it was growing quickly, so I felt tremendous pressure to get us onto a stable foundation. The result was a six week, hair-on-fire sprint to create httpuv, which I published to CRAN in March of 2013.

Besides that, the biggest challenges were API design and writing good docs. The former was especially challenging because I had so little R experience at the time, which made it hard to design APIs that would feel idiomatic to R users (to the degree that a reactive web framework could feel idiomatic!). So the addition of Hadley and Winston Chang to the company in late 2012 was a huge help, and led to significant changes in the API. Writing good docs, on the other hand, is just hard. It’s so much easier to build a web framework than to teach people how to use it effectively. We made a big push for that initial release, but it was years before the documentation even began to catch up to fully describe what we had built (and in some areas, still hasn’t).

Did the reception to Shiny surprise you?

It really did. I knew we had created something that was technically interesting, demoed especially well, and served a need that R users would find interesting. What I didn’t know was whether the community could get their heads around reactive programming, or rather, whether they’d be willing to invest the time necessary to get their heads around it. I was shocked to find how eager people were to jump in and invest. Within the first month we were already getting really surprisingly sophisticated questions from people we’d never met.

Have you been surprised by the ways in which it’s been used?

On one hand, yes, constantly. A lot of the features we’ve added over the years were inspired by brave users who managed to shoehorn Shiny into a scenario that we had not designed for. And every time I teach a training workshop, at least one person will ask me to look at some bug in their app that they haven’t been able to figure out, and then demo some mind-blowing thing.

But at another level, I’m not that surprised that people have built surprising things with Shiny, if that makes sense. Shiny provides a pretty general set of capabilities, in that it gives you a way to create user interfaces and a way to make them interactive. So there was always the expectation that if R users were sufficiently motivated and invested, they could build really cool things that we had never thought of–and that’s exactly what happened.

The present

Will async become the default for Shiny?

No, not a chance! I think of async as raising the ceiling on Shiny’s potential scalability, but most apps shouldn’t need to use it. But I hope most users will feel good knowing it’s there in case they ever do need it.

In terms of products, we see a lot of people using RStudio Connect, what’s going on with that right now?

For those who aren’t familiar, RStudio Connect is our answer to on-premises publishing and sharing of the reports and apps you create in R. First, you can use it to deploy Shiny apps to your on-prem server without leaving RStudio—it’s just like publishing to ShinyApps.io. Second, it’s an extremely powerful R Markdown publishing server: you author .Rmd docs in RStudio as usual, but then you can one-click publish your project to Connect. Once on Connect, your report can be re-rendered on a schedule, run with user-specified parameters, automatically emailed to your colleagues, and more.

One of the recent focuses for the Connect team has been expanding the types of projects you can publish, beyond Shiny and R Markdown. The last release added support for deploying Plumber APIs (web service endpoints written in R) and TensorFlow deep learning models.

Another feature that's under development is a programmatic API for the Connect server itself. This will let you programmatically execute tasks that previously needed to be performed through Connect's user interface. This is an important feature for enterprises, who often want to integrate Connect with their existing systems.

There’s plenty more to come, but I’ve been sworn to secrecy!

At what point do you think it becomes useful for an R user to know some JavaScript when working with Shiny?

A lot of R users seem to come to JavaScript through d3, and that’s a totally understandable motivation. Personally, I think any R users who seriously want to get into bespoke visualization should consider JavaScript as their second programming language. That said, a lot of Shiny users have built pretty sophisticated apps without directly writing a line of JavaScript (the shinyjs package helps bridge the gap).

I would encourage R users who have JavaScript skills to look for opportunities to package up JavaScript code into a friendly R package, so that R users who don't (yet) know JavaScript can take full advantage of your work. The htmlwidgets package is the most popular way of doing this and is ideal for wrapping JS-based visualizations.

The future

How does the future look for Shiny? Can you share any of your plans?

We’ve just come off a big 18 months of work where we had a big focus on making Shiny easier to deploy in production settings: regression testing with shinytest, load testing with shinyloadtest, a new mechanism for scaling with async. In the near term, we’ll be following up with a new plot caching feature that can dramatically speed up certain classes of apps, and a ground-up rewrite of the reactivity visualizer that will finally deliver on the promise that the original implementation (?showReactLog) only hinted at.

We have some plans for the rest of the year, but we’re not ready to talk about them just yet, sorry!

Do you have an overall roadmap for Shiny and is there anything you can tell us about that?

We’ve always been much more reactive than proactive in our planning for Shiny. We almost didn’t have a choice about it in the early years, when every month we were learning so much about how people wanted to use Shiny and the problems they were encountering. That’s not to say that we don’t have a long backlog of features, fixes, documentation, and examples we’d love to tackle; just that we traditionally don’t commit to anything until we start working on it, in case it’s preempted by something we decide is more important.

I suspect we will need to adopt a formal roadmap someday soon. Both the Shiny team and RStudio as a company have grown so much that the lightweight processes I’ve insisted on in the past have started to break down.

And the one we all really want to know the answer to…

Why did you call it Shiny?

It’s from the late and lamented sci-fi series Firefly; in the show, they casually toss that word around to mean “cool”. I just liked the sound of it, and thought it’d make a good name for an open source library, but not for RStudio as we tended to use mostly straightforward, literal names in those days (“RStudio”, “R Markdown”, “RPubs”).

When the time came to create the GitHub repo for our new R web framework project, I intended to call it something bland—not "RWeb", but similar. But something strange happened. The new repo page on GitHub has a little prompt that suggests a random name for your repo, and to my delight, this time it said "Need inspiration? How about shiny-octocat." I took that as a sign, named the repo Shiny, and despite some moments of doubt, it ultimately stuck.

Hmmm, I wonder if it’s too late to rename the shinytest package “gorram”.


James Blair, RStudio

Scalability is a hot word these days, and for good reason. As data continues to grow in volume and importance, the ability to reliably access and reason about that data increases in importance. Enterprises expect data analysis and reporting solutions that are robust and allow several hundred, even thousands, of concurrent users while offering up-to-date security options.

Shiny is a highly flexible and widely used framework for creating web applications using R. It enables data scientists and analysts to create dynamic content that provides straightforward access to their work for those with no working knowledge of R. While Shiny has been around for quite some time, recent introductions to the Shiny ecosystem make Shiny simpler and safer to deploy in an enterprise environment where security and scalability are paramount. These new tools in connection with RStudio Connect provide enterprise grade solutions that make Shiny an even more attractive option for data resource creation.

Develop and Test

Most Shiny applications are developed either locally on a personal computer or using an instance of RStudio Server. During development it can be helpful to understand application performance, specifically if there are any concerning bottlenecks. The profvis package provides functions for profiling R code and can profile the performance of Shiny applications. profvis provides a breakdown of code performance and can be useful for identifying potential areas for improving application responsiveness.
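
As a minimal sketch (assuming the app lives in a hypothetical app/ directory), profiling a Shiny application with profvis looks roughly like this: wrap the call that launches the app, interact with it in the browser, then stop the app to inspect the profile.

library(profvis)
library(shiny)

# Wrap the app launch in profvis(); interact with the app in the browser,
# then stop it to see where the time was spent.
profvis({
  runApp("app")  # hypothetical app directory
})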

The recently released promises package provides asynchronous capabilities to Shiny applications. Asynchronous programming can be used to improve application responsiveness when several concurrent users are accessing the same application. While there is some overhead involved in creating asynchronous applications, this method can improve application responsiveness.
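
A rough sketch of the asynchronous pattern is shown below, assuming the future package as the backend and a deliberately slow computation standing in for real work; it is an illustration rather than a drop-in recipe.

library(shiny)
library(promises)
library(future)
plan(multisession)  # run long computations in separate R processes

server <- function(input, output, session) {
  output$slow_plot <- renderPlot({
    future({
      Sys.sleep(5)   # stand-in for an expensive computation
      rnorm(1000)
    }) %...>%        # promise pipe: runs once the future resolves
      hist(main = "Result of a long-running computation")
  })
}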

Once an application is fully developed and ready to be deployed, it's useful to establish a set of behavioral expectations. These expectations can be used to ensure that future updates to the application don't break or unexpectedly change behavior. Traditionally most testing of Shiny applications has been done by hand, which is both time-consuming and error-prone. The new shinytest package provides a clean interface for testing Shiny applications. Once an application is fully developed, a set of tests can be recorded and stored to compare against future application versions. These tests can be run programmatically and can even be used with continuous integration (CI) platforms. Robust testing for Shiny applications is a huge step forward in increasing the deployability and dependability of such applications.
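
For illustration (again assuming a hypothetical app/ directory), the workflow is roughly: record a set of tests interactively once, then replay them on demand or on a CI server.

library(shinytest)

# Record a test interactively: this opens the app, captures your clicks and
# inputs, and saves expected output snapshots alongside the app.
recordTest("app")

# Replay the recorded tests and compare the results against the saved
# snapshots, e.g. locally before a release or on a CI platform.
testApp("app")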

Deploy

Once an application has been developed and tested to satisfaction, it must be deployed to a production environment in order to provide other users with application access. Production deployment of data resources within an enterprise centers on control. For example, access control and user authentication are important for controlling who has access to the application. Server resource control and monitoring are important for controlling application performance and server stability. These control points enable trustworthy and performant deployment.

There are a few current solutions for deploying Shiny applications. Shiny Server provides both an open source and professional framework for publishing Shiny applications and making them available to a wide audience. The professional version provides features that are attractive for enterprise deployment, such as user authentication. RStudio Connect is a recent product from RStudio that provides several enhancements to Shiny Server. Specifically, RStudio Connect supports push button deployment and natively handles application dependencies, both of which simplify the deployment process. RStudio Connect also places resource control in the hands of the application developer, which lightens the load on system administrators and allows the developer to tune app performance to align with expectations and company priorities.

Scale

In order to be properly leveraged, a deployed application must scale to meet user demand. In some instances, applications will have few concurrent users and will not need any additional help to remain responsive. However, it is often the case in large enterprises that applications are widely distributed and concurrently accessed by several hundred or even thousands of users. RStudio Connect provides the ability to set up a cluster of servers to provide high availability (HA) and load-balanced configurations in order to scale applications to meet the needs of concurrent users. Shiny itself has been shown to effectively scale to meet the demands of 10,000 concurrent users!

As businesses continue searching for ways to efficiently capture and digest growing stores of data, R in connection with Shiny continues to establish itself as a robust and enterprise ready solution for data analysis and reporting.


There are times when it costs more than it should to leverage JavaScript, databases, HTML, models and algorithms from a single language. Now may be the time to connect some dots, without stretching too much.

  • If you have been developing Shiny apps, have you considered letting them sit on one live database instead of handling data I/O by hand?
  • If you use DT to display tables in Shiny apps, would you care to unleash the power of its interactivity to the full?
  • If you struggle with constructing SQL queries in R, so did we.

Inspired (mainly) by the exciting new inline editing feature of DT, we created a minimal Shiny app demo to show how you can update multiple values from DT and send the edits to the database in one go.

As seen in the screenshot, after double-clicking on a cell and editing the value, Save and Cancel buttons will show up. As you continue editing, the updates are stored in a temporary (reactiveValues) object. Click Save if you want to send bulk updates to the database; click Cancel to reset.

Global

On the global level, we use pool to manage database connections. A database connection pool object is constructed once, and with the onStop() function it is closed when the application shuts down. This saves you from worrying about when to open or close a connection.

# Load the packages used throughout the app (see Acknowledgements below)
library(shiny)
library(DT)
library(DBI)
library(RPostgreSQL)  # supplies the PostgreSQL driver used below
library(pool)
library(dplyr)
library(glue)

# Define pool handler by pool on global level
pool <- pool::dbPool(drv = dbDriver("PostgreSQL"),
                     dbname="demo",
                     host="localhost",
                     user= "postgres",
                     password="ava2post")

onStop(function() {
  poolClose(pool)
}) # important!

The next job is to define a function to update the database. The glue_sql function puts together a SQL query in a human-readable way. Writing SQL queries in R used to be a bit of a nightmare: if you have ever assembled a SQL clause with sprintf or paste, you know what I'm talking about. The glued query is then processed by sqlInterpolate for SQL injection protection before being executed.

updateDB <- function(editedValue, pool, tbl){
  # Keep only the last modification for a cell
  editedValue <- editedValue %>% 
    group_by(row, col) %>% 
    filter(value == dplyr::last(value)| is.na(value)) %>% 
    ungroup()

  conn <- poolCheckout(pool)

  lapply(seq_len(nrow(editedValue)), function(i){
    id = editedValue$row[i]
    col = dbListFields(pool, tbl)[editedValue$col[i]]
    value = editedValue$value[i]

    query <- glue::glue_sql("UPDATE {`tbl`} SET
                          {`col`} = {value}
                          WHERE id = {id}
                          ", .con = conn)

    dbExecute(conn, sqlInterpolate(ANSI(), query))
  })

  poolReturn(conn)
  print(editedValue)  
  return(invisible())
}

Server

We begin with server.R by defining a few reactive values: data for the main dynamic data object, dbdata for what is currently in the database, dataSame for whether data has diverged from the database, and editedInfo for the edited cell information (row, col and value). Next, we create a reactive expression that retrieves the source data and assign the result to the reactive values.

# Generate reactive values
rvs <- reactiveValues(
  data = NA, 
  dbdata = NA, 
  dataSame = TRUE, 
  editedInfo = NA 
)

# Generate source via reactive expression
mysource <- reactive({
  pool %>% tbl("nasa") %>% collect()
})

# Observe the source, update reactive values accordingly
observeEvent(mysource(), {

  # Lightly format data by arranging id
  # Not sure why disordered after sending UPDATE query in db    
  data <- mysource() %>% arrange(id)

  rvs$data <- data
  rvs$dbdata <- data

})

We then render a DataTable object and create its proxy. Note that the editable parameter needs to be explicitly turned on. Finally, with some format tweaking, we merge the cell information (row id, column id and value) into the DT proxy and keep all edits in a single reactive value. See the examples for details.

# Render DT table and edit cell
# 
# no curly bracket inside renderDataTable
# selection better be none
# editable must be TRUE
output$mydt <- DT::renderDataTable(
  rvs$data, rownames = FALSE, editable = TRUE, selection = 'none'
)

proxy3 = dataTableProxy('mydt')

observeEvent(input$mydt_cell_edit, {

  info = input$mydt_cell_edit

  i = info$row
  j = info$col = info$col + 1  # column index offset by 1
  v = info$value

  info$value <- as.numeric(info$value)

  rvs$data[i, j] <<- DT::coerceValue(v, purrr::flatten_dbl(rvs$data[i, j]))
  replaceData(proxy3, rvs$data, resetPaging = FALSE, rownames = FALSE)

  rvs$dataSame <- identical(rvs$data, rvs$dbdata)

  if (all(is.na(rvs$editedInfo))) {
    rvs$editedInfo <- data.frame(info)
  } else {
    rvs$editedInfo <- dplyr::bind_rows(rvs$editedInfo, data.frame(info))
  }
})

Once the Save button is clicked, we send bulk updates to the database using the function we defined above. Hitting Cancel discards the current edits and reverts the DT to the last saved state of the database. The last chunk is a little trick that generates the UI buttons dynamically: when the dynamic data object differs from its database counterpart, show the Save and Cancel buttons; otherwise hide them.

# Update edited values in db once save is clicked
observeEvent(input$save, {
  updateDB(editedValue = rvs$editedInfo, pool = pool, tbl = "nasa")

  rvs$dbdata <- rvs$data
  rvs$dataSame <- TRUE
})

# Observe cancel -> revert to last saved version
observeEvent(input$cancel, {
  rvs$data <- rvs$dbdata
  rvs$dataSame <- TRUE
})

# UI buttons
output$buttons <- renderUI({
  div(
    if (! rvs$dataSame) {
      span(
        actionButton(inputId = "save", label = "Save",
                     class = "btn-primary"),
        actionButton(inputId = "cancel", label = "Cancel")
      )
    } else {
      span()
    }
  )
})

UI

The UI part is exactly what you normally do. Nothing new.

Bon Appétit

  1. Set up a database instance, e.g. PostgreSQL, SQLite, MySQL or MS SQL Server
  2. Download/clone the GitHub repository
  3. Run through the script app/prep.R, changing the database details to your own. It writes our demo dataset to the database: the nasa dataset from dplyr with an index column added
  4. Also update the database details in app/app.R and run
    shiny::runApp("app")
Acknowledgements

Workhorse functionality is made possible by:

  • DBI: R Database Interface
  • RPostgreSQL: R Interface to PostgreSQL (one of many relational database options)
  • pool: DBI connection object pooling
  • DT: R Interface to the jQuery Plug-in DataTables (requires version >= 0.2.30)
  • Shiny: Web Application Framework for R
  • dplyr: Data manipulation
  • glue: Glue strings to data in R. Small, fast, dependency free interpreted string literals (requires version >= 1.2.0.9000. Blank cell crashes the app with version 1.2.0)

Learn how to use Shiny with our Introduction, Intermediate and Advanced courses.


If you use RStudio Connect to publish your Shiny app (and even if you don’t) take care with how you arrange your projects. If you have a single project that includes both your data prep and your Shiny app, packrat (which RSConnect uses to resolve package dependencies for your project) will assume the packages you used for both parts are required on the RSConnect server and will try to install them all.

This means that if your Shiny app uses three packages and your data prep uses six, packrat and RSConnect will attempt to install all nine on the server. This can be time consuming as packages are often built from source in Connect-based environments, so this will increase the deployment time considerably. Furthermore, some packages may require your server admin to resolve system-level package dependency issues, possibly for packages that your app doesn’t even use while it’s running.

Keeping data prep and your app within a single project can also confuse people who come on to your project as collaborators later in the development process, since the scope of the project will be less clear. Plus, documenting the pieces separately also helps to improve clarity.

Lastly, separating the two will make your life easier if you ever get to the stage where you want to start automating parts of your workflow as the data prep stage will already be separate from the rest of the project.

Clear separation of individual projects (and by extension, source code repositories) may cause some short-term pain, but the long-term benefits are hard to overstate:

  • Smoother and faster RStudio Connect deployments
  • Easier collaboration
  • More straightforward automation (easier to build out into a pipeline)
  • Simpler to document – one set for the app, another for your data prep

Of course, if your Shiny app actually does data prep as part of the app’s internal processing, then all bets are off!


For the last week we’ve been talking on the blog and Twitter about some of the functionality in Shiny and how you can learn it. But, if you haven’t already made the leap and started using Shiny, why should you?

What is the challenge to be solved?

At Mango we define data science as the proactive use of data and advanced analytics to drive better decision making.

We all know about the power of R for solving analytic challenges. It is, without a doubt, one of the most powerful analytic tools available to us as data scientists, providing the ability to solve modelling challenges using a range of traditional and modern analytic approaches.

However, the reality is that we can fit the best models and write the best code, but unless someone in the business is able to use the insight we generate to make a better decision our teams won’t add any value.

So, how do we solve this? How can we share the insight with the decision makers? How can we actually drive decision making with the analytics we have performed? If we’re not putting the results of our analysis into the hands of the decision makers it’s completely useless.

This is where Shiny comes in!

What is Shiny?

Shiny is a web application framework for R. In a nutshell this means that anyone who knows some R can start to build applications that sit in a web browser. These could be as simple as displaying some graphics and tables, or as complex as a fully interactive dashboard. The important part is that it is all done with R; there are no requirements for web developers to get involved.

Also, Shiny allows us to create true ‘data products’ that go beyond standard Business Intelligence dashboards. We can define intuitive interfaces that allow business users to perform what-if analysis, manipulating parameters that enable them to see the impact of different approaches on business outcomes.

What can it do?

Once your Shiny app is built it’s basically an interface to R – meaning your Shiny application can do whatever R can do (if you allow it to). So you can create Shiny applications that do anything from ‘add some numbers together’ to ‘fit sophisticated models across large data sources and simulate a variety of outputs’.

There are more use cases for Shiny than we could possibly list here and I would strongly recommend checking out the Shiny user showcase for more examples.

Share Insights

When it comes to Shiny for sharing insights some of the most common uses that we see include:

  • Presenting results of analysis to end users in the form of graphics and tables, allowing limited interaction such as selecting sub-groups of the data
  • Displaying current status and presenting recommended next actions based on R models
  • Automated production of common reports, letting users upload their own data that can be viewed in a standard way

Day-to-Day Data Tasks

Sharing insights is by no means the only way in which Shiny can be used. At Mango we are regularly asked by our customers to provide applications that allow non-R users to perform standard data manipulation and visualisation tasks or run standard analysis based on supplied data or data extracted from a database. Essentially, this allows the day to day tasks to move away from the data scientists or core R users who can then focus on new business challenges.

Check out this case study for an example of how we helped Pfizer with an application to simplify their data processing.

Prototyping

Shiny is also a great tool for prototyping. Whilst it can be, and is, used widely in production environments, some businesses may prefer to use other tools for business critical applications.

But allowing the data scientists in the team to generate prototypes in Shiny makes it much easier to understand if investment in the full system will add value, whilst also providing an interim solution.

The possibilities really are endless – in fact a question you may need to consider is: when should we move from Shiny to a formal web development framework?

But the decision makers don’t use R

The best thing about Shiny is that it produces a web application that can be deployed centrally and shared as a URL, just like any other web page. There are a whole host of tools that allow you to do this easily.

My personal favourite is RStudio Connect, as I can deploy a new application quickly and easily without having to spend time negotiating with the IT team. But there are other options and I would recommend checking out the RStudio website for a great resource comparing some of the most popular ones.

How can we get started with Shiny?

There are a number of ways that you can get started understanding whether Shiny could add value in your business: from Shiny training courses to developing a prototype.

Get in touch with the team at Mango who will be happy to talk through your current business requirements and advise on the next best steps for putting the power of Shiny into your decision making process.

Why do we love Shiny?

Shiny allows R users to put data insights into the hands of the decision makers. It’s a really simple framework that doesn’t require any additional toolsets and allows all of the advanced analytics of R to be made available to the people who will be making the decisions.

Shiny Training at Mango

This month we have launched our newly updated Shiny training programme. The three one-day courses go from getting started right through to best practices for putting Shiny into production environments.

Importantly, all of these courses are taught by data science consultants who have hands-on experience building and deploying applications for commercial use. These consultants are supported by platform experts who can advise on the best approaches for getting an application out to end users so that you can see the benefits of using Shiny as quickly as possible.

If you want to know more about the Shiny training that we offer, take a look at our training page. If you are based in the UK we will be running public Shiny courses in London (see below for the currently scheduled dates). We will also be offering a snapshot of the materials for intermediate Shiny users at London EARL in September.

Public course dates:
  • Introduction to Shiny: 17th July
  • Intermediate Shiny: 18th July, 5th September
  • Advanced Shiny: 6th September

If you would like more information or to register for our Shiny courses, please contact our Training Team.


Back in the summer of 2012 I was meant to be focusing on one thing: finishing my thesis. But, unfortunately for me, a friend and former colleague came back from a conference (JSM) and told me all about a new package that she had seen demoed.

“You should sign up for the beta testing and try it out,” she said.

So, I did.

That package was Shiny and after just a couple of hours of playing around I was hooked. I was desperate to find a way to incorporate it into my thesis, but never managed to; largely due to the fact it wasn’t available on CRAN until a few months after I had submitted and because, at the time, it was quite limited in its functionality. However, I could see the potential – I was really excited about the ways it could be used to make analytics more accessible to non-technical audiences. After joining Mango I quickly became a Shiny advocate, telling everyone who would listen about how great it was.

Six years on at Mango, not a moment goes by when somebody in the team isn’t using Shiny for something. From prototyping to large scale deployments, we live and breathe Shiny. And we are extremely grateful to the team at RStudio—led by Joe Cheng—for the continued effort that they are putting in to its development. It really is a hugely different tool to the package I beta tested so long ago.

As Shiny has developed and the community around it has become greater so too has the need to teach it because more people than ever are looking to become Shiny users. For a number of years, we have been teaching the basics of Shiny to those who want to get started, and more serious development tools to those who want to deploy apps in production. But increasingly, we have seen a demand for more. And as the Shiny team have added more and more functionality it was time for a major update to our teaching materials.

Over the past six months we have had many long discussions over what functionality should be included. We have debated best practices, we have drawn on all of our combined experiences of both learning and deploying Shiny, and we eventually reached a consensus over what we felt was best for industry users of Shiny to learn.

We are now really pleased to announce an all new set of Shiny training courses.

Our courses cover everything from taking your first steps in building a Shiny application, to building production-ready applications and a whole host of topics in between. For those who want to take a private course we can tailor to your needs, and topics as diverse as getting the most from tables in DT to managing database access in apps can all be covered in just a few days.

For us, an important element of these courses, is that they are all taught by data science consultants who have hands-on experience building and deploying apps for commercial use. These consultants are supported by platform experts who can advise on the best approaches for getting an app out to end users so that you can see the benefits of using Shiny as quickly as possible.

But, one blog post was never going to be enough for all of the Shiny enthusiasts at Mango to share their passion. We needed more time, more than one blog post and more ways to share with the community.

Therefore, Mango are declaring June to be Shiny Appreciation Month!

For the whole of June, we will be talking all things Shiny. Follow us on Twitter where we will be sharing tips, ideas and resources. To get involved, share your own with us and the Shiny community, using #ShinyAppreciation. On the blog we will be sharing, among other things, some of the ways we are using Shiny in industry and some of the technical challenges we have had to overcome.

Watch this space for updates but, for now, if you want to know more about the Shiny training that we offer, take a look at our training pages. If you are based in the UK we will be running public Shiny courses in London (see below for the currently scheduled dates). We will also be offering a snapshot of the materials for intermediate Shiny users at London EARL in September.

Public course dates:

  • Introduction to Shiny: 17th July
  • Intermediate Shiny: 18th July, 5th September
  • Advanced Shiny: 6th September

If you would like more information or to register for our Shiny courses, please contact our Training Team.


I always find it difficult to pick highlights from a conference and the eRum 2018 team did a fantastic job of making it difficult for me once again, so here goes…

Day One

The first day offered a huge choice in workshops, but teaching one of them meant we didn’t make it to any of the others. However, everyone we spoke to had great things to say about them all. In fact, we were overwhelmed by the turnout for our own workshop on the keras package (and we have to give a shout out to Mark Sellors for setting up and monitoring the server for us). By the way, if you missed out, you can sign up for the workshop at EARL London in September.

Day Two

Tuesday might have been a rainy start outside but inside we were mesmerised by the transparent roof in the Akvarium Klub. For team Mango, the morning mostly involved cat restocking, so we were really grateful for the live streaming that enabled us to keep up with everything going on in the main room.

My favourite presentations included:

Having newly been introduced to the recipes package I particularly enjoyed seeing Edwin Thoen talk about how to add your own data preparation steps and checks.

Olga Mierzwa-Sulima presented six packages to add functionality to shiny apps. These cover UI aspects like using semantic elements in shiny or easily exchanging themes for the app as well as user management aspects like authentication and controlling the level of access for different users. She also covered additional functionality like making routing possible with shiny and building multi-language apps.

Jeroen Ooms spoke about using Rust code in R packages. Rust is a new system programming language and can be an alternative to C/C++. Jeroen mentioned several advantages including Rust being memory safe and as fast as C/C++ while being far safer. It ships with a native package manager (cargo) and does not need a runtime library which means that the binary (R) package does not depend on Rust or cargo. He stressed that it’s easy to wrap Rust libraries into R packages so hopefully soon the selection of tools available from R will be even more varied.

A particular highlight for Doug Ashton came on Tuesday afternoon with three complementary talks on machine learning. With all the buzz to deal with in the ML world right now, Doug thought the practical talks from three level-headed practitioners were very useful:

First Erin LeDell, Chief Machine Learning Scientist at h2o, gave an excellent talk on their automl package – a system they’ve been working hard on to run several different algorithms and select the best. Doug’s favourite part was their automated model ensembling (aka model stacking) that provides the best mix of all the algorithms.

Szilárd Pafka, Chief Scientist at Epoch, followed on with a provocatively titled presentation, “Better than deep learning: Gradient Boosted Machines in R”, where he talked about why the majority of ML problems he sees are not best suited to deep learning. Szilárd also gave a nice overview of the best performing algorithms for GBM. (Doug’s note to self was to check out Microsoft’s lightgbm and xgboost with GPU.)


Going back to the theme of automation Andrie de Vries, Solutions Engineer at RStudio, took us through how to tune your tensorflow models using the tfruns package to run grid/random search over the hyperparameter space. This is very timely as we are often asked about how to select the right network topology and until now we’ve largely hand-tuned. Andrie then took us through an example where deep learning certainly is the right choice—image/pattern recognition with convolutional neural nets (CNNs)—and taking our example from the keras workshop significantly improved the accuracy with automated tuning – 👏.

After the rainy start to the day we were all relieved to see it had cleared up by the time the evening event (a river cruise) came around – we were lucky to have stunning views of Budapest from the Danube; the sunset on the Parliament building with stormy skies overhead was incredible.

Day Three

On Wednesday morning Roger Bivand did a great job of talking us all through some of the very important history of R – you all knew “_” was once an assignment operator, right?

RStudio’s Barbara Borges Ribeiro showed off a cool shiny app for drilling down into data and making use of the dynamic insertion of UI. Unfortunately I missed her talk, but managed to get a demo of it later in the day – you can take a look at the app on GitHub.

The biggest highlight for me though were the people. Without a doubt it was one of the friendliest conferences I have attended. Everyone was happy to share their experiences, answer your questions and point you in the direction of tools to look at later. Importantly, everyone was made to feel welcome, from the most-experienced to the newest R users.

The videos from all talks are being made available, check the conference homepage for the link.

Congratulations to the whole organising committee, led by Gergely Daroczi, for putting on such a great event. Mango are certainly looking forward to the next eRum conference!


In this blog post I explore the purrr package (a member of the tidyverse collection) and its use within a data scientist’s code. I aim to present the case for using the purrr functions and, through examples, compare them with base R functionality. To do this, we will concentrate on two typical coding scenarios in base R: 1) loops and 2) the suite of apply functions, and then compare them with their counterpart map functions in the purrr package.

However, before I start, I wanted to make it clear that I do sympathise with those of you whose first reaction to purrr is “but I can do all this stuff in base R”. Putting that aside, the obvious first obstacle for us to overcome is to lose the notion of “if it’s not broken, why change it” and open our ‘coding’ minds to change. At least, I hope you agree with me that the silver lining of this kind of exercise is to satisfy one’s curiosity about the purrr package and maybe learn something new!

Let us first briefly describe the concept of functional programming (FP) in case you are not familiar with it.

Functional programming (FP)

R is a functional programming language, which means that a user of R has the necessary tools to create and manipulate functions. There is no need to go into too much depth here, but it suffices to know that FP is the process of writing code in a structured way and, through functions, removing code duplication and redundancy. In effect, computations or evaluations are treated as mathematical functions and the output of a function depends only on the values of its inputs – known as arguments. FP ensures that side effects such as changes in state do not affect the expected output: if you call the same function twice with the same arguments, it returns the same output.
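
As a small illustration of the distinction (scale01 and next_id are made-up examples), compare a pure function, whose result depends only on its arguments, with one that depends on and modifies state outside itself:

# Pure: the same input always gives the same output, with no side effects
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scale01(c(1, 5, 10))

# Not pure: the result depends on, and changes, a global variable
counter <- 0
next_id <- function() {
  counter <<- counter + 1
  counter
}
next_id(); next_id()  # returns 1, then 2: two identical calls, two different results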

For those who are interested in finding out more, I suggest reading Hadley Wickham’s Functional Programming chapter in the “Advanced R” book. The companion website can be found at: http://adv-r.had.co.nz/

The purrr package, which forms part of the tidyverse ecosystem of packages, further enhances the functional programming aspect of R. It allows the user to write functional code with less friction in a complete and consistent manner. The purrr functions can be used, among other things, to replace loops and the suite of apply functions.

Let’s talk about loops

The motivation behind the examples we are going to look at involve iterating in R for various scenarios. For example, iterate over elements of a vector or list, iterate over rows or columns of a matrix … the list (pun intended) can go on and on!

One of the first things that one gets very excited to ‘play’ with when learning to use R – at least that was the case for me – is loops! Lots of loops, elaborate, complex… dare I say never-ending infinite loops (cue hysterical laughter emoji). Joking aside, a loop is usually the default answer to a problem that involves iteration of some sort, as I demonstrate below.

# Create a vector of the mean values of all the columns of the mtcars dataset
# The long repetitive way
mean_vec <- c(mean(mtcars$mpg),mean(mtcars$cyl),mean(mtcars$disp),mean(mtcars$hp),
              mean(mtcars$drat),mean(mtcars$wt),mean(mtcars$qsec),mean(mtcars$vs),
              mean(mtcars$am),mean(mtcars$gear),mean(mtcars$carb))
mean_vec
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

# The loop way
mean_vec_loop <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  mean_vec_loop[[i]] <- mean(mtcars[[i]])
}
mean_vec_loop
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

The resulting vectors are the same and the difference in speed (milliseconds) is negligible. I hope that we can all agree that the long way is definitely not advised and actually is bad coding practice, let alone the frustration (and error-prone task) of copy/pasting. Having said that, I am sure there are other ways to do this – I demonstrate this later using lapply – but my aim was to show the benefit of using a for loop in base R for an iteration problem.

Now imagine if in the above example I wanted to calculate the variance of each column as well…

# Create two vectors of the mean and variance of all the columns of the mtcars dataset

# For mean
mean_vec_loop <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  mean_vec_loop[[i]] <- mean(mtcars[[i]])
}
mean_vec_loop
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

#For variance
var_vec_loop <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  var_vec_loop[[i]] <- var(mtcars[[i]])
}
var_vec_loop
 [1] 3.632410e+01 3.189516e+00 1.536080e+04 4.700867e+03 2.858814e-01
 [6] 9.573790e-01 3.193166e+00 2.540323e-01 2.489919e-01 5.443548e-01
[11] 2.608871e+00

# Or combine both calculations in one loop
for (i in seq_along(mtcars)) {
  mean_vec_loop[[i]] <- mean(mtcars[[i]])
  var_vec_loop[[i]] <- var(mtcars[[i]])
}
mean_vec_loop
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500
var_vec_loop
 [1] 3.632410e+01 3.189516e+00 1.536080e+04 4.700867e+03 2.858814e-01
 [6] 9.573790e-01 3.193166e+00 2.540323e-01 2.489919e-01 5.443548e-01
[11] 2.608871e+00

Now let us assume that we know that we want to create these vectors not just for the mtcars dataset but for other datasets as well. We could in theory copy/paste the for loops and just change the dataset we supply in the loop, but one should agree that this action is repetitive and could result in mistakes. Instead we can generalise this into functions. This is where FP comes into play.

# Create two functions that returns the mean and variance of the columns of a dataset

# For mean
col_mean <- function(df) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- mean(df[[i]])
  }
  output
}
col_mean(mtcars)
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500

#For variance
col_variance <- function(df) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- var(df[[i]])
  }
  output
}
col_variance(mtcars)
 [1] 3.632410e+01 3.189516e+00 1.536080e+04 4.700867e+03 2.858814e-01
 [6] 9.573790e-01 3.193166e+00 2.540323e-01 2.489919e-01 5.443548e-01
[11] 2.608871e+00

Why not take this one step further and take full advantage of R’s functional programming tools by creating a function that takes as an argument a function! Yes, you read it correctly… a function within a function!

Why do we want to do that? Well, the code for the two functions above, as clean as it might look, is still repetitive and the only real difference between col_mean and col_variance is the mathematical function that we are calling. So why not generalise this further?

# Create a function that returns a computational value (such as mean or variance)
# for a given dataset

col_calculation <- function(df,fun) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[[i]] <- fun(df[[i]])
  }
  output
}
col_calculation(mtcars,mean)
 [1]  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
 [7]  17.848750   0.437500   0.406250   3.687500   2.812500
col_calculation(mtcars,var)
 [1] 3.632410e+01 3.189516e+00 1.536080e+04 4.700867e+03 2.858814e-01
 [6] 9.573790e-01 3.193166e+00 2.540323e-01 2.489919e-01 5.443548e-01
[11] 2.608871e+00

Did someone say apply?

I mentioned earlier that an alternative way to solve the problem is to use the apply function (or the suite of apply functions such as lapply, sapply, vapply, etc.). In fact, these functions are what we call Higher Order Functions. Similar to what we did earlier, these are functions that can take other functions as an argument.

The benefit of using higher order functions instead of a for loop is that they allow us to think about what code we are executing at a higher level. Think of it as: “apply this to that” rather than “take the first item, do this, take the next item, do this…”

I must admit that at first it might take a little while to get used to but there is definitely a sense of pride when you can improve your code by eliminating for loops and replace them with apply-type functions.

# Create a list/vector of the mean values of all the columns of the mtcars dataset
lapply(mtcars,mean) %>% head # Returns a list
$mpg
[1] 20.09062

$cyl
[1] 6.1875

$disp
[1] 230.7219

$hp
[1] 146.6875

$drat
[1] 3.596563

$wt
[1] 3.21725
sapply(mtcars,mean) %>% head # Returns a vector
       mpg        cyl       disp         hp       drat         wt 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250

Once again, speed of execution is not the issue and neither is the common misconception about loops being slow compared to apply functions. As a matter of fact the main argument in favour of using lapply or any of the purrr functions as we will see later is the pure simplicity and readability of the code. Full stop.

Enter the purrr

The best place to start when exploring the purrr package is the map function. The reader will notice that these functions are utilised in a very similar way to the apply family of functions. The subtle difference is that the purrr functions are consistent and the user can be assured of the output – as opposed to some cases when using for example sapply as I demonstrate later on.

# Create a list/vector of the mean values of all the columns of the mtcars dataset
map(mtcars,mean) %>% head # Returns a list
$mpg
[1] 20.09062

$cyl
[1] 6.1875

$disp
[1] 230.7219

$hp
[1] 146.6875

$drat
[1] 3.596563

$wt
[1] 3.21725
map_dbl(mtcars,mean) %>% head # Returns a vector - of class double
       mpg        cyl       disp         hp       drat         wt 
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250

Let us introduce the iris dataset with a slight modification in order to demonstrate the inconsistency that sometimes can occur when using the sapply function. This can often cause issues with the code and introduce mystery bugs that are hard to spot.

# Modify iris dataset
iris_mod <- iris
iris_mod$Species <- ordered(iris_mod$Species) # Ordered factor levels
class(iris_mod$Species) # Note: The ordered function changes the class
[1] "ordered" "factor"

# Extract class of every column in iris_mod
sapply(iris_mod, class) %>% str # Returns a list of the results
List of 5
 $ Sepal.Length: chr "numeric"
 $ Sepal.Width : chr "numeric"
 $ Petal.Length: chr "numeric"
 $ Petal.Width : chr "numeric"
 $ Species     : chr [1:2] "ordered" "factor"
sapply(iris_mod[1:3], class) %>% str # Returns a character vector!?!? - Note: inconsistent object type
 Named chr [1:3] "numeric" "numeric" "numeric"
 - attr(*, "names")= chr [1:3] "Sepal.Length" "Sepal.Width" "Petal.Length"

Since map returns a list by default, one can ensure that an object of the same class is always returned, without any unexpected (and unwanted) surprises. This is in line with FP consistency.

# Extract class of every column in iris_mod
map(iris_mod, class) %>% str # Returns a list of the results
List of 5
 $ Sepal.Length: chr "numeric"
 $ Sepal.Width : chr "numeric"
 $ Petal.Length: chr "numeric"
 $ Petal.Width : chr "numeric"
 $ Species     : chr [1:2] "ordered" "factor"
map(iris_mod[1:3], class) %>% str # Returns a list of the results
List of 3
 $ Sepal.Length: chr "numeric"
 $ Sepal.Width : chr "numeric"
 $ Petal.Length: chr "numeric"

To further demonstrate the consistency of the purrr package in this setting, the map_*() functions (see below) can be used to return a vector of the expected type; otherwise you get an informative error.

  • map_lgl() makes a logical vector.
  • map_int() makes an integer vector.
  • map_dbl() makes a double vector.
  • map_chr() makes a character vector.
# Extract class of every column in iris_mod
map_chr(iris_mod[1:4], class) %>% str # Returns a character vector
 Named chr [1:4] "numeric" "numeric" "numeric" "numeric"
 - attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
map_chr(iris_mod, class) %>% str # Returns a meaningful error
Error: Result 5 is not a length 1 atomic vector

# As opposed to the equivalent base R function vapply
vapply(iris_mod[1:4], class, character(1)) %>% str  # Returns a character vector
 Named chr [1:4] "numeric" "numeric" "numeric" "numeric"
 - attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
vapply(iris_mod, class, character(1)) %>% str  # Returns a possibly harder to understand error
Error in vapply(iris_mod, class, character(1)): values must be length 1,
 but FUN(X[[5]]) result is length 2

It is worth noting that if the user does not wish to rely on tidyverse dependencies, they can always use base R functions, but they need to be extra careful about the potential inconsistencies that might arise.

Multiple arguments and neat tricks

If we want to apply a function to multiple vector arguments, we have the option of mapply from base R or map2 from purrr.

# Create random normal values from a list of means and a list of standard deviations
mu <- list(10, 100, -100)
sigma <- list(0.01, 1, 10)

mapply(rnorm, n=5, mu, sigma, SIMPLIFY = FALSE) # I need SIMPLIFY = FALSE because otherwise I get a matrix
[[1]]
[1] 10.002750 10.001843  9.998684 10.008720  9.994432

[[2]]
[1] 100.54979  99.64918 100.00214 102.98765  98.49432

[[3]]
[1]  -82.98467  -99.05069  -95.48636  -97.43427 -110.02194

map2(mu, sigma, rnorm, n = 5)
[[1]]
[1] 10.00658 10.00005 10.00921 10.02296 10.00840

[[2]]
[1]  98.92438 100.86043 100.20079  97.02832  99.88593

[[3]]
[1] -113.32003  -94.37817  -86.16424  -97.80301 -105.86208

This pattern easily extends to more than two arguments, not just the two in the example above, and that is where the pmap function comes in; a short sketch follows.
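As a minimal sketch of that idea (the sample sizes of 5 are my own addition, chosen to match the map2 example), pmap takes a single list of arguments whose names are matched to the parameters of the function being called:

# Pass n, mean and sd as one list; names are matched to rnorm's arguments
params <- list(n = list(5, 5, 5), mean = mu, sd = sigma)
pmap(params, rnorm)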

I also thought I would share a couple of neat tricks that one can use with the map function.

  1. Say you want to fit a linear model for every cylinder type in the mtcars dataset. You can avoid code duplication and do it as follows:
# Split mtcars dataset by cylinder values and then fit a simple lm
models <- mtcars %>% 
  split(.$cyl) %>% # Split by cylinder into 3 lists
  map(function(df) lm(mpg ~ wt, data = df)) # Fit linear model for each list
  2. Say we are using a function, such as sqrt (calculate the square root), on a list that contains a non-numeric element. The base R function lapply throws an error and execution stops, without telling us which element caused it. The safely function from purrr completes execution and lets the user identify what caused the error.
x <- list(1, 2, 3, "e", 5)

# Base R
lapply(x, sqrt)
Error in FUN(X[[i]], ...): non-numeric argument to mathematical function

# purrr package
safe_sqrt <- safely(sqrt)
safe_result_list <- map(x, safe_sqrt) %>% transpose
safe_result_list$result
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

[[4]]
NULL

[[5]]
[1] 2.236068
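The transposed list also carries an error component, which is what tells us exactly which element failed. A small follow-up sketch (not in the original post) to pull out just the failures:

# compact() drops the NULL entries, leaving only the elements that errored
safe_result_list$error %>% compact()

# Or flag the failing positions in the original list x
map_lgl(safe_result_list$error, ~ !is.null(.x))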

Conclusion

Overall, I think it is fair to say that using higher order functions in R is a great way to improve one's code. With that in mind, my closing remark for this blog post is simply to reiterate the benefits of using the purrr package. That is:

  • The output is consistent.
  • The code is easier to read and write.

If you enjoyed learning about purrr, then you can join us at our purrr workshop at this year's EARL London; early bird tickets are available now!


Another month, another sweepstake to raise money for the Bath Cats & Dogs home!

This time, we picked the Eurovision song contest as our sweepstake of choice. After enjoying my first experience of using R to randomise the names for the previous sweepstake I decided to give it another go, but with a few tweaks.

Soundcheck

During my first attempt in R, issues arose when I had been (innocently!) allocated the favourite horse to win. I had no way to prove that the R code had made the selection, as my work was not reproducible.

So with the cries of “cheater!” and “fix!” still ringing in my ears, we started by setting a seed. This meant that if someone else ran my code they would get the same results, thereby removing the dark smudge against my good name.

I selected the number 6 at random to use as my seed.

set.seed(6)

I next compiled my lists of people and Eurovision countries and stored each in its own object.

people_list <- c(
    "Andy M",
    "Adam",
    "Laura",
    "Rachel",
    "Owen",
    "Yvi",
    "Karis",
    "Toby",
    "Jen",
    "Matty G",
    "Tatiana",
    "Amanda",
    "Chrissy",
    "Lisa",
    "Lisa",
    "Ben",
    "Ben",
    "Robert",
    "Toby",
    "Matt A",
    "Lynn",
    "Ruth",
    "Julian",
    "Karina",
    "Colin",
    "Colin")
countries_list <- c(
    "Albania",
    "Australia",
    "Austria",
    "Bulgaria",
    "Cyprus",
    "Czech Rep",
    "Denmark",
    "Estonia",
    "Finland",
    "France",
    "Germany",
    "Hungary",
    "Ireland",
    "Israel",
    "Italy",
    "Lithuania",
    "Moldova",
    "Norway",
    "Portugal",
    "Serbia",
    "Slovenia",
    "Spain",
    "Sweden",
    "The Netherlands",
    "Ukraine",
    "United Kingdom"
  )

Once I had the lists stored in objects, I followed the same steps as in my previous attempt: I combined both objects into a data frame, using the sample function to shuffle the countries.

assign_countries <- data.frame(people = people_list,
                               countries = sample(countries_list))

Task complete!

Fate had delivered me Denmark, who were nowhere near the favourites at the point of selection. I sighed with relief, knowing that I had no chance of winning again and that perhaps now I could start to rebuild my reputation as an honest co-worker...

Encore

Before I finished my latest foray into R, we decided to wrap the process up in a function for running future sweepstakes.

I was talked down from picking the name SweepstakeizzleR and decided upon the slightly more sensible sweepR.

I wrote the function to follow the same steps as the work above.

sweepR <- function(a, b, seed = 1234){
  set.seed(seed)            # Make the draw reproducible
  data.frame(a, sample(b))  # Pair each name with a shuffled entry
}

Once done, I could use my newly created function to repeat the work I had done before, but much more quickly.

sweepR(people_list, countries_list)

My very first function worked! Using a function like sweepR allows me to reliably reproduce the procedure for whatever task I'm working on; in this case, it gave me a properly random pairing of names and entries.
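As a quick sanity check (my own addition, not part of the original draw), running the function twice with the same seed returns identical assignments, which is exactly what makes the result verifiable:

# Two calls with the same seed produce exactly the same pairing
identical(sweepR(people_list, countries_list, seed = 6),
          sweepR(people_list, countries_list, seed = 6))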

WinneR

With great relief Israel won Eurovision and I was very happy to hand over the prize to Amanda.

I really enjoyed learning a little more about R and how I can create functions to streamline my work. Hopefully another reason will come up for me to learn even more soon!