By Chris Campbell.
Forty people arrived at Bar Odder for an evening of fun, frolics and statistics. The mood was one of excitement, a subdued carnival of enthusiasts keen to share their knowledge and to learn. Although people arrived in groups, the excitement was infectious: people struck up conversations, passed badges and USB drives around, and jostled for a better view of the projector.
As this was the first ever Manchester R meeting, the topics were chosen to help people make a start in R. We began with a pre-session workshop: a brief introduction to R for absolute beginners. It is easy to forget what it was like to move from doing everything through graphical user interfaces (GUIs) to working at a console, and it can be daunting at first, so starting with the simplest tasks is essential. I showed those who had arrived early for the workshop some bread-and-butter commands.
In R we do most of our work with objects, so learning how to work with data objects is essential to everything we do. Vectors are simple, commonly used data objects. Harnessing vectorised calculations, which operate on every element of a vector at once, is a key skill for R users.
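As a minimal sketch of vectorised calculation (the vector prices and its values are made up for illustration), arithmetic and summary functions act on a whole vector at once, with no loop required:

```r
# Hypothetical example data: a numeric vector created with c() ("combine")
prices <- c(10.5, 11.2, 9.8, 12.0)

# Vectorised arithmetic: the multiplication is applied to every element
pricesInPence <- prices * 100

# Two vectors of the same length are combined element by element
change <- c(1, 2, 3) + c(10, 20, 30)   # 11 22 33

# Summary functions take the whole vector as their input
total <- sum(prices)      # 43.5
average <- mean(prices)   # 10.875
length(prices)            # 4
```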
Extracting values from data objects, whether by position, by name or with a logical test, is an equally essential skill.
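A short sketch of the main ways to extract values from a vector with square brackets. The named vector x is hypothetical, chosen only to illustrate each form of index:

```r
# Hypothetical named vector
x <- c(a = 5, b = 8, c = 2, d = 9)

second      <- x[2]          # by position: the second element
firstThird  <- x[c(1, 3)]    # several positions at once
allButFirst <- x[-1]         # a negative index drops elements
byName      <- x["d"]        # by name
bigOnes     <- x[x > 4]      # by a logical test: keeps a, b and d
```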
Another useful data structure is the data frame, a table whose columns can each hold a different type of data. Data frames have some features that can catch beginners out. The most common difficulty new users meet when working with data frames for the first time is that text strings are converted to factors. Factors are an incredibly useful data class for categorical data, but they take some practice and need to be treated with care.
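A minimal sketch of the factor behaviour described above. Since R 4.0.0, data.frame() and read.csv() no longer convert strings to factors by default, so this example sets stringsAsFactors = TRUE to reproduce the older default. The data frame, its columns and the factor f are hypothetical:

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the default before R 4.0.0
stocks <- data.frame(
  day = 1:4,
  company = c("Acme", "Bravo", "Acme", "Bravo"),
  value = c(100, 95, 104, 99),
  stringsAsFactors = TRUE
)
str(stocks)                   # company is listed as a Factor w/ 2 levels
levels(stocks$company)        # "Acme" "Bravo"

# A classic trap: numbers stored as a factor
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1: the internal level codes, not the numbers
as.numeric(as.character(f))   # 10 20 10: convert to text first
```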
Two-dimensional objects need an additional index when extracting information. Just as a spreadsheet cell is referenced by row and column, we refer to the contents of a data frame with square brackets, giving the row index first and then the column index, separated by a comma.
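A sketch of row-and-column indexing on a small, hypothetical data frame. Leaving one index blank selects everything in that dimension:

```r
# Hypothetical data frame
stocks <- data.frame(day = 1:4, value = c(100, 95, 104, 99))

cell <- stocks[2, 2]         # row 2, column 2: 95
row3 <- stocks[3, ]          # leave the column blank to get the whole row
vals <- stocks[, "value"]    # leave the row blank to get a whole column (by name)
same <- stocks$value         # the $ operator also extracts a column
```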
R offers plenty of functionality for querying data objects. For example, on which day did our stock first reach a certain price? We can perform logical tests on the columns of a data frame and use the resulting TRUE/FALSE values to select data.
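A minimal sketch of selecting rows with a logical test. The daily stock values and the threshold of 103 are made up for illustration:

```r
# Hypothetical daily stock values
stocks <- data.frame(day = 1:6, value = c(100, 95, 104, 99, 110, 108))

# A logical test on a column returns one TRUE/FALSE per row
above <- stocks$value > 103

# Use the logical vector to select rows
highDays <- stocks[above, ]

# which() gives the positions of the TRUE values: 3 5 6
positions <- which(above)

# The first day on which the value exceeded 103
firstDay <- stocks$day[which(above)[1]]

# subset() expresses the same selection more readably
highDays2 <- subset(stocks, value > 103)
```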
For a new user, one of the most gratifying features of R is how quickly plots can be created.
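As a sketch of how little code a basic chart needs (the data are hypothetical), a single call to plot() draws a labelled line chart, and low-level functions such as abline() and points() then add to it:

```r
# Hypothetical daily stock values
stocks <- data.frame(day = 1:6, value = c(100, 95, 104, 99, 110, 108))

# One call gives a line chart with labelled axes and a title
plot(stocks$day, stocks$value, type = "l",
     xlab = "Day", ylab = "Value", main = "Stock value")

# Low-level functions draw on top of the existing plot
abline(h = 103, lty = 2)                      # dashed horizontal reference line
points(stocks$day, stocks$value, pch = 19)    # filled point at each observation
```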
Learning R is more like exploring a new language than mastering a static syllabus. Getting help is an essential part of working in R at every level of experience. The user documentation can sometimes seem a little terse, but there is a wealth of excellent books, web resources, mailing lists, and friendly consultants that can help with problems large and small.
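A few built-in ways to get help from the console, shown here with mean() purely as an example topic:

```r
?mean               # open the help page for mean(); same as help("mean")
example(mean)       # run the examples at the bottom of that help page
apropos("mean")     # list functions whose names contain "mean"
# ??regression      # search all installed help pages; same as help.search("regression")
```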
New users should also know about the graphics devices, a suite of functions for writing plots out as image files of a fixed size. A useful help topic to start with is png, the Portable Network Graphics device.
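A minimal sketch of the open, plot, close pattern used by the png device. The output file name and the image size are assumptions for illustration; the file is written to a temporary directory:

```r
# Hypothetical output location: a file in the session's temporary directory
outFile <- file.path(tempdir(), "stockValue.png")

# Open the PNG device: width and height are in pixels by default
png(filename = outFile, width = 800, height = 600)

# Everything plotted now is drawn into the file rather than on screen
plot(1:10, (1:10)^2, type = "b", main = "Written to a PNG file")

# Close the device to finish writing the file
invisible(dev.off())
```

Related devices such as jpeg() and pdf() follow the same pattern, although pdf() measures width and height in inches rather than pixels.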
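Understanding how to provide arguments to functions is another essential concept. Arguments are the inputs we pass to a function, such as data, numbers, text or TRUE/FALSE switches, and they control or modify its behaviour. They can be given by position or by name, and many have default values.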
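A short sketch of positional and named arguments using a few base R functions; the vector x is hypothetical:

```r
x <- c(4, 8, NA, 15)

mean(x)                           # NA: by default missing values propagate
mean(x, na.rm = TRUE)             # a named argument changes the behaviour: 9
round(3.14159, 2)                 # arguments matched by position: 3.14
seq(by = 2, to = 10, from = 0)    # named arguments can come in any order: 0 2 4 6 8 10
args(round)                       # show a function's arguments and their defaults
```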
Following this brief introduction to R, the main meeting started. In the first talk (http://www.rmanchester.org/Presentations/Rslides.pdf), Graeme showed off the benefits of Rcmdr (pronounced R Commander), a GUI for performing common tasks in R. When you perform a task such as creating a plot from the menus, Rcmdr generates the plot and also displays the code used to create it.
Graeme teaches students who have learned statistics using SPSS. Rcmdr's familiar menu system eases them into the new way of thinking: they can create a plot with a few clicks as before, see the code that created it, and then take that code and tweak it to customize the plot's appearance.
I then gave a presentation on package building (http://www.rmanchester.org/Presentations/packageBuilding.pdf). Once you have some useful scripts, a great way to consolidate your code into a robust, shareable unit is to turn it into a package. I covered the how-tos: installing R tools, creating the package source, and building and testing a package.
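The exact steps from the talk are in the slides; as a minimal sketch of one way to create package source from existing code, base R's package.skeleton() writes out a skeleton directory. The function addVat() and the package name myFirstPackage are hypothetical:

```r
# Hypothetical function we want to share as a package
addVat <- function(price, rate = 0.2) {
  price * (1 + rate)
}

# Write the package source into a temporary directory
pkgDir <- tempdir()
package.skeleton(name = "myFirstPackage", list = "addVat",
                 environment = environment(), path = pkgDir)

# The skeleton contains DESCRIPTION, NAMESPACE, R/ and man/, ready for editing
list.files(file.path(pkgDir, "myFirstPackage"), recursive = TRUE)
```

After filling in DESCRIPTION and the help files under man/, the package can be built from a command prompt with R CMD build myFirstPackage and then tested by running R CMD check on the resulting .tar.gz file.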
Munawar then presented guidelines for setting out on a project in R (http://www.rmanchester.org/Presentations/talkManch.pdf). Managing data is a vital part of any analytics project. Without tidy data we have to work harder at every stage of the analysis, mistakes are more likely, and our work is harder to follow. Munawar structured his data in a SQL database. With planning and focus, we can all succeed as analysts.
It’s great to have such an enthusiastic community in Manchester already, with many volunteers keen to present their experiences from a variety of industries. So if you’d like to present in 2013, you’ll need to get in quickly! It was a fun meeting, and I look forward to seeing everyone again in May. Bring a friend, and introduce them to the world of R.
Our blog and many others are also available at www.r-bloggers.com/