
Today we caught up with Andrew Little and Daniel de Bortoli, who will be teaching the ‘Web Scraping and Text Mining Lyrics in R’ workshop at EARL on the 9th of September. We spoke about their careers and lives at Mango and what their dream workshop would look like…

Hi both! Thank you for talking to me today, I’d love to know some more details about your lives at Mango and how your career has led you to this point?

Daniel – I’ve been a Data Scientist at Mango for around a year. Prior to Mango I was a researcher in the mechanical engineering sector, until I decided that I wanted to move away from academia; data science always seemed like an exciting and interesting area. I was interested in solving more practical problems, but still having intellectually challenging and interesting work. Since joining Mango, I’ve done lots of consulting and training projects and more recently I’ve been working on a revenue optimisation project.

Andrew – I’m a Junior Data Scientist at Mango – I’ve been here for two years now. I came straight from university, so this is the first step in my career; I believe I was part of the first graduate intake that Mango did. I’ve done lots of training in my time here and a decent amount of consultancy projects too. Currently, I’m doing lots of work on helping teams move from one programming language to R, so I’m showing them the best way to work in R.

You’re hosting a workshop on web scraping, why do you think this is a useful skill for Data Scientists?

Andrew – There are many situations where you have data given to you and it’s just available, but when you try to do something more exciting or nuanced, often there is no data and you need to get it yourself. So it’s a way that you can use the ridiculous vastness of the internet to get freely available data that you can then use.

Daniel – No matter what you’re interested in, if it is on the internet you can collect your own data. That’s the power of it.

What can people expect to leave the workshop knowing?

Daniel – Having a full end-to-end view of a modern workflow for working with text data in R. From collecting – web scraping – to processing it, cleaning it, and looking at many of the common modelling approaches or common tasks with text data. All the way to generating some interesting outputs. We chose lyrics to work with, as they’re not too niche and most people are interested in music!

What would your dream workshop be?

Andrew – Personally for me there are two main areas I’m interested in, one is the workshop we’re actually teaching – so that’s good!

Daniel – So you’re saying we created your dream workshop?

Andrew – You’re right actually, yes! So either this or I’m quite interested in computer vision as well. That’s something you don’t see too much on, as it’s quite a new area. Anything that is seen as cutting edge I am interested in and not just standard statistical analysis. For example when you’re using data to build AI that actually feels like AI – so it’s doing something like a human would do, like reading text documents or processing images.

Daniel – Preparing this workshop, I realised I’d never actually worked with audio data. We’ve been referring to data that Spotify has, on things like, how upbeat a song is – so that’s audio analysis. That would be really interesting and uncommon to look into.

Thank you both!

If you’d like to find out more about the EARL Conference and the other workshops we have available, please visit here.


Attending the recent Transform Finance ‘Virtual Fraud in Financial Services’ online event was a great opportunity to share experiences and knowledge with experts from across a highly innovative industry.

The event took in a range of critical themes, including our session with Cifas focusing on ‘Using Advanced Analytics To Detect, Prevent And Tackle Fraud’. Reflecting on the experience overall, there were a number of key takeaways that illustrate both the challenges and opportunities facing an industry increasingly reliant on technology to deliver effective services while tackling some major obstacles to success:

  1. The Changing Fraud Landscape

Many organisations are investing heavily in their use of data and technology to prevent and detect crime and fraud. This is essential as the emerging trends are worrying – the 2021 Identity Fraud Study revealed the true scale of identity fraud scams to consumers and businesses alike. While total combined fraud losses climbed to $56 billion in 2020, identity fraud scams accounted for $43 billion of that cost with ‘traditional’ identity fraud losses totalling $13 billion.

With fraudsters becoming more sophisticated, it is essential that organisations across the diverse finance industry constantly adapt to change with technology and application of data.

  2. Innovation And Cooperation

In a changing world, it’s clear that financial institutions understand the need for greater collaboration and data sharing across the industry. There is growing recognition that a more joined-up approach to issues such as fraud detection and prevention is key to delivering effective outcomes. As an example, cooperation between ‘core’ public and private finance organisations and third-party organisations, including regulators, mobile providers and social media companies, is essential best practice when it comes to reducing the incidence and impact of fraud.

This extends to the wider implementation of technologies such as digital signatures and document sharing through digital channels and the development of digital currency for financial inclusion. In all these emerging areas, working together is key, while the application and proactive use of data, good management, metrics and governance can help ensure success and make sure that industry-wide goals in relation to fraud are clear.

  3. Tackling Major Fraud Trends

As part of our contribution towards the Transform Finance event, our Chief Data Scientist, Rich Pugh, and Sandra Peaston, Director of Research & Development at Mango customer Cifas, joined a panel session to discuss the use of data and intelligence to support fraud prevention.

Among the areas discussed, the panel agreed that it is essential that the industry operates on a level playing field by sharing intelligence via organisations such as Cifas. In doing so, finance businesses across the sector are much better placed to adapt to new trends. For instance, an approach called ‘transfer learning’ can enable small organisations to benefit from the insight and data generated by larger businesses to react more quickly and effectively. There should also be ongoing efforts to improve customer education and awareness, reinforcing the idea that individuals always need to remain vigilant.

Looking to the future, data and technology will play an increasing role in combating and detecting crime. By maximising collaboration between corporates, banks, regulators and other key stakeholders, the industry will move towards a scenario where fraud detection happens in real time to minimise risk and loss.

To achieve these goals, organisations must focus on improving their capabilities, modernising fraud operations and maximising the technical competency of their teams. Those that do will be ideally placed to play a full and effective role in tackling fraud while gearing for growth.

For more information on how Mango is supporting the detection of financial crime, read our Cifas case study.

This week we spoke to Xinye Li, Head of Data Science at Mango Solutions, to talk about his career thus far and what people can expect to learn at the ‘Functional Programming with Purrr’ EARL workshop.

Hi Xinye, thanks for joining me today. Could you tell me about your career and what you do at Mango currently?

I joined Mango/Ascent three months ago as Head of Data Science. Before that I had been a full-time Data Scientist with pretty hands-on responsibility – coding and delivering results both on the client side and at agencies. That means I have gained quite a wide range of experience with the problems businesses can face – dealing with the different stages of data maturity, and seeing how businesses go from coming up with questions to using data to solve those problems. At the core, I am a passionate Data Scientist who is always happy to write code and will always be fascinated by new developments in Data Science.

You are teaching the ‘Functional programming with Purrr‘ workshop at EARL online – could you let us know a bit more about the workshop and what people could expect to learn?

Talking of new developments, in the R language especially, I think the functional programming aspect of R is being talked about a lot more now. It has always been its core strength – for example, many base R maths symbols are actually syntactic sugar for function calls behind the scenes, without people realising it.
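To make that point concrete, here is a minimal sketch in base R (no extra packages assumed). The variable names are illustrative only; the behaviour shown is standard R, where infix operators such as `+` and `[` are ordinary functions.

```r
# Infix operators in base R are ordinary functions that can be called by name.
infix_result  <- 1 + 2
prefix_result <- `+`(1, 2)      # the same call written in prefix form

x <- c(10, 20, 30)
second_element <- `[`(x, 2)     # subsetting, x[2], is also a function call

# Because operators are functions, they can be passed to other functions.
total    <- Reduce(`+`, 1:4)            # 1 + 2 + 3 + 4
products <- mapply(`*`, 1:3, 4:6)       # 1*4, 2*5, 3*6
```

Writing the operator in backticks is what lets R treat it as a regular function name.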

Some of the latest best practices in coding in R, such as Shiny Modules, have surfaced the need to understand functional programming. With companies such as RStudio and their open-source contributions, functional programming has become a lot more fun and easier to practise. So the aim of the workshop is to introduce the idea of functional programming, demonstrate how to implement it in R, and provide some useful tips for writing good functions. As an example, purrr is a package designed to work with the functional programming paradigm in R – the code is much easier to manage and it’s easy to convert it to massively parallel processing with minimal effort.
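As a rough illustration of the style purrr encourages, the sketch below applies one function to every element of a list. It assumes the purrr package is installed; `rescale` and `measurements` are hypothetical names made up for this example and are not taken from the workshop materials.

```r
library(purrr)

# Hypothetical helper, for illustration only: rescale a vector to the 0-1 range.
rescale <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical input data.
measurements <- list(a = c(1, 2, 3), b = c(10, 20, 40))

# map() applies the function to each element and always returns a list.
rescaled <- map(measurements, rescale)

# Typed variants such as map_dbl() return an atomic vector of a known type.
means <- map_dbl(measurements, mean)
```

Because each element is processed independently, code written this way is a natural candidate for parallel execution. One option is the furrr package, whose `future_map()` mirrors `map()`; it is mentioned here as a pointer, not as the approach the workshop necessarily takes.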

Thanks, Xinye! We can’t wait for your workshop.

The Enterprise Applications of the R Language Conference is back online in 2021 from the 6th-10th of September. Tickets are now on sale for the four workshops and the final day of presentations on using R in enterprise.

Every year we train thousands of people worldwide from a range of backgrounds and industries, in face-to-face and virtual classrooms. Our instructors have extensive subject matter experience and real-world application knowledge. Please get in touch if you’d like to find out more about our training services.

 

 


The next edition of LondonR online will take place on Tuesday the 27th of July from 4pm (BST). Tickets are free for attendees and you can register here for a place!

This next LondonR will have three presenters all sharing their work on using R stats in the real world.

The first presenter joining us is Paulito Palmes from the IBM Dublin Research Lab, who will be presenting on ‘JuliaR for data science and machine learning workflow’. Paulito is a research scientist at IBM Research Europe (Dublin Research Lab) working in the areas of analytics, data mining, machine learning, reinforcement learning, automated decisions, and AI.

Next on the agenda at LondonR will be one of the Mango Solutions team, Elizabeth Brown. Elizabeth joined Mango as a professional placement student, with a key interest in Data Science and Shiny app development. Elizabeth will be presenting on ‘Creating a Shiny Dashboard as a Tool for Learning Git’. A best practice in Data Science is using Git for version control, something which isn’t introduced until working in industry. Thus, Elizabeth has developed a Shiny dashboard as a tool for beginners to learn how to use Git. In her presentation, she will give a demo of the app and how she has used {golem} and {shinyjs} in its development.

The final presenter place at LondonR will be announced shortly. We hope to see you there!

 


The recent Transform Finance ‘Virtual Fraud in Financial Services’ event offered some fascinating insight into the risks facing the sector and how organisations are investing in advanced technologies to detect, prevent and tackle fraud.

An important and recurring theme was the role of data analytics in meeting the challenges presented by rising levels of fraud and the increasing sophistication of fraudsters. Bringing insight and experience to life for attendees, Mango Chief Data Scientist, Rich Pugh, was joined by Sandra Peaston, Director of Research & Development at Cifas, to discuss their use of data and intelligence to support fraud prevention.

Cifas is the UK’s leading fraud prevention service, managing the largest database of instances of fraudulent conduct in the country. Its members are organisations from all sectors, sharing their data across sectors to reduce instances of fraud and financial crime.

As a data-centric organisation, Cifas wanted to develop deeper insight into emerging fraud trends, understand which were the most significant and then quickly share that information with its members for further action.

Getting ahead of the game was key, and as Sandra Peaston described, “We wanted to use our data to speed up the early-stage intelligence process so our members didn’t need to report trends to us. Unlocking the power of the data we already hold was the challenge that took us to Mango.”

Having been approached by Cifas, Mango quickly deployed a team of data scientists to establish the right technical environment. As Rich Pugh explained, “The Cifas team has amassed some incredible data assets, but with many areas of potential focus the key question was: where could we deliver quick impacts against their priorities?”

The Mango project team focused on two core areas. The first was a ‘Match’ project, built to reduce false positive rates and improve the Cifas rules engine. This was supported via the development of a probabilistic matching engine prototype, designed to improve the existing matching and reduce member friction.

The second part of the solution was an ‘Intelligence’ project. This focused on the development of a fuzzy search capability and a signal detection tool to automate previously manual fraud detection processes and uncover hidden and emerging fraud patterns. This insight would then be used to enrich intelligence and feed back to members.

As Sandra explained to event attendees, “We needed an intelligent way of dynamically identifying an emerging fraud trend, and key to this was the speed at which this happens. By working with Mango to uncover the huge power that sits within our data to a level of granularity that we couldn’t manage before, we can help members to prioritise and make them more efficient.”

Together, Cifas and Mango have deployed a best-practice framework using intelligence tools that demonstrably reveal hidden patterns that human beings would struggle to detect. Looking ahead, the teams will continue to innovate and use data science to unlock insight relating to fraud and e-crime, refining algorithms over time to become even more effective in countering criminal activity and finding ways to stay ahead of malicious actors.


Author: Rich Adams, RStudio Partner Manager

Free webinar: How to successfully manage your R environment – the RStudio managed service platform (22nd July @ 4PM BST)

In a free session on Thursday 22nd July, we’ll be discussing how data science teams can confidently and securely collaborate with large data sets in R, supported with the right expertise where capacity or skills may otherwise be lacking internally.

With guest speakers Lou Bajuk, Director of Product Marketing, RStudio and Will Yuill, Principal Public Health Analyst, Hertfordshire County Council, we’ll explore how data science teams can develop a best practice managed service production environment and achieve maximum return on investment from their data science cloud platform. Register here

What’s the webinar about?

As a language, R can present obstacles when it comes to installing, configuring and supporting a centralised platform for maximum adoption, not least the technical know-how these tasks require.

Many teams lack the required support from IT or the knowledge needed to make an environment suitable for future scalability. This can limit a team’s ability to manage large data sets and collaborate with ease, and often means a duplication of effort.

This webinar focuses on how to develop a best practice production environment, ensuring technical excellence and maximum return on investment from your data science platform.

Also under discussion:

  • How to effectively reduce barriers to scaling your R environment through an ‘RStudio Managed Service’
  • How Hertfordshire County Council overcame their barriers, under the extra pressure of Covid-19, with a managed services platform

Why is it important?

As we have seen this year, scaling data science teams and investing in data-driven strategies are more crucial than ever.

If, like Hertfordshire County Council, your team has seen rapid development yet lacks the internal expertise and resources to support an RStudio environment, a managed services platform may be the secure, compliant and effective cloud environment you need, and it can be up and running almost immediately. This expert Managed Service removes the need for specialist in-house IT expertise and guarantees a service level agreement to meet your requirements in terms of configuration, maintenance, and system updates.

Can you join us on 22nd July, 4pm to learn more?

The Public Health Evidence & Intelligence Team at Hertfordshire County Council will discuss why this is already providing an effective solution for them.

Register for the webinar here 


Towards the start of my placement, I was introduced to Shiny apps: {shiny} is an R package which allows users to create applications directly within R. I knew that this was something I wanted to learn more about. Additionally, I was introduced to Git, a version control system which is a best practice within data science and software development. As a result, I started a personal project with the aim of creating a Shiny app as a tool for learning Git, its main target audience being new Graduates and Placement Students.

The project had two main goals: the first to create an app for learning how to use Git locally; the second to expand on this to include remote Git.

Goal 1: Local Git

To create the app as an R package, I decided to use the framework provided by the R package {golem}. Using this framework had many advantages including keeping track of dependencies and easy app modularisation. I used several other packages when creating the app, including {shinydashboard} to allow for a dashboard layout. The code for each page of the dashboard is contained within its own module, which means that the code is well organised and easy to read.

One challenge I faced was the development of a few UI features. Most of these were solved by using the package {shinyjs} which allows users to improve Shiny apps using JavaScript. I used this to hide and disable relevant action buttons and when creating a bottom navigation bar. This navigation bar is used to move between the pages of the dashboard. This proved difficult, but with the help of the open-source community I was able to resolve the issue, creating a key feature of the app’s UI.
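The general pattern described above, one module per page plus {shinyjs} to disable a navigation button, might look roughly like the sketch below. This is not the app’s actual code: `page_ui`, `page_server`, the button ids and the `is_first_page` argument are all hypothetical names chosen for illustration, and the sketch assumes {shiny} (1.5.0 or later, for `moduleServer()`) and {shinyjs} are installed.

```r
library(shiny)
library(shinyjs)

# Hypothetical module UI for one dashboard page with two navigation buttons.
page_ui <- function(id) {
  ns <- NS(id)  # namespaces the input ids so each page's module stays independent
  tagList(
    actionButton(ns("prev_page"), "Previous"),
    actionButton(ns("next_page"), "Next")
  )
}

# Hypothetical module server: disables "Previous" when there is no earlier page.
page_server <- function(id, is_first_page = TRUE) {
  moduleServer(id, function(input, output, session) {
    if (is_first_page) {
      # Inside a module, shinyjs applies the module namespace to the id itself.
      shinyjs::disable("prev_page")
    }
  })
}

ui <- fluidPage(
  useShinyjs(),      # must be called once in the UI for shinyjs functions to work
  page_ui("intro")
)

server <- function(input, output, session) {
  page_server("intro", is_first_page = TRUE)
}

# shinyApp(ui, server)  # uncomment to run the app interactively
```

Keeping each page in its own pair of UI and server functions is what makes the code easy to organise: adding a page means adding another module call, not editing one large server function.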

After developing the first stage of the app, I gave a demonstration to data science colleagues who gave positive feedback, with ideas for the future development of the app. Once the first version of the app was complete, it was time to test it. I used the package {shinytest} to automate the testing via a snapshot-based testing strategy. Once the tests passed, I finally deployed the app using RStudio Connect, which allows users to access it via a URL. I also deployed it via shinyapps.io.

Following the completion of the first version of the app, I gave a presentation at the BarcelonaR conference, demonstrating the app and the code behind it. The code for this version can be found on GitHub along with smaller example apps.

Goal 2: Remote Git

The next, major release of the app extends the materials to cover learning Git both locally and remotely. It also includes a project designed to introduce the user to the concept of an Agile framework, as well as a practical scenario for using Git.

Moreover, I created a help page and took the opportunity to learn how to send emails from within a Shiny app. This was a successful learning exercise; however, the main challenge that followed was maintaining the feature. Eventually I removed it and instead created a help page that contains a number of Git references.

Towards the end of the development, I gave another demonstration to data science colleagues, receiving positive feedback. The app also received direct user testing from a new Graduate. Once some changes were made based on this feedback, I tested the app again using {shinytest}. Finally, I deployed this via RStudio Connect, ready to be used.

Results from the Project

I learnt a lot from this project, such as how to create a production grade Shiny app, best practices for using Git, and R package building. Moreover, the app will help users gain knowledge and experience of using Git for version control, specifically Graduates and Placement Students.

Since finishing the project, I have continued to expand my knowledge of the Shiny ecosystem by exploring code profiling, load balancing and load testing. {shiny} is an excellent package, allowing for flexibility and creativity.

 


This week I (Laura Swales, Community and Events Manager) spoke with Beth Ashlee and Owen Jones ahead of their EARL workshop – Package Development in R, which will take place on 7th September 2pm BST.

Hi Beth and Owen! Before we talk about what people can learn from the workshop, could you tell me a bit about yourselves?

Owen – As a Data Science Consultant, I work on a wide range of projects from a broad spectrum of industries. Most recently I’ve been working with the UK Government as part of the UK’s response to the Covid pandemic. I’m also the ValidR tech lead, which means that I do a lot of automated package testing too.

Beth – I’m a Senior Data Scientist and a Team Lead at Mango, so my role is very similar to Owen’s in that I work on a range of consultancy projects. My most recent project was with a large retailer, working with them to upskill their Data Science team. Along with the consultancy work, I manage a team of Data Scientists at Mango.

You are both teaching the Package Development in R workshop at EARL – can you tell us a bit more about the workshop and what people can expect to learn?

Beth – They’re going to learn how to build a package in R (!), but more importantly, the reasons why that’s useful to do. We’ll talk about how it can make your code easier to maintain and easier for others to use, and how to write good documentation…

Owen – There’s a lot of best practice incorporated in this workshop, in terms of how you are structuring the code you are writing and how you make it easy for yourself and others to contribute and maintain. Above all else, it’s about good practice, consistency, and code which other people can both use and look after.

Beth – All using the RStudio devtools package!

How many EARL Conferences have you been to, if you can remember?!

Owen – I think it’s 5 for me! Starting in 2017 with London, shortly followed by Boston EARL.

Beth – All of those for me as well, plus a San Francisco EARL in 2017 and two in Boston, so maybe 7 or 8 EARLs!

I believe I’ve been to 8 now – the US Roadshow helped push up my EARL number quite quickly! Do you have any particularly fond memories or highlights?

Owen – I enjoyed the Shiny testing workshop that Beth and I collaborated on a few years back. We’ve both delivered workshops in the past that have always been really fun to work on.

Beth – Agreed! Other than those, I always enjoy the keynote talk because the talks are usually approachable to everyone – Jenny Bryan in Boston jumps out at me as being a great example of a keynote we’ve had. Outside of the actual conference, a colleague and I got booed off stage doing karaoke in America – which was definitely memorable!

Thank you both for talking to me – we are really looking forward to your workshop.

Beth and Owen’s workshop will run on the 7th of September and will be £90 for a half day. For more information on their workshop and to get your tickets, click here.

 


EARL 2021 will start with a week of afternoon workshops, hosted by our expert Mango Solutions Data Scientists.

This morning I (Laura Swales, Community and Events Manager) caught up with Alejandro Rico who is hosting the ‘Introduction to Shiny’ workshop to find out more about him and what people can learn from his workshop.

Hi Alejandro! Could you tell us some background information about yourself and what you do at Ascent?

I started with maths and statistics and trying to use that knowledge somewhere in business. More often than not, the decisions I made for businesses were automated – normally when you are a Data Scientist you just process a lot of information and interpret this in some way. But often those decisions are quite simple – for example, if X number is higher than X threshold, then green light. You then normally end up automating all of that, and once you automate that information you end up probably building a nice-looking UI – so the product team can check those numbers and say: ‘ok, we’ve got the green light from the maths shenanigans!’ – so I specialised in doing that, not so much on the statistics part of things, but on the automation and nice-looking UI part. This eventually became Shiny development, so I joined Ascent (Mango Solutions) exclusively as a Shiny developer.

During my time at Mango, I’ve been developing Shiny – for different companies and different Shiny applications, but all Shiny! This is still a broad term because sometimes you spend a lot of time on the UI part – like designing the UX flow and how the user should interact with the tool, and on some other occasions, you spend time on the data processing part – which is closer to what a pure Data Scientist would do, but it’s part of the job still. I also deliver training on Shiny as well.

You’re hosting the Introduction to Shiny workshop at EARL online this year – what can attendees expect to learn?

I want to explain some basics with this workshop – getting people up and running with Shiny. I also want to sell Shiny! By that I mean showcasing what you can do with Shiny and what the applications can be used for. I hope that once people are convinced of the advantages of Shiny, then the chapter on how to build your first Shiny app will be even more exciting. So there are two parts – selling Shiny and getting started with simple applications using Shiny.

Why do you think people should use Shiny over other tools?

The main reason is that when you look at which tools or frameworks to use for development, you usually have to choose between something that’s easy to use and something that is powerful. I believe Shiny is an interesting framework as it’s extremely simple and easy to use for those who might not know about web app development. You only need a basic knowledge of R to use it and to get started pretty quickly. For anyone who wants to get started on web application development, Shiny works because you can start quickly but know that you can invest more and build more complex apps over time.

What is your favourite thing about Shiny?

Really what I said before – its flexibility! You can just use R and not ever realise that behind the scenes it is using HTML and CSS, but if you really want to (and I do) you can be specific and flex your JavaScript knowledge. Shiny can have lots of small widgets where you can embed your JavaScript code – and I think that’s really cool. It has helped me to do unique things in the web app development world.

Thank you Alejandro!

To find out more about Alejandro’s Introduction to Shiny workshop – please view here. The workshop will take place on Monday 6th September from 2 pm-5 pm UK time and will be £90. Profits from EARL online will be donated to DataKind UK.


The final day of the Enterprise Applications of the R Language Conference will be a day full of presentations from speakers who use R at work. It will be a great opportunity to hear from a variety of industries and how R is helping them in enterprise.

This closing day of EARL will take place on Friday 10th September and will run all day on UK time. The tickets will be priced at £9.99 – as the event is online we have far fewer overheads, and we will be able to donate our profits to DataKind UK.

We are pleased to also announce that EARL has been DICE (Diversity and Inclusion at Conferences and Events) certified and approved. The DICE organisation aims to encourage event organisers to think more about who they put on stage and to improve representation at events.

Agenda highlights

Of course, we are excited about all the talks at this year’s EARL, but we’ve selected just a few to highlight on this blog.

Dr Jacqueline Nolis, Saturn Cloud – Keynote
Jacqueline will be our first announced keynote speaker – she is a data science leader with over 15 years of experience in managing data science teams and projects at companies including DSW and Airbnb. She is currently the Head of Data Science at Saturn Cloud, where she helps design products for data scientists. We can’t wait to hear about Jacqueline’s vast experience in the data science field.

Avision Ho, Mettle – Meeting citizens where they R
Avision’s talk sounds like an excellent example of using open-source software to improve citizen access to meaningful information. He will cover specific technical elements of building a robust CRAN package and a mobile-centric Shiny app, as well as good DevOps practices that facilitate standardised and easy collaboration.

Daniel Durling, Bank of England – We are not start-ups!
Daniel’s talk touches on a host of oft-experienced cultural/operational barriers to doing data science in larger organisations, such as making the switch to open-source software and dealing with legacy code. We are looking forward to hearing how Daniel navigated this in the Bank of England.

Adithi Upadhya, ILK Labs – Shiny apps for air quality data analysis – Introduction to network analysis
Adithi is a co-founder and co-organiser of R-Ladies Bangalore and currently works as a Geospatial Analyst at ILK Labs Bangalore. Adithi’s talk will focus on creating Shiny applications to analyse and visualise air-pollution data. With pollution and air quality as hugely topical subjects, it will be fascinating to hear how Adithi and her team are managing the vast amount of data and creating meaningful Shiny apps.

To see the rest of the agenda, please visit our EARL page here. Alongside the presentation day, we are also hosting four workshops from Monday-Thursday (6-9th September 2021): Introduction to Shiny, Package Development in R, Functional Programming with Purrr and Web Scraping and Text Mining Lyrics in R.