
Today we caught up with Andrew Little and Daniel de Bortoli, who will be teaching the ‘Web Scraping and Text Mining Lyrics in R’ workshop at EARL on the 9th of September. We spoke about their careers and lives at Mango and what their dream workshop would look like…

Hi both! Thank you for talking to me today, I’d love to know some more details about your lives at Mango and how your career has led you to this point?

Daniel – I’ve been a Data Scientist at Mango for around a year. Prior to Mango I was a researcher in the mechanical engineering sector, until I decided that I wanted to move away from academia; data science always seemed like an exciting and interesting area. I was interested in solving more practical problems while still having intellectually challenging and interesting work. Since joining Mango, I’ve done lots of consulting and training projects, and more recently I’ve been working on a revenue optimisation project.

Andrew – I’m a Junior Data Scientist at Mango – I’ve been here for two years now. I came straight from university, so this is the first step in my career; I believe I was part of the first graduate intake that Mango did. I’ve done lots of training in my time here and a decent number of consultancy projects too. Currently, I’m doing lots of work helping teams move from another programming language to R, so I’m showing them the best way to work in R.

You’re hosting a workshop on web scraping, why do you think this is a useful skill for Data Scientists?

Andrew – There are many situations where data is given to you and it’s just available, but when you try to do something more exciting or nuanced there is often no data, and you need to get it yourself. So it’s a way to use the ridiculous vastness of the internet to get freely available data that you can then use.

Daniel – No matter what you’re interested in, if it is on the internet you can collect your own data. That’s the power of it.

What can people expect to leave the workshop knowing?

Daniel – A full end-to-end view of a modern workflow for working with text data in R: from collecting it – web scraping – to processing it, cleaning it, and looking at many of the common modelling approaches and tasks with text data, all the way to generating some interesting outputs. We chose lyrics to work with, as they’re not too niche and most people are interested in music!
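To make the collect, clean and summarise steps concrete, here is a minimal sketch in R. It is not taken from the workshop materials: it assumes the {rvest} package (version 1.0.0 or later), and the HTML snippet, the `div.lyrics` selector and the lyric text are all invented so that the example runs without a network connection.

```r
library(rvest)

# Hypothetical page content, inlined so no network access is needed.
html <- '<html><body><div class="lyrics">
  <p>La la la, sing along with me</p>
  <p>Sing it again, la la</p>
</div></body></html>'

# Collect: parse the page and pull the text of each paragraph of lyrics.
page <- read_html(html)
lyric_lines <- html_text2(html_elements(page, "div.lyrics p"))

# Clean: lower-case, strip punctuation, split into individual words.
cleaned <- gsub("[[:punct:]]", "", tolower(lyric_lines))
words <- unlist(strsplit(cleaned, "\\s+"))
words <- words[nzchar(words)]

# Summarise: count word frequencies, most common first.
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 3)
```

On a real site, `read_html()` would be given a URL instead of a string, and it is worth checking the site's terms of use before scraping.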

What would your dream workshop be?

Andrew – Personally for me there are two main areas I’m interested in, one is the workshop we’re actually teaching – so that’s good!

Daniel – So you’re saying we created your dream workshop?

Andrew – You’re right actually, yes! So either this or I’m quite interested in computer vision as well. That’s something you don’t see too much on, as it’s quite a new area. Anything that is seen as cutting edge I am interested in and not just standard statistical analysis. For example when you’re using data to build AI that actually feels like AI – so it’s doing something like a human would do, like reading text documents or processing images.

Daniel – Preparing this workshop, I realised I’d never actually worked with audio data. We’ve been referring to data that Spotify has on things like how upbeat a song is – so that’s audio analysis. That would be a really interesting and uncommon thing to look into.

Thank you both!

If you’d like to find out more about the EARL Conference and the other workshops we have available, please visit here.


Attending the recent Transform Finance ‘Virtual Fraud in Financial Services’ online event was a great opportunity to share experiences and knowledge with experts from across a highly innovative industry.

The event took in a range of critical themes, including our session with Cifas focusing on ‘Using Advanced Analytics To Detect, Prevent And Tackle Fraud’. Reflecting on the experience overall, there were a number of key takeaways that illustrate both the challenges and opportunities facing an industry increasingly reliant on technology to deliver effective services while tackling some major obstacles to success:

  1. The Changing Fraud Landscape

Many organisations are investing heavily in their use of data and technology to prevent and detect crime and fraud. This is essential, as the emerging trends are worrying: the 2021 Identity Fraud Study revealed the true scale of identity fraud scams affecting consumers and businesses alike. Total combined fraud losses climbed to $56 billion in 2020, of which identity fraud scams accounted for $43 billion and ‘traditional’ identity fraud losses for $13 billion.

With fraudsters becoming more sophisticated, it is essential that organisations across the diverse finance industry constantly adapt, using technology and the application of data to keep pace with change.

  2. Innovation And Cooperation

In a changing world, it’s clear that financial institutions understand the need for greater collaboration and data sharing across the industry. There is growing recognition that a more joined-up approach to issues such as fraud detection and prevention is key to delivering effective outcomes. As an example, cooperation between ‘core’ public and private finance organisations and third-party organisations, including regulators, mobile providers and social media companies, is essential best practice when it comes to reducing the incidence and impact of fraud.

This extends to the wider implementation of technologies such as digital signatures and document sharing through digital channels and the development of digital currency for financial inclusion. In all these emerging areas, working together is key, while the application and proactive use of data, good management, metrics and governance can help ensure success and make sure that industry-wide goals in relation to fraud are clear.

  3. Tackling Major Fraud Trends

As part of our contribution to the Transform Finance event, our Chief Data Scientist, Rich Pugh, and Sandra Peaston, Director of Research & Development at Mango customer Cifas, joined a panel session to discuss the use of data and intelligence to support fraud prevention.

Among the areas discussed, the panel agreed that it is essential for the industry to operate from a level playing field by sharing intelligence via organisations such as Cifas. In doing so, finance businesses across the sector are much better placed to adapt to new trends. For instance, an approach called ‘transfer learning’ can enable small organisations to benefit from the insight and data generated by larger businesses and so react more quickly and effectively. There should also be ongoing efforts to improve customer education and awareness, reinforcing the idea that individuals always need to remain vigilant.

Looking to the future, data and technology will play an increasing role in combating and detecting crime. By maximising collaboration between corporates, banks, regulators and other key stakeholders, the industry will move towards a scenario where fraud detection happens in real time to minimise risk and loss.

To achieve these goals, organisations must focus on improving their capabilities, modernising fraud operations and maximising the technical competency of their teams. Those that do will be ideally placed to play a full and effective role in tackling fraud while gearing for growth.

For more information on how Mango is supporting work on detecting financial crime, read our Cifas case study.

 


This week we spoke to Xinye Li, Head of Data Science at Mango Solutions, to talk about his career thus far and what people can expect to learn at the ‘Functional Programming with Purrr’ EARL workshop.

Hi Xinye, thanks for joining me today. Could you tell me about your career and what you do at Mango currently?

I joined Mango/Ascent 3 months ago as Head of Data Science. Before that I had been a full-time Data Scientist with pretty hands-on responsibility – coding and delivering results, both client-side and agency-side. That means I have gained quite a wide range of experience with the problems businesses can face, with the different stages of data maturity, and with how businesses go from coming up with questions to using data to solve those problems. At the core, I am a passionate Data Scientist who is always happy to write code and will always be fascinated by new developments in Data Science.

You are teaching the ‘Functional Programming with Purrr’ workshop at EARL online – could you let us know a bit more about the workshop and what people could expect to learn?

Talking of new developments, in the R language especially, I think the functional programming aspect of R is being talked about a lot more now. It has always been its core strength – for example, many base R maths operators are actually syntactic sugar for function calls behind the scenes, without people realising it.
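The point about maths symbols can be checked directly in base R. The short sketch below is an illustration added here rather than workshop material; it only uses functions that ship with R.

```r
# Infix operators are ordinary functions: these two calls are equivalent.
infix_result <- 1 + 2
prefix_result <- `+`(1, 2)

# Because `+` is a function, it can be passed to other functions.
total <- Reduce(`+`, 1:4)

# Even subsetting with square brackets is a function call.
first <- `[`(c(10, 20, 30), 1)
```

Wrapping an operator in backticks is what lets R treat it as a regular function name.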

Some of the latest best practices in coding in R, such as Shiny Modules, have surfaced the need to understand functional programming. Thanks to companies such as RStudio and their open-source contributions, functional programming has become a lot more fun and much easier to practise. So the aim of the workshop is to introduce the idea of functional programming, demonstrate how to implement it in R, and provide some useful tips for writing good functions. As an example, purrr is a package designed to work with the functional programming paradigm in R – the code is much easier to manage, and it can be converted to massively parallel processing with minimal effort.
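As a rough illustration of the style purrr encourages, here is a minimal sketch. It assumes the {purrr} package is installed; the `word_count()` helper and the example strings are made up for this post and are not from the workshop.

```r
library(purrr)

# A small, pure function: the same input always gives the same output.
word_count <- function(text) length(strsplit(text, "\\s+")[[1]])

titles <- c("functional programming", "in R", "with purrr is fun")

# map_int() applies the function to each element and
# guarantees an integer vector as the result.
counts <- map_int(titles, word_count)

# Formula shorthand for an anonymous function, here doubling each count.
doubled <- map_dbl(counts, ~ .x * 2)
```

Because each call to `word_count()` is independent of the others, a loop written this way is a natural candidate for parallel execution; the {furrr} package, which is not mentioned in the interview, provides `future_map_int()` with a similar interface.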

Thanks, Xinye! We can’t wait for your workshop.

The Enterprise Applications of the R Language Conference is back online in 2021 from the 6th-10th of September. Tickets are now on sale for the four workshops and the final day of presentations on using R in enterprise.

Every year we train thousands of people worldwide from a range of backgrounds and industries, in face-to-face and virtual classrooms. Our instructors have extensive subject matter experience and real-world application knowledge. Please get in touch if you’d like to find out more about our training services.

 

 


The next edition of LondonR online will take place on Tuesday the 27th of July from 4pm (BST). Tickets are free for attendees and you can register here for a place!

This next LondonR will have three presenters all sharing their work on using R stats in the real world.

The first presenter joining us is Paulito Palmes from the IBM Dublin Research Lab, who will be presenting on ‘JuliaR for data science and machine learning workflow’. Paulito is a research scientist at IBM Research Europe (Dublin Research Lab) working in the areas of analytics, data mining, machine learning, reinforcement learning, automated decisions, and AI.

Next on the agenda at LondonR will be one of the Mango Solutions team, Elizabeth Brown. Elizabeth joined Mango as a professional placement student, with a key interest in Data Science and Shiny app development. Elizabeth will be presenting on ‘Creating a Shiny Dashboard as a Tool for Learning Git’. Using Git for version control is a best practice in Data Science, yet it often isn’t introduced until people start working in industry. Elizabeth has therefore developed a Shiny dashboard as a tool for beginners to learn how to use Git. In her presentation, she will give a demo of the app and show how she has used {golem} and {shinyjs} in its development.

The final presenter place at LondonR will be announced shortly. We hope to see you there!

 


The recent Transform Finance ‘Virtual Fraud in Financial Services’ event offered some fascinating insight into the risks facing the sector and how organisations are investing in advanced technologies to detect, prevent and tackle fraud.

An important and recurring theme was the role of data analytics in meeting the challenges presented by rising levels of fraud and the increasing sophistication of fraudsters. Bringing insight and experience to life for attendees, Mango Chief Data Scientist, Rich Pugh, was joined by Sandra Peaston, Director of Research & Development at Cifas to discuss their use of data and intelligence to support fraud prevention.

Cifas is the UK’s leading fraud prevention service, managing the largest database of instances of fraudulent conduct in the country. Its members are organisations from all sectors, sharing their data across those sectors to reduce instances of fraud and financial crime.

As a data-centric organisation, Cifas wanted to develop deeper insight into emerging fraud trends, understand which were the most significant and then quickly share that information with its members for further action.

Getting ahead of the game was key, and as Sandra Peaston described, “We wanted to use our data to speed up the early-stage intelligence process so our members didn’t need to report trends to us. Unlocking the power of the data we already hold was the challenge that took us to Mango.”

Having been approached by Cifas, Mango quickly deployed a team of data scientists to establish the right technical environment. As Rich Pugh explained, “The Cifas team has amassed some incredible data assets, but with many areas of potential focus the key question was: where could we deliver quick impacts against their priorities?”

The Mango project team focused on two core areas. The first was a ‘Match’ project, built to reduce false positive rates and improve the Cifas rules engine. This was supported via the development of a probabilistic matching engine prototype, designed to improve the existing matching and reduce member friction.

The second part of the solution was an ‘Intelligence’ project. This focused on the development of a fuzzy search capability and a signal detection tool to automate the previously manual fraud detection processes and uncover hidden and emerging fraud patterns. This insight would then be used to enrich intelligence and feed back to members.
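For readers unfamiliar with the term, fuzzy search means finding records that are close to a query rather than identical to it. The sketch below shows one common, simple approach in base R, based on edit distance. It is purely illustrative: the names are invented, the threshold of two edits is an arbitrary assumption, and nothing here describes how the Cifas tools were actually built.

```r
# Hypothetical records; the names are invented for illustration.
records <- c("Jonathan Smith", "Jonathon Smyth", "Joanna Smithers", "Peter Brown")
query <- "Jonathan Smith"

# adist() computes the generalised Levenshtein (edit) distance
# between the query and every record.
distances <- drop(adist(query, records, ignore.case = TRUE))

# Keep the records within a small number of edits of the query
# (the cut-off of 2 is an assumption chosen for this example).
matches <- records[distances <= 2]
```

A simple distance threshold like this catches typos and spelling variants; a production system would typically combine several fields and weigh the evidence, which is where probabilistic matching comes in.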

As Sandra explained to event attendees, “We needed an intelligent way of dynamically identifying an emerging fraud trend, and key to this was the speed at which this happens. By working with Mango to uncover the huge power that sits within our data to a level of granularity that we couldn’t manage before, we can help members to prioritise and make them more efficient.”

Together, Cifas and Mango have deployed a best-practice framework using intelligence tools that demonstrably reveal hidden patterns that human beings would struggle to detect. Looking ahead, the teams will continue to innovate and use data science to unlock insight relating to fraud and e-crime, refining algorithms over time to become even more effective in countering criminal activity and finding ways to stay ahead of malicious actors.


Author: Rich Adams, RStudio Partner Manager

Free webinar: How to successfully manage your R environment – the RStudio managed service platform (22nd July @ 4PM BST)

In a free session on Thursday 22nd July, we’ll be discussing how data science teams can confidently and securely collaborate with large data sets in R, supported with the right expertise where capacity or skills may otherwise be lacking internally.

With guest speakers Lou Bajuk, Director of Product Marketing, RStudio and Will Yuill, Principal Public Health Analyst, Hertfordshire County Council, we’ll explore how data science teams can develop a best practice managed service production environment and achieve maximum return on investment from their data science cloud platform. Register here

What’s the webinar about?

R can be restrictive when it comes to implementation: installing, configuring, and supporting a centralised platform for maximum adoption all require specific technical know-how.

Many teams lack the required support from IT or the necessary knowledge that makes an environment suitable for future scalability. This can impact a team in their ability to manage large data sets, collaborate with ease and often means a duplication of effort.

This webinar focuses on how to develop a best practice production environment, ensuring technical excellence and maximum return on investment from your data science platform.

Also under discussion are:

  • How to effectively reduce barriers to scaling your R environment through an ‘RStudio Managed Service’
  • How Hertfordshire County Council overcame their barriers, under the extra pressure of Covid-19, with a managed services platform

Why is it important?

As we have seen from this year, scaling of data science teams and investment in data-driven strategies is even more crucial than ever.

If, like Hertfordshire County Council, your team has seen rapid development yet you lack the internal expertise and resources to support an RStudio environment, a managed services platform may be the secure, compliant and effective cloud environment that can be up and running almost immediately. This expert Managed Service removes the need for specialist in-house IT expertise and guarantees a service level agreement to meet your requirements in terms of configuration, maintenance, and system updates.

Can you join us on 22nd July, 4pm to learn more?

The Public Health Evidence & Intelligence Team at Hertfordshire County Council will discuss why this is already providing an effective solution for them.

Register for the webinar here 


Stewart Smythe at Ascent asks whether greater mutual understanding is needed to underpin the technology industry

In a commercial context, empathy is a theme that relates to a whole range of interactions from leadership to customer service and experience design, but in the technology services industry, it’s something of a left-field concept that is often overwhelmed by a focus on the solution. Tech companies everywhere talk about ‘understanding’ their customers, but how many invest more deeply than that?

Arguably, and particularly in the shadow of universal adversity that Covid cast over the last year, humanising the relentless digital challenge has become even more relevant – and critical.

Making empathy the cornerstone of a technology proposition allows organisations to create space for a different kind of customer relationship – grown up, resilient, flexible, commercially agile, constantly mindful of its fundamental aims. But how have some tech organisations arrived at this point when others seem to view the last 12 months as little more than a rollercoaster ride that will soon return them to a safe and familiar starting point?

Undoubtedly, the pandemic has hastened the pace of change across the technology landscape, but in much broader and more nuanced ways than the headlines about remote working and rapid product development would have us believe.

Some technology businesses have taken the opportunity to rethink their approach to customer relationships and have learnt that working closely together in exceptional circumstances brings out the best in everyone. In contrast, others have had to be more tactical to protect revenues and sustain momentum, employing short term promotions and heavy discounting to keep spending going.

In my business, we allowed empathy to guide the investments and changes we made in our business model, and we made different and longer-range choices based on what Covid took from customer strategies: certainty and confidence.

We did three fundamental things as part of this approach.

We wrote new contracts that put a proportion of our revenues at risk against our customer commitments to make it easier for customers to make technology investments during uncertain times.

We recut our delivery phasing to get value into the hands of customers even earlier, helping them build confidence in the relationship and see a return more quickly.

We adjusted our resourcing approach and made some new hires to ensure we were in a position to guarantee resource continuity with customers across longer programmes, sustaining existing team dynamics and maintaining consistency and predictability.

Building empathy into the business model is an approach that gives technology partners the opportunity to fully live up to their core values, taking on the weight of customer responsibility and really understanding their pressures and drivers. Any good business builds meaningful relationships with customers – but going a step further and working with customers to define success, and failure, on their own terms is an opportunity to contribute more value.

In short, mutual understanding, or empathy, is not just a way for technology partners to behave in an emergency, it’s an exceptionally effective business model for the long term. Those who have seen the best from adaptable technology partners are sharing their experiences as the model for future relationships and, for them, there’s no going back.

Published in: Business Reporter


Towards the start of my placement, I was introduced to Shiny apps: {shiny} is an R package which allows users to create applications directly within R. I knew that this was something I wanted to learn more about. Additionally, I was introduced to Git, a version control system which is a best practice within data science and software development. As a result, I started a personal project with the aim of creating a Shiny app as a tool for learning Git, its main target audience being new Graduates and Placement Students.

The project had two main goals: the first to create an app for learning how to use Git locally; the second to expand on this to include remote Git.

Goal 1: Local Git

To create the app as an R package, I decided to use the framework provided by the R package {golem}. Using this framework had many advantages including keeping track of dependencies and easy app modularisation. I used several other packages when creating the app, including {shinydashboard} to allow for a dashboard layout. The code for each page of the dashboard is contained within its own module, which means that the code is well organised and easy to read.
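To show what "one module per page" can look like, here is a minimal sketch of a Shiny module. It is not the app's actual code: it assumes the {shiny} package (version 1.5.0 or later, for `moduleServer()`), and the names `page_ui`, `page_server`, `next_step` and `local_git` are hypothetical.

```r
library(shiny)

# UI half of a hypothetical module for one dashboard page.
page_ui <- function(id) {
  ns <- NS(id)  # namespaces the ids so that modules cannot clash
  tagList(
    actionButton(ns("next_step"), "Next"),
    textOutput(ns("progress"))
  )
}

# Server half: reports how many times the button has been clicked.
page_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    output$progress <- renderText(
      paste("Steps completed:", input$next_step)
    )
  })
}

# A minimal app wiring one instance of the module together.
ui <- fluidPage(page_ui("local_git"))
server <- function(input, output, session) page_server("local_git")
# shinyApp(ui, server)  # uncomment to run interactively
```

Each page gets its own pair of functions, and the same pair can be reused with a different `id` without the inputs and outputs colliding.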

One challenge I faced was the development of a few UI features. Most of these were solved by using the package {shinyjs}, which allows users to improve Shiny apps using JavaScript. I used this to hide and disable relevant action buttons, and to create a bottom navigation bar for moving between the pages of the dashboard. Building the navigation bar proved difficult, but with the help of the open-source community I was able to resolve the issue, creating a key feature of the app’s UI.

After developing the first stage of the app, I gave a demonstration to data science colleagues who gave positive feedback, with ideas for the future development of the app. Once the first version of the app was complete, it was time to test it. I used the package {shinytest} to automate the testing via a snapshot-based testing strategy. Once the tests passed, I finally deployed the app using RStudio Connect, which allows users to access it via a URL. I also deployed it via shinyapps.io.

Following the completion of the first version of the app, I gave a presentation at the BarcelonaR conference, demonstrating the app and the code behind it. The code for this version can be found on GitHub along with smaller example apps.

Goal 2: Remote Git

The next, and major, release of the app continues the materials, covering Git both locally and remotely. It also includes a project designed to introduce the user to the concept of an Agile framework, as well as a practical scenario for using Git.

Moreover, I created a help page and took the opportunity to learn how to send emails from within a Shiny app. This was a successful learning exercise; however, the main challenge that followed was maintaining the email feature. Eventually I removed this feature and instead created a help page that contains a number of Git references.

Towards the end of the development, I gave another demonstration to data science colleagues, receiving positive feedback. The app also received direct user testing from a new Graduate. Once some changes were made based on this feedback, I tested the app again using {shinytest}. Finally, I deployed this via RStudio Connect, ready to be used.

Results from the Project

I learnt a lot from this project, such as how to create a production grade Shiny app, best practices for using Git, and R package building. Moreover, the app will help users gain knowledge and experience of using Git for version control, specifically Graduates and Placement Students.

Since finishing the project, I have continued to expand my knowledge of the Shiny ecosystem by exploring code profiling, load balancing and load testing. {shiny} is an excellent package, allowing for flexibility and creativity.

 


This week I (Laura Swales, Community and Events Manager) spoke with Beth Ashlee and Owen Jones ahead of their EARL workshop – Package Development in R, which will take place on 7th September 2pm BST.

Hi Beth and Owen! Before we talk about what people can learn from the workshop, could you tell me a bit about yourselves?

Owen – As a Data Science Consultant, I work on a wide range of projects from a broad spectrum of industries. Most recently I’ve been working with the UK Government as part of the UK’s response to the Covid pandemic. I’m also the ValidR tech lead, which means that I do a lot of automated package testing too.

Beth – I’m a Senior Data Scientist and a Team Lead at Mango, so my role is very similar to Owen’s in that I work on a range of consultant projects. My most recent project was with a large retailer, working with them to upskill their Data Science team. Along with the consultancy work, I manage a team of Data Scientists at Mango.

You are both teaching the Package Development in R workshop at EARL – can you tell us a bit more about the workshop and what people can expect to learn?

Beth – They’re going to learn how to build a package in R (!), but more importantly, the reasons why that’s useful to do. We’ll talk about how it can make your code easier to maintain and easier for others to use, and how to write good documentation…

Owen – There’s a lot of best practice incorporated in this workshop, in terms of how you are structuring the code you are writing and how you make it easy for yourself and others to contribute and maintain. Above all else, it’s about good practice, consistency, and code which other people can both use and look after.

Beth – All using the {devtools} package from RStudio!

How many EARL Conferences have you been to, if you can remember?!

Owen – I think it’s 5 for me! Starting in 2017 with London, shortly followed by Boston EARL.

Beth – All of those for me as well, plus a San Francisco EARL in 2017 and two in Boston, so maybe 7 or 8 EARLs!

I believe I’ve been to 8 now – the US Roadshow helped push up my EARL number quite quickly! Do you have any particularly fond memories or highlights?

Owen – I enjoyed the Shiny testing workshop that Beth and I collaborated on a few years back. We’ve both delivered workshops in the past that have always been really fun to work on.

Beth – Agreed! Other than those, I always enjoy the keynote talks because they are usually approachable for everyone – Jenny Bryan in Boston jumps out at me as a great example of a keynote we’ve had. Outside of the actual conference, another colleague and I got booed off stage doing karaoke in America – which was definitely memorable!

Thank you both for talking to me – we are really looking forward to your workshop.

Beth and Owen’s workshop will run on the 7th of September and will be £90 for a half day. For more information on their workshop and to get your tickets, click here.