KISS (Keep It Simple Stupid) in Data Science

Helen Tanner, from Data3, gave an interesting presentation on the value of making simplicity the priority when choosing metrics and models. Data3’s mission statement is to help businesses get more value from their data. Helen recalled two examples where Data3 opted for easy-to-understand metrics rather than the complex alternatives proposed in academic papers; these simple metrics were quickly adopted by business leaders.

Helen also described the difficulty of explaining unsupervised models to key business stakeholders: being complex and unsupervised, these models can seem like “black boxes” to businesses. Helen then revealed two cases where Data3 opted for a supervised decision tree model over better-performing unsupervised learning models. Data3 chose decision trees because their mechanics rely on simple thresholds, which can readily be translated into tree diagrams and bar charts that help business stakeholders understand how the model works.
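As a minimal sketch of the idea (not Data3’s actual model), the snippet below trains a shallow decision tree on a standard toy dataset and prints its learned threshold rules. It is this readable, rule-like structure that makes decision trees easy to present to stakeholders:

```python
# Illustrative only: a shallow decision tree whose learned thresholds
# can be printed as a human-readable rule tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the splits as "feature <= threshold" rules,
# exactly the kind of diagram a non-technical audience can follow.
print(export_text(clf, feature_names=load_iris().feature_names))
```

Each split in the printed tree is a single threshold on one feature, which maps directly onto the tree diagrams and bar charts Helen described.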

Voice Search – The Stats behind the Hype

Kevin Mason, Strategy Director at Proctor + Stevenson, presented his analysis on the value of investing in voice search. Kevin listed many examples of the hype around voice search as the next big trend in consumer technology. The focus of Kevin’s analysis was Google’s claim that “20% of all searches on mobile devices now use voice search”.  Kevin outlined his reasons for being sceptical of this claim.

Building on the work done by Will Critchlow (CEO of Distilled), Kevin broke Google’s “voice search” into four categories: control actions (“call mum”), informal repeated queries (“will it rain today?”), personal searches (“play my wedding video”) and real search (“where is the best pizza in my area?”).

Real searches account for only 19% of Google’s claimed voice searches, which puts genuine voice searches on mobile devices at roughly 4% of all searches (19% of the claimed 20%). With around 96% of searches still coming through text, Kevin now advises most businesses that voice search optimization is not a good return on investment. However, Kevin also revealed the bias in voice search towards local businesses, and how they could benefit from investing in search optimization.
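The arithmetic behind Kevin’s 4% figure is simple enough to check directly, using the two percentages quoted in the talk:

```python
# Percentages from the talk: Google claims 20% of mobile searches are
# voice, and only 19% of those are "real" searches.
claimed_voice_share = 0.20
real_search_fraction = 0.19

actual_voice_share = claimed_voice_share * real_search_fraction
print(f"{actual_voice_share:.1%}")  # 3.8%, i.e. roughly 4%
```

The remaining ~96% of searches arriving as text is what drives the advice against investing in voice search optimization.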

Approaches to Address Matching

Nigel Legg, of Knowtext, gave us a fascinating insight into the problems faced by the UK Government when trying to match addresses to people. Nigel showed us an example of the same address presented in five different ways.

Nigel then presented three different approaches his team had tried in order to match addresses more accurately. The outcome of the analysis was that a complex SQL model, produced by Dr Hufton at the Department for Communities and Local Government, performed best, beating both a Conditional Random Field machine learning model and the commonly used Levenshtein distance algorithm. His team are now testing the SQL model’s performance at scale and its resilience to new types of addresses.
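For readers unfamiliar with the baseline, Levenshtein distance counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into another. The sketch below implements it and applies it to two hypothetical variants of the same address (not the examples from Nigel’s talk):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Hypothetical address variants for illustration:
print(levenshtein("12 High Street, Bristol", "12 High St, Bristol"))  # → 4
```

The limitation this exposes is that edit distance treats “Street” vs “St” as four errors even though the addresses are identical, which is one reason more structured approaches (like the CRF and SQL models) can outperform it.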

It was a great evening, followed by networking and free drinks. If you’d like to join us at the next Bristol DS meetup, visit our site for more information and to view the slides from the meetup.