Best practice in data science can lead to long-lived business results. A structure that encourages repeatable processes for generating value from data leads to a fully productive team and reproducible results, time and time again. When this process is ingrained in a company’s culture, and the business and data teams are working in harmony with the business goals, the value of data can be realised through an overall centre of excellence and a shared language for best practice.
A shared language of best practice
Layers of operational best practice allow a standard approach to be adopted, ensuring the best possible outcome from your data science investment. For a data science team, best practices could relate to developing models, structuring analysis, quality standards, or how a project is delivered. They could even extend to the selection of your data and analysis tools, as these can easily affect the success of your project.
With data science teams coming from a diverse range of backgrounds and experiences, what may be obvious to one person can be a novelty to another. A shared language of best practice allows collaborators to focus on the all-important value generated. A workflow that adheres to best practice ensures quality, whether that be the business value of insights or the accuracy of models. Best practices take the guesswork out, minimise mistakes and create a platform for future success.
4 best practices every data delivery team should focus on:
- Reproducibility – Whatever the task, if your results can’t be repeated, is it really done?
- Robustness – Results and the quality of analysis can have a huge impact; best practices with checks and balances built in will lead to better quality
- Collaboration – What use are your results if they are difficult to share? Having standards for collaboration means business value can be attained
- Automation – It is very easy to work with no automation, but frameworks for automation can help accelerate teams
Best practice in Dockerisation
My talk at the Big Data London Meet Up, ‘How Docker can help you become more reproducible’, takes one element of best practice in data science, focusing on Dockerisation, which is proving to be a powerful tool – one that is already turning established best practices in teams on their head. The tool allows teams to collaborate much more easily, be much more reproducible, and automate workflows in an impressive way. Yet it has not had as much adoption within data science as it has within software engineering. My talk will explore just how Docker can supercharge your workflow and your valuable use cases.
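As a flavour of the kind of workflow the talk covers, an analysis environment can be described in a Dockerfile so that colleagues rebuild exactly the same environment on any machine. This is a minimal sketch, not material from the talk itself; the file names, script, and base image below are illustrative assumptions:

```dockerfile
# Minimal sketch: pin the base image and package versions so the
# analysis environment can be rebuilt identically by anyone.
FROM python:3.10-slim

WORKDIR /app

# Install pinned dependencies (requirements.txt would list exact
# versions, e.g. pandas==1.5.3) for reproducible results.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the analysis code and run it by default.
COPY analysis.py .
CMD ["python", "analysis.py"]
```

A colleague would then build with `docker build -t my-analysis .` and run with `docker run my-analysis`, getting the same environment, and so repeatable results, wherever Docker is installed.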
This talk will be of interest to any data scientist who has had trouble deploying models, working with engineering teams, or reproducing colleagues’ analysis. It will also be of interest to anyone wanting to know how Docker can scale a team, making it less intimidating and arming practitioners with the tools to give it a go.
I look forward to seeing you at Mango’s Big Data London Meet Up, 22nd September, 6-8pm, in the Olympia AI & MLOPS Theatre. You can sign up here
Kapil Patel is one of Mango’s Data Science Consultants.