Building High Performing Data Science Teams

The field of data science is no longer the new kid on the block it once was. “Data science” in its modern usage has been around since the early 2000s.
Many companies, of all sizes, now have data scientists trying to extract value from their data and help the business make better decisions.
What has changed is that, as a discipline, data science has matured. Many companies have gone from proof-of-concept mode to running multiple productionised machine learning models in a short span of time, with varying degrees of success.
Much of this success or failure is determined not by having the most highly qualified data science talent, or the most expensive, state-of-the-art tools and technology, but by organisational structure and team culture.
In this article, I outline some changes I’ve seen in this industry and how data science teams or even departments can best be structured for success.
Data Science is a Team Sport
Many companies have realised that hiring one data scientist, or a bunch of them, and expecting instant results hasn’t worked. For data science teams to be most effective, a multidisciplinary team is essential. Data science is a team sport.
A team of data scientists alone is not a data science team. If a team consists only of data scientists, they will often end up covering for the missing roles in the team, such as Project Manager, Data Engineer, Business Analyst, Tester or DevOps Engineer, all of which are roles in their own right.
The most effective teams have the right balance of skills and personalities to work together and develop great data products or deliver impactful insights to the business.
Hub & Spoke Model
Once an organisation has grown beyond one data science team to many, the question arises of how the teams should be structured and how they should interact.
There are many different operating models for data science and analytics departments, each with their own pros and cons.
These include:
  • Centralised — A large central data science team that serves all parts of the business, usually within an IT, technology or data department.
  • Federated — Smaller data science teams embedded within business units.
The advantages of a centralised department are shared data and technology within the department, more specialised roles, and improved knowledge sharing and internal capability development. However, this comes at the cost of lacking a deep understanding of the business’s needs and domain knowledge.
Federated teams are closer to the business units, and hence have a better understanding of their needs and more domain-specific knowledge.
However, the operating model which strikes the best balance between capability and proximity to the business is the “Hub & Spoke Model”.
The Hub
The hub is a central team that covers areas including:
  • Industrialised Services & Capabilities
  • Training & Onboarding
The hub can also be known as the:
  • Lab
  • Capability Centre
  • Centre of Excellence
The Spokes
The spokes are teams embedded in the business that cover areas including:
  • Analytical Briefs
  • Domain-Specific Services
The spokes can also be known as the:
  • Squads
  • Pods
  • Business Engagement Teams
Image by Author
Team Structure
As the work Hub and Spoke teams carry out differs, so do the roles within these teams.
Hub Team
Since Hub teams develop robust, industrialised services (e.g. deploying machine learning pipelines), the possible roles in this team reflect this:
  • Product Owners define the vision for data products, manage the product roadmap, and prioritise and anticipate customer needs. 🧑🏻‍💼
  • Scrum Masters run agile ceremonies, ensure that goals and scope are understood by everyone on the team, aid in the estimation of tasks and remove impediments to the team’s progress. ⏱
  • Machine Learning Engineers train and deploy scalable, productionised machine learning models to serve the business via APIs or applications. 💻
  • DevOps/MLOps Engineers support the deployment and monitoring of machine learning models in the company infrastructure and set up CI/CD pipelines. ⚙️
  • Business Analysts gather requirements from the customer and help the team understand business processes, rules and complex data mappings. 📋
  • Testers develop integration and system tests to ensure the robustness and quality of the data products. ✅
Spoke Team
Since Spoke teams are rapidly responding to strategic briefs from their business units, different role profiles are needed:
  • Team Managers engage with the business to understand their needs, help write and define analytical briefs and act as the interface between the business and the team. 👩🏽‍💼
  • Data Scientists frame the business questions as machine learning problems and develop machine learning models that make predictions and help drive decisions. They also carry out statistical analysis of available data to derive insights valuable to the business. 🔬
  • Data Storytellers/Visualisation Experts create analytical reports, presentations and dashboards and present insights back to the business in a way they can understand to drive impact. 📊
What is a High Performing Team?
Once data science teams and departments are set up, how can they be made to perform?
The concept of a “High Performing Team” (HPT) is not unique to data science. A lot has been written about such teams in management literature, and many of the learnings can be applied to data science-specific problems.
High performing data science teams do the following:
1. Forge a powerful sense of purpose and identity. They create a common performance challenge.
In practice — this can be achieved by creating a clear Product Roadmap & Team Charter.
2. Build an atmosphere of Trust and Cohesion — Develop relationships and ensure they can rely on each other.
In practice — this can be supported by organising one-to-one meetings, keeping the team size small and holding regular team building sessions.
3. Create a team where everyone is a leader — Everyone should be able to make decisions (and trust them)!
In practice — this can be supported by organising design forums, where team members can brainstorm ideas and decide on product designs.
4. Development — the desire to constantly get better — Challenge and provide constructive feedback to each other.
In practice — this can be achieved by having regular knowledge sharing sessions, pair programming or mentoring/buddy schemes.
5. Standards and Processes — Focus on the processes that will lead to the outcome rather than the outcome itself.
In practice — this can be achieved by agreeing and documenting standards and processes within the team, such as peer review processes, code quality standards and testing strategies.
6. People are made to feel welcome — In order for people to perform, they have to be made to feel welcome and part of the team.
In practice — this can be achieved by having onboarding sessions for new joiners, setting up a buddying system between senior and junior data scientists and ensuring a positive and inclusive working culture.
Companies are investing an ever-growing amount of time and money into data science and artificial intelligence. In order to make the most of this investment, it is worth structuring the organisation and establishing the ways of working to make sure data science teams are acting as High Performing Teams.
Author: Jon Howells