#2 Let’s map your data sources

This blog is part of an eight-part series – read our introduction to see how it fits into the series.

Every business has multiple data sources, which are often disconnected and disparate. Many businesses have 20+ different data sources, and some have many more. As a result, businesses often feel they have islands of data and no single source of truth, so the concept of a single customer view across all data sources can feel elusive.

What types of data?

There are many types of data that could be used within a business – think about:

  1. Financial data – revenue, costs, profit, tax, payments, subscriptions, late payments/unpaid, etc
  2. Customer data – name, address, contacts, emails, telephones, sector, etc
  3. Sales data – purchases, value, date, frequency, product, type, etc
  4. Marketing data – responses, engagement, information requests, surveys, etc
  5. Social media data – posts, engagement, comments, feedback, etc
  6. Website data – visitors, behaviours, downloads, views, registrations, etc
  7. Service data – service requests, complaints, feedback, etc
  8. Call centre data – calls, completions, times, etc
  9. IT data – eg downtime

And there could be much more, depending on the business, or sector, you’re working in.

The data sources could either be:

  1. Original data owned and managed by your business
  2. Data stored on a third-party platform used by the business – for instance, a finance tool like Xero, a CRM tool like HubSpot, an ecommerce tool, and so on
  3. Data provided to your business from your customers for you to serve them – for instance, their business information
  4. Publicly available data – for instance, government figures on population and demographics

There is so much data stored within even small businesses – make sure you go wide at this stage and think about every possible data source available both within your business and from external sources.

We need to find out as much as we can about the data, and this could include:

  • Purpose – what is the data used for?
  • Type – is the data confidential, commercial, personal, private, or public?
  • Location – where is it held, hosted, or stored?
  • Format – what type of format is used to store the data? eg database vs Excel 
  • Size – what volumes of data are there?
  • Frequency of updates – how often is it refreshed?
  • Accessibility – can the data be extracted from the source via an automated data extract?
  • Ownership – who is accountable for the data source?

And there could be other areas to explore depending on the individual data sources.
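
If it helps to capture these answers consistently, a minimal sketch of a data source inventory entry might look like the following in Python – the field names and example values are illustrative assumptions, not a prescribed template:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """One entry in a simple data source inventory (illustrative fields only)."""
    name: str                # e.g. "CRM - HubSpot"
    purpose: str             # what the data is used for
    sensitivity: str         # confidential, commercial, personal, private, or public
    location: str            # where it is held, hosted, or stored
    storage_format: str      # e.g. "database", "Excel"
    approx_size: str         # rough volume, e.g. "20k contacts"
    update_frequency: str    # e.g. "daily", "monthly"
    automated_extract: bool  # can it be extracted via an automated data extract?
    owner: str               # who is accountable for the data source

# Example entry - the values are made up for illustration
crm = DataSource(
    name="CRM - HubSpot",
    purpose="Customer and contact records",
    sensitivity="personal",
    location="Vendor-hosted cloud",
    storage_format="database",
    approx_size="~20k contacts",
    update_frequency="daily",
    automated_extract=True,
    owner="Head of Sales",
)
print(crm)
```

However you record it – spreadsheet, document, or code – the point is to answer the same questions for every source so they can be compared side by side.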

How is the data connected?

We need to know how data is, or could be, connected. For instance, multiple data sources could contain information on customers – so if there is a unique identifier or a customer reference number, this can be used to connect all customer data together.

So, the questions we need to ask here are:

  • What unique identifiers have been created for key information like customers? 
    • Where are they created?
    • How are the unique identifiers maintained/controlled/deleted?
  • How is data matched together today?
    • If not through unique identifiers, is it using something like customer name or email address or postcode?
  • Which data source is the ‘master’? This is the data source that should be trusted whenever there is a duplication or conflict when matching data across different sources
    • What are the business rules or business logic for matching data?
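
To make unique identifiers, fallback matching, and a ‘master’ source more concrete, here is a rough Python (pandas) sketch – the column names (customer_id, email, sector) and the rule of trusting the CRM are assumptions for illustration only:

```python
import pandas as pd

# Two illustrative sources - the column names are assumptions for this example
crm = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "email": ["a@x.com", "b@y.com", "c@z.com"],
    "sector": ["Retail", "Finance", None],
})
finance = pd.DataFrame({
    "customer_id": [101, 102, 104],
    "email": ["a@x.com", "b@y.com", "d@w.com"],
    "sector": ["Retail", "Banking", "Media"],
})

# Preferred: join on the unique identifier shared by both sources
matched = crm.merge(finance, on="customer_id", how="outer", suffixes=("_crm", "_fin"))

# Fallback: with no shared identifier, a weaker match on email (or name/postcode)
# matched = crm.merge(finance, on="email", how="outer", suffixes=("_crm", "_fin"))

# Simple 'master' rule: trust the CRM value, fall back to finance where it is missing
matched["sector"] = matched["sector_crm"].fillna(matched["sector_fin"])
print(matched[["customer_id", "sector"]])
```

In practice the matching logic and master-data rules are agreed with the business first – the code simply encodes whatever rules you decide on.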

Often, we find businesses don’t have clear and consistent use of unique identifiers and master data – and that’s OK. We need to understand the gaps so we can recommend the right solutions to implement in future.


Holding high volumes of data is great if it’s accurate and valuable. We often run a data HealthCheck to assess the quality of data being held. We use four criteria:

  1. Data completeness – what percentage of data is available in each data field? eg how many customers have included their postcode? for how many customers do we have their sector listed?
  2. Data uniqueness – how many duplicate entries do we have? eg the same customer or the same contact
  3. Data timeliness – when was the data last updated? eg is it now out-of-date?
  4. Data validity – have we captured the data in the right format for easy comparisons? eg the date fields are consistently DD/MM/YY
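
As a rough illustration of how these four checks can be scored on a real extract, here is a small Python (pandas) sketch – the column names, the 12-month timeliness cut-off, and the postcode pattern are assumptions, not a fixed methodology:

```python
import pandas as pd

# Illustrative customer extract - column names are assumptions for this sketch
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "postcode": ["AB1 2CD", None, None, "EF3 4GH"],
    "sector": ["Retail", "Finance", "Finance", None],
    "last_updated": pd.to_datetime(["2024-01-05", "2022-06-30", "2022-06-30", "2024-03-12"]),
})

# 1. Completeness - share of non-empty values per field
completeness = customers.notna().mean().round(2)

# 2. Uniqueness - share of duplicate customer records
duplicates = customers.duplicated(subset="customer_id").mean().round(2)

# 3. Timeliness - share of records updated within the last 12 months
cutoff = pd.Timestamp.today() - pd.DateOffset(months=12)
timeliness = (customers["last_updated"] >= cutoff).mean().round(2)

# 4. Validity - share of postcodes matching a simple UK-style pattern (illustrative)
valid_postcode = customers["postcode"].str.match(r"^[A-Z]{1,2}\d", na=False).mean().round(2)

print(completeness, duplicates, timeliness, valid_postcode, sep="\n")
```

In a real HealthCheck the fields, thresholds, and validity patterns are tailored to each data source.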

These checks can influence our view on two things:

  • The value of the data for analysis – for instance, if the data is complete, unique, and timely, we will feel more confident in the ability to derive value from it
  • The resource requirements for analysis – for instance, if the data is not valid or complete, there will be a greater need for data cleansing work

What are the data skills in-house?

At this stage, we often run an internal data skills assessment, so we understand the internal capability to develop, manage, and support the data in-house.

To do this, we create a grid with two axes:

  1. Skills – depending on the business, and the project, this could include:
    • Spreadsheets – we often find high usage of Excel/Google spreadsheets so we start here
    • Data visualisation tools – if the business is using tools in-house already, we track the skills level for each tool eg Power BI, Tableau
    • Data engineering tools – if the business is using tools in-house already, we track the skills level for each tool eg ETL tools, data warehouses
    • Coding or programming – the most popular ones are Python, SQL, and R
    • And similar
  2. Capability – we create a scale from 0-10 where:
    • 0 – means no experience at all
    • 1-2 – means low/rare usage
    • 3-5 – means some usage but not a confident user
    • 6-7 – means confident user level…this person knows everything required for everyday usage
    • 8-9 – means expert user level…this person knows all the best practice and is an internal champion
    • 10 – means ninja level…this person could teach it to anyone

We usually ask team members to do a simple self-assessment, which can then be verified by a manager. This skills assessment informs our recommendations on technology stack selection – for instance, if you have skills in-house, this will make traction and engagement faster.
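
One lightweight way to hold and summarise the self-assessment scores is a simple people-by-skills grid, sketched below in Python (pandas) – the names, skills, and scores are invented for illustration:

```python
import pandas as pd

# Illustrative self-assessment scores on the 0-10 scale described above
scores = pd.DataFrame(
    {
        "Spreadsheets": [7, 9, 4],
        "Power BI": [3, 6, 0],
        "SQL": [2, 8, 1],
        "Python": [0, 5, 0],
    },
    index=["Alex", "Sam", "Jo"],
)

# Average capability per skill highlights where the team is strongest and weakest
print(scores.mean().sort_values(ascending=False))

# Flag skills where nobody reaches 'confident user' level (6+)
gaps = scores.max()[scores.max() < 6]
print("Skills with no confident user:", list(gaps.index))
```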

Now that you understand the business requirements and the data sources, you need to explore the data outputs in the form of reports, analytics, and dashboards. So, check out our next blog in this series for some simple tips on how to review the needs for reports, analytics, dashboards, data science and other visualisations.

Do you need an independent, objective review of your data sources?

Well, you’re in the right place. We can run the Discovery & Design programme for your business. The benefits of outsourcing to us are:

  1. OBJECTIVITY – we bring a fresh pair of eyes to your business and we’re unhindered by office politics, historical decisions, and legacy systems
  2. INDEPENDENCE – we’re technology-agnostic, so we can give you an independent view, with no vested interest in you selecting, or staying with, a certain vendor, tool, or platform
  3. AWARD-WINNING DATA CONSULTANTS – we’ve done this before…for 75+ projects and for 50+ businesses, so we can bring our wider experience to the mix

When we run a Discovery & Design programme for one of our clients, it typically takes 4 weeks, depending on the scope of the project. Most businesses want results quickly and simply…so that’s what we do – we worry about the complexity, so you don’t have to.

Find out more at https://data-cubed.co.uk/services/.