The Analytics Wild West and the Data Cowboy

As the business realises the potential value they have in their data, there is a rush to exploit it. Analytics teams are growing rapidly, with scattered resources across the organisation getting involved.

We are seeing organisations with many hundreds of analysts consuming business intelligence reports from many hundreds of engineers creating new data sets and reports.

Although this is an exciting new world, new perils await us. As important decisions are made on the back of this new intelligence, we need to question the validity of the data.

We need to question the quality of the original data, the transformations that it has been through, the appropriateness and context of the original data, and the meaning of the data points throughout.

For example, given a figure for the UK market’s cumulative risk score, how was it calculated and from which data points?

We are also all painfully aware of the rules and regulations that now govern data and the penalties for improper use, so we need to question the entitlement to that data.

In the data warehouse, we see engineers wrangling new data sets and producing new layers of data without control. What is this data? What does it mean? Where did it come from? What happened to it along the way? Was the original source of good quality?

We need to make sure we ask these questions and ensure our data cowboys are white hat good guys and not outlaws.

Governance Framework for Analytics

A governance framework is required for analytics.

For our operational systems we have, for a long time, governed change with the data architects and data modelers at the core, producing conceptual, logical and physical models of our databases to translate business-driven requirements into designs.

As the business and the systems team require change to the data layer, we have processes that ensure risk is reduced and efficiency maintained. We can leverage much of this experience in our analytics world.

A typical data governance initiative consists of two layers: an operating model and an information model.

The operating model defines how the people will participate in processes that we put in place to ensure our valuable data assets are handled correctly. For example, for our analytics team, this will define how new operational data assets are registered so that they can be available to the analytics world.

It will also define the processes for classifying and approving those assets for use in analytics, how requests for data will be approved and delivered, and how the creation of data sets within the warehouse will be handled.
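As a rough illustration, the registration, classification and approval steps could be modelled as a simple lifecycle. This is only a sketch: the stage names and the functions `advance` and `can_be_requested` are assumptions made for the example, not part of any particular tool or framework.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages for a data asset entering analytics."""
    REGISTERED = 1   # the operational asset has been made known
    CLASSIFIED = 2   # it has been classified (ownership, sensitivity, etc.)
    APPROVED = 3     # it has been approved for use in analytics


# Each stage may only move on to the next one; nothing can be skipped.
NEXT_STAGE = {
    Stage.REGISTERED: Stage.CLASSIFIED,
    Stage.CLASSIFIED: Stage.APPROVED,
}


def advance(current: Stage) -> Stage:
    """Move an asset to its next stage, refusing to go past APPROVED."""
    if current not in NEXT_STAGE:
        raise ValueError(f"{current.name} is the final stage")
    return NEXT_STAGE[current]


def can_be_requested(stage: Stage) -> bool:
    """Only approved assets are offered to analysts in this sketch."""
    return stage is Stage.APPROVED


stage = Stage.REGISTERED
stage = advance(stage)   # now CLASSIFIED
stage = advance(stage)   # now APPROVED
```

The point of the sketch is simply that an asset cannot reach analysts without passing through each step in order.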

Then there is the information model which comprises three parts: the Business Glossary, the Data Dictionary and the Data Usage Catalog.

The Business Glossary contains the language of the organisation as a collection of business terms. Each term is classified to show important information such as an approved definition, a data owner, security classifications and regulatory requirements.

The Data Dictionary is a catalog of data assets, nicely classified against terms from the Business Glossary, with the quality of the data tested and the asset described in terms of context, provenance, ownership, and entitlements to that information.
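To make the relationship between the two concrete, here is a minimal sketch of a glossary term and a data dictionary entry as plain records. The class names, field names and sample values are all assumptions chosen for illustration. A real catalogue would hold far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    """One Business Glossary entry (field names are illustrative)."""
    name: str
    definition: str
    owner: str
    security_classification: str
    regulations: list[str] = field(default_factory=list)


@dataclass
class DataAsset:
    """One Data Dictionary entry, classified against glossary terms."""
    name: str
    source_system: str
    owner: str
    terms: list[str]              # names of the BusinessTerm entries it carries
    quality_checked: bool = False


def undefined_terms(asset: DataAsset, glossary: dict[str, BusinessTerm]) -> list[str]:
    """Return terms the asset refers to that the glossary does not define."""
    return [term for term in asset.terms if term not in glossary]


# Made-up sample data.
glossary = {
    "Customer": BusinessTerm(
        name="Customer",
        definition="A party that has bought from us.",
        owner="Head of Sales",
        security_classification="Confidential",
        regulations=["GDPR"],
    ),
}
asset = DataAsset(
    name="crm.customers",
    source_system="CRM",
    owner="CRM team",
    terms=["Customer", "Risk Score"],
)

missing = undefined_terms(asset, glossary)
print(missing)  # → ['Risk Score']
```

A check like `undefined_terms` is one way to spot assets that use language the business has not yet agreed on.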

The data assets range from operational databases, file-based data sources and IoT data feeds, through the data warehouse or lake, to the business intelligence (BI) layer, where the BI reports themselves are catalogued.

The next part of the story is to build an understanding of the data movements between these data assets. Most organisations have a range of technologies performing data movements, including ETL/ELT tools, procedural code and database stored procedures.

To be able to trace issues found at the BI layer back to their source, or to identify the impact of a change throughout the landscape, having a map of these movements is vital.
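A map of data movements can be thought of as a directed graph. The sketch below assumes the movements have already been harvested into simple (source, target) pairs. The asset names are invented, and the functions `impact_of_change` and `trace_to_sources` are hypothetical helpers rather than features of any specific product.

```python
from collections import defaultdict, deque

# Hypothetical data movements, each recorded as (source, target).
MOVEMENTS = [
    ("crm.customers", "dw.dim_customer"),
    ("erp.orders", "dw.fact_orders"),
    ("dw.dim_customer", "bi.uk_risk_report"),
    ("dw.fact_orders", "bi.uk_risk_report"),
    ("dw.fact_orders", "bi.sales_dashboard"),
]


def build_graph(movements, reverse=False):
    """Build an adjacency map; reverse=True points from target to source."""
    graph = defaultdict(set)
    for source, target in movements:
        if reverse:
            graph[target].add(source)
        else:
            graph[source].add(target)
    return graph


def reachable(graph, start):
    """Breadth-first search: every asset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def impact_of_change(asset):
    """Downstream assets affected if `asset` changes."""
    return sorted(reachable(build_graph(MOVEMENTS), asset))


def trace_to_sources(asset):
    """Upstream assets that feed `asset`, directly or indirectly."""
    return sorted(reachable(build_graph(MOVEMENTS, reverse=True), asset))


print(impact_of_change("erp.orders"))
# → ['bi.sales_dashboard', 'bi.uk_risk_report', 'dw.fact_orders']
print(trace_to_sources("bi.uk_risk_report"))
# → ['crm.customers', 'dw.dim_customer', 'dw.fact_orders', 'erp.orders']
```

The same graph answers both questions: walking it forwards gives impact analysis, and walking it backwards gives the provenance of a report.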

Lastly, the Data Usage Catalog is required. We need to understand the systems and people that interact with our data. If we are looking at Master Data Management initiatives, it is vital to have an understanding of key systems and which data they are the System of Record for.

We can build a subset of critical data from the Business Glossary and use this to map to systems. This knowledge also helps the analytics team when choosing data sources for data analysis.
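As a sketch of what that mapping might look like, the example below links a few critical terms to the systems that use them and flags which system is the System of Record. The terms, systems and helper functions are all made up for illustration.

```python
# Hypothetical usage catalogue: term -> list of (system, is_system_of_record).
USAGE = {
    "Customer": [("CRM", True), ("Billing", False), ("Data Warehouse", False)],
    "Product": [("ERP", True), ("Web Shop", False)],
    "Risk Score": [("Data Warehouse", False)],
}

# The subset of glossary terms treated as critical data in this example.
CRITICAL_TERMS = ["Customer", "Product", "Risk Score"]


def system_of_record(term):
    """Return the single System of Record for a term, or None if unclear."""
    owners = [system for system, is_sor in USAGE.get(term, []) if is_sor]
    return owners[0] if len(owners) == 1 else None


def terms_without_system_of_record(terms):
    """Critical terms that have no single, agreed System of Record."""
    return [term for term in terms if system_of_record(term) is None]


print(system_of_record("Customer"))                    # → CRM
print(terms_without_system_of_record(CRITICAL_TERMS))  # → ['Risk Score']
```

An analyst choosing a source for customer data would be pointed at the CRM, while the gap for "Risk Score" would be a prompt for the governance team to agree an owner.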

So, as we recognise and attempt to realise the value in our growing data and our data consumers rush to the hills, we need to ensure that our data assets are well understood and well governed.

We need people and processes in place to protect those assets, and tools to help us maintain a model of the assets that supports these processes.

Let us make these new frontiers safe with erwin Mapping Manager, rather than the Wild West that they could be. View our erwin Mapping Manager demo under the Videos section of the website, or contact us for a personal demonstration.