Why should we try to use data better?
The short answer is that we can make or save money using the insights we get from analysing data.
But a more useful answer might be that it helps us see more clearly.
That’s very Zen.
Zen teaches that we normally see the world through a haze: a fog of assumptions, preconceptions and fixed ideas created by our desires.
For example, many people are so involved in the way they do something that they fail to see that the something they do is no longer relevant. Innovation simply passes them by.
Or, if we really want something to happen or believe that an approach is right, we become blind to anything that contradicts that point of view.
The way to see clearly is to look at the world as it is.
This is pragmatic advice, but how do we go about doing it?
One answer is to follow a process. The Cross-Industry Standard Process for Data Mining (CRISP-DM) is now over 20 years old but still useful.
There is data everywhere these days, leaking out of sensors, meters, surveys, analysis and social media.
We could analyse everything, but we should probably start with an understanding of the business.
What is special about this particular industry, what is the competition doing, how do customers act and what would the business like to do?
We can then stick our heads in the fog and try to understand the data that’s available – going back to the business to ask questions or get some more.
Is it clean or patchy? Do we have lots of numbers or is it full of words? Is it in an open format or do we need to get it out of databases or proprietary formats?
For example, the way we treat energy metering data is different from the way we mine Twitter comments or monitor newsfeeds.
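A first pass at these questions can be very small. The sketch below is only an illustration: the sample extract, its column names and the `profile` helper are all made up for this post, and it uses nothing beyond Python's standard library. It counts, for each column, how many values are missing, how many look like numbers and how many are words.

```python
import csv
import io

# Hypothetical sample: a tiny meter-reading extract with gaps and free text.
SAMPLE_CSV = """meter_id,reading_kwh,comment
A1,12.5,ok
A2,,meter offline
A3,9.8,
"""


def profile(rows):
    """Summarise each column: missing values, numeric values and text values."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(
                column, {"missing": 0, "numeric": 0, "text": 0}
            )
            value = (value or "").strip()
            if not value:
                stats["missing"] += 1
                continue
            try:
                float(value)  # anything float() accepts is counted as numeric
                stats["numeric"] += 1
            except ValueError:
                stats["text"] += 1
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

Even a crude summary like this tells us whether the data is clean or patchy, and whether we are dealing mostly with numbers or mostly with words, before we commit to an approach.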
A huge amount of work then often goes into data preparation.
The data needs to be bashed and cut and manipulated into a form that we can work with.
All too often, data preparation takes so long, and leaves the data in yet another messy form, that there is little time left to do any modelling.
But getting to the right kind of data structures can make the task of modelling much easier.
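As one illustration of what a better structure can look like, the sketch below reshapes a "wide" table (one column per month) into a "long" one (one row per observation), which is often easier to filter, group and model. The meter data, field names and the `to_long` helper are hypothetical, and this is only one of many possible target structures.

```python
# Hypothetical wide layout: one row per meter, one column per month.
wide = [
    {"meter_id": "A1", "2024-01": 120.0, "2024-02": 110.5},
    {"meter_id": "A2", "2024-01": 98.2, "2024-02": None},
]


def to_long(records, id_field):
    """Turn one-column-per-period rows into one-row-per-observation records."""
    long_rows = []
    for record in records:
        for field, value in record.items():
            if field == id_field or value is None:
                continue  # skip the identifier itself and missing readings
            long_rows.append(
                {id_field: record[id_field], "period": field, "value": value}
            )
    return long_rows


tidy = to_long(wide, "meter_id")
for row in tidy:
    print(row)
```

Once every reading is its own row, adding a new month means adding rows rather than changing the shape of the table, so the modelling code does not need to change.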
Models help us create and test theories.
Do we think there is a relationship between variables or a robust way to identify problems?
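A simple way to test a theory about a relationship between two variables is to measure how strongly they move together. The sketch below computes a Pearson correlation coefficient by hand, using made-up temperature and energy-use figures. It is a minimal example rather than a recommended modelling method, and a strong correlation on its own does not show that one thing causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical figures: outside temperature (deg C) and daily energy use (kWh).
temperature = [2, 5, 9, 14, 18]
energy_use = [310, 285, 240, 195, 150]

r = pearson(temperature, energy_use)
print(f"correlation: {r:.2f}")
```

A value close to -1 or +1 suggests the theory is worth a closer look. A value near zero suggests there is no straight-line relationship, although there may still be a more complicated one.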
Then it’s time for evaluation.
We could do an infinite number of analyses, but we need to focus on the ones that are aligned with what the business needs to know.
Are we providing useful insights that can be used to make the business more effective?
If so, then we can head towards deployment.
There is no point doing analysis once and then forgetting about it.
Entropy, the tendency of things to decay over time, means that without management things will simply drift and become worse again.
Ideally, we’d automate the analysis and reports and let people know when things are going wrong so that they can get involved and fix things quickly.
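Automated alerting does not have to be elaborate. The sketch below flags any metric whose latest value sits a long way from its recent history. The metric names, the numbers and the threshold of three standard deviations are all assumptions chosen for illustration, and in practice the alert would go to a person or a dashboard rather than to `print`.

```python
from statistics import mean, stdev


def find_alerts(history, latest, threshold=3.0):
    """Return (metric, z_score) pairs where the latest value is unusually far
    from the historical mean, measured in standard deviations."""
    alerts = []
    for name, values in history.items():
        if name not in latest or len(values) < 2:
            continue  # not enough information to judge this metric
        spread = stdev(values)
        if spread == 0:
            continue  # history is constant, so a z-score is undefined
        z_score = (latest[name] - mean(values)) / spread
        if abs(z_score) > threshold:
            alerts.append((name, round(z_score, 1)))
    return alerts


# Hypothetical recent history and today's values for two metrics.
history = {
    "daily_kwh": [100, 102, 98, 101, 99],
    "complaints": [3, 4, 2, 3, 3],
}
latest = {"daily_kwh": 140, "complaints": 3}

alerts = find_alerts(history, latest)
for name, z_score in alerts:
    print(f"ALERT: {name} is {z_score} standard deviations from normal")
```

Run on a schedule, a check like this means nobody has to remember to look at a report for a problem to be noticed.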
CRISP-DM is a process we can follow to run a data mining project.
The real value comes from the understanding we get when we see what is actually going on.