We live in a world where we collect increasing amounts of data – but how many of us do anything with it?
At one extreme, we might do nothing at all, missing out on insights that could make a difference to the way in which we live and work.
At the other, we could create very detailed and sophisticated analyses that either no one understands or that work under such specific conditions that they are not terribly useful.
The data out there includes information on customer behaviour, sales activity, operational production, energy use, waste generation – the list seems endless.
So, how can we improve the way we analyse the data?
We can start by describing the data – finding out more about its shape and characteristics.
How large is the data set? What is the average value? What does the distribution look like? Are there any outliers?
A large number of analyses stop here and go no further.
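As a rough sketch of what that descriptive step looks like in Python, using only the standard library and invented figures (hypothetical monthly sales numbers):

```python
import statistics

# Hypothetical monthly sales figures, invented for illustration.
sales = [120, 135, 128, 131, 129, 127, 133, 310]

n = len(sales)
mean = statistics.mean(sales)
median = statistics.median(sales)
stdev = statistics.stdev(sales)

# A crude outlier check: flag values more than two standard
# deviations away from the mean.
outliers = [x for x in sales if abs(x - mean) > 2 * stdev]

print(f"n={n}, mean={mean:.1f}, median={median}, stdev={stdev:.1f}")
print("possible outliers:", outliers)
```

Even this small amount of description is revealing: the gap between the mean and the median hints that something unusual is pulling the average around, and the outlier check points straight at it.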
But what we should do next is explore the data. This means that we look for relationships between variables.
How does one variable correlate with another, or change over time?
For example, a classic use case is to look at energy consumption in relation to the outside air temperature.
Google Correlate was an interesting tool (discontinued in 2019) that let us see which search patterns matched real-world data.
The warning, as always, is correlation is not causation.
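The energy-versus-temperature example can be sketched with a Pearson correlation coefficient, computed from first principles so that nothing beyond the standard library is needed. The readings below are invented for illustration:

```python
import math

# Hypothetical daily readings: outside air temperature (deg C)
# and a building's energy use (kWh).
temperature = [2, 5, 9, 14, 18, 22, 25]
energy_kwh = [95, 88, 80, 70, 62, 55, 50]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(temperature, energy_kwh)
print(f"r = {r:.3f}")  # strongly negative: warmer days, less heating demand
```

A strongly negative r here is exactly what we would expect from heating demand, but the warning above still applies: the number measures association, not cause.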
Our next, cautious, step is to see if we can infer something from our analysis to date.
Given what we have learned, can we say something about what might happen more widely?
So given the reactions of a sample of customers, can we be reasonably confident that the wider market will react in a certain way?
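One standard way to put a number on that confidence is a confidence interval. As a minimal sketch, assuming a hypothetical sample in which 132 of 200 customers reacted positively, the normal-approximation 95% interval for the wider market's proportion is:

```python
import math

# Hypothetical survey result: 132 of 200 sampled customers reacted positively.
positive, n = 132, 200
p_hat = positive / n

# Normal-approximation 95% confidence interval for the true proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"sample proportion = {p_hat:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The width of that interval is the honest part of the answer: it tells us how far the wider market could plausibly sit from what the sample showed.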
As our level of certainty grows, inference flows naturally into the next method: predicting what will happen.
At election time, this kind of data analysis always reaches fever pitch. All-night analyses, fed by constant streams of data, continually revise their predictions of the outcome.
Prediction is usually only possible over relatively short time scales – we might accurately predict the winner of a presidential contest a month away, but not the winner from a pool of potential candidates five years from now.
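The simplest predictive model is a straight-line trend fitted by least squares and extrapolated one step ahead. A sketch, using invented monthly poll figures for a hypothetical candidate:

```python
# Hypothetical monthly poll figures (% support) for a candidate.
months = [1, 2, 3, 4, 5, 6]
support = [41.0, 41.8, 42.1, 43.0, 43.6, 44.1]

n = len(months)
mx = sum(months) / n
my = sum(support) / n

# Ordinary least-squares slope and intercept.
num = sum((x - mx) * (y - my) for x, y in zip(months, support))
den = sum((x - mx) ** 2 for x in months)
slope = num / den
intercept = my - slope * mx

# Extrapolate one month ahead.
next_month = 7
forecast = intercept + slope * next_month
print(f"forecast for month {next_month}: {forecast:.1f}%")
```

Extrapolating one month ahead is reasonable; extrapolating the same line five years ahead is exactly the mistake the paragraph above warns about.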
The next step up is causal analysis: testing whether a change in one variable causes a specific change in the other.
This is the kind of analysis that happens in clinical trials. For example, a specific dose of a drug will result in a measurable improvement in a condition.
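The arithmetic behind a trial comparison can be sketched as a difference in mean improvement between randomly assigned groups. The scores below are invented, and in practice a proper test statistic would come from a statistics library rather than this hand-rolled standard error:

```python
import statistics

# Hypothetical trial data: improvement scores after treatment vs placebo,
# with patients randomly assigned to the two groups.
treatment = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8]
placebo = [1.2, 0.8, 1.9, 1.1, 1.5, 0.9]

diff = statistics.mean(treatment) - statistics.mean(placebo)

# Standard error of the difference in means (equal group sizes assumed).
n = len(treatment)
se = ((statistics.variance(treatment) + statistics.variance(placebo)) / n) ** 0.5

print(f"mean difference = {diff:.2f}, t = {diff / se:.1f}")
```

It is the random assignment, not the arithmetic, that licenses a causal reading: with randomisation, a large difference relative to its standard error is hard to explain by anything other than the drug.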
Finally, we can look at a mechanistic analysis or an exact model, where we know what will happen as all the variables change.
This is usually the domain of engineering models – we know that a steam train will operate in a certain way once the water gets up to temperature and pressure and the various mechanical systems start to operate.
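A toy version of such an exact model, loosely inspired by the steam example: given the mass of water, its specific heat and the heater power, the time to reach temperature is fully determined by physics (assuming, unrealistically, no heat losses) – there is no statistical uncertainty left in the answer:

```python
# A toy mechanistic model: heating water in a boiler.
# The outcome is fully determined by the inputs - no statistics involved.

SPECIFIC_HEAT_WATER = 4186  # J/(kg*degC)

def time_to_temperature(mass_kg, start_c, target_c, power_w):
    """Seconds to raise mass_kg of water from start_c to target_c,
    assuming all heater power goes into the water (no losses)."""
    energy_j = mass_kg * SPECIFIC_HEAT_WATER * (target_c - start_c)
    return energy_j / power_w

# 10 kg of water from 20 degC to 100 degC with a 5 kW heater:
t = time_to_temperature(10, 20, 100, 5000)
print(f"{t:.0f} s = {t / 60:.1f} minutes")
```

Run the model twice with the same inputs and it gives the same answer twice – the defining property of a mechanistic analysis, and one that the earlier methods in this progression cannot offer.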
In a sense, the various methods of analysis progress from simple to complex.
A complex system – like human beings in a social environment – may only be understood with simpler analyses.
We can predict what someone will do, but we cannot say with certainty that a particular set of input stimuli will cause exactly certain neurons to fire and result in a defined activity.
An exact model may only be possible with engineering systems that operate within clear parameters and tolerances.
Analysing data isn’t something that comes naturally to most people.
We need to work on developing the skills, capabilities and toolkit needed to make sense of data.
And that probably starts with knowing what types of analyses we can do and understanding the situations where we can apply them.